I am confused about the Kruskal-Wallis test. Here is an example:

`X=[2 2 35 10 9 8 11 12]; Y=[1 1 1 2 2 2 2 2]; `

`Y` is the grouping variable.

Now, when I ran the Kruskal-Wallis test

`p = kruskalwallis(X,Y,'off') `

I got a p-value of around 0.4. I was assuming the Kruskal-Wallis test takes the median, so it should have been robust when I added an outlier with value 35 in the third position. Why isn't it robust to that? Is it because I have very few samples? Can anyone explain?


#### Best Answer

If `Y` is meant to be a grouping variable, the p-value in R is around 0.45:

    > kruskal.test(x~y)

        Kruskal-Wallis rank sum test

    data:  x by y
    Kruskal-Wallis chi-squared = 0.5622, df = 1, p-value = 0.4534

But it makes no difference whether that 35 is set to 13 or 35 or 1300 – the p-value is exactly the same. It is clearly robust to outliers.
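The reason is that only the ranks of the observations enter the test, and the third value has the largest rank whether it is 13, 35 or 1300. A minimal sketch to check this, assuming the question's data retyped in R (`g` stands in for `Y`, and `kw_p` is a hypothetical helper name, not part of any package):

```r
# Question's data; g is the grouping variable (Y in the MATLAB code)
x <- c(2, 2, 35, 10, 9, 8, 11, 12)
g <- factor(c(1, 1, 1, 2, 2, 2, 2, 2))

# Hypothetical helper: Kruskal-Wallis p-value after replacing x[3] by v
kw_p <- function(v) {
  x[3] <- v                      # modifies a local copy only
  kruskal.test(x, g)$p.value
}

p_values <- sapply(c(13, 35, 1300), kw_p)
print(p_values)                  # three identical values

# The ranks are all the test sees, and they are the same in both cases
rank_13 <- rank(c(2, 2, 13, 10, 9, 8, 11, 12))
rank_35 <- rank(x)
print(rbind(rank_13, rank_35))
```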

With continuity correction, the p-value is somewhat higher.

Edit:

Here's an illustration of just how the Kruskal-Wallis p-value responds as you move the third observation around – that is, this is an empirical influence curve for the p-value as `x[3]` is moved (takes the various values of delta).

We see that the Kruskal-Wallis is highly insensitive to all but a small range of values for `x[3]` (it is constant to the left of $[2,12]$, the range of the other observations, and constant to the right of it). It's *really* insensitive.

The grey line is the p-value with `x[3]` omitted. As you see, no value for `x[3]` will allow the Kruskal-Wallis to attain that p-value, though making `x[3]=2` comes closest.
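The figure itself is not reproduced here, but a curve of this kind can be sketched as follows. This is an assumption about how such a plot could be built, not necessarily how the original was drawn: `grid` and `kw_p` are hypothetical names, and the p-values come from `kruskal.test` with its default tie-corrected chi-squared approximation, so individual values may differ from those in the original figure.

```r
x <- c(2, 2, 35, 10, 9, 8, 11, 12)
g <- factor(c(1, 1, 1, 2, 2, 2, 2, 2))

# Hypothetical helper: p-value when the third observation is set to v
kw_p <- function(v) {
  x[3] <- v
  kruskal.test(x, g)$p.value
}

# Move x[3] over a grid that extends well beyond the other observations
grid <- seq(-10, 50, by = 0.25)
p_curve <- sapply(grid, kw_p)

# The curve is a step function: it can only change where x[3] meets or
# passes another observation, because that is where the ranks change
changes <- grid[which(diff(p_curve) != 0) + 1]
print(range(changes))

# Draw the curve only in an interactive session
if (interactive()) {
  plot(grid, p_curve, type = "s",
       xlab = "x[3]", ylab = "Kruskal-Wallis p-value")
}
```

Outside the range spanned by the other observations the ranks stop changing, which is why the curve is flat on both sides.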

> I was assuming the Kruskal-Wallis test takes the median.

It's a rank-based ANOVA. It doesn't actually 'use' the median for anything.
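To make "rank-based" concrete, the statistic can be rebuilt by hand from the ranks alone. This is a sketch using the standard textbook form of the Kruskal-Wallis statistic with the usual tie correction; the variable names are illustrative, and the data are the question's, retyped in R.

```r
x <- c(2, 2, 35, 10, 9, 8, 11, 12)
g <- factor(c(1, 1, 1, 2, 2, 2, 2, 2))

r <- rank(x)                      # midranks: the two 2s both get rank 1.5
N <- length(x)
n <- tapply(r, g, length)         # group sizes
R <- tapply(r, g, sum)            # rank sums per group

# Uncorrected statistic, then divide by the tie-correction factor
H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
ties <- table(x)
H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)
print(c(H = H, p = p))            # agrees with kruskal.test: 0.5622, 0.4534
```

Nothing in this calculation refers to a median, or to the actual size of the 35; only its rank (8) matters.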

The measure of location shift that corresponds to the Wilcoxon-Mann-Whitney (and hence to the Kruskal-Wallis with two groups) is the median of pairwise differences between the samples, also known as the Hodges-Lehmann estimate.

    > median(outer(x[y==1],x[y==2],"-"))
    [1] -7

Compare:

    > wilcox.test(x~y,conf.int=TRUE)

        Wilcoxon rank sum test with continuity correction

    data:  x by y
    W = 5, p-value = 0.5486
    alternative hypothesis: true location shift is not equal to 0
    95 percent confidence interval:
     -10   5
    sample estimates:
    difference in location
                 -6.999992   #<-------------------------------

(I'm not sure why it doesn't have better accuracy there)

If you change the 35 to 13 or 1300, you get the same estimate of shift.
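A quick check of that claim, as a sketch: `hl_shift` is a hypothetical helper that simply wraps the `median(outer(...))` expression used earlier, applied to the question's data with the third value swapped out.

```r
g <- c(1, 1, 1, 2, 2, 2, 2, 2)

# Hypothetical helper: median of all pairwise differences (group 1 minus group 2)
hl_shift <- function(x, g) median(outer(x[g == 1], x[g == 2], "-"))

shifts <- sapply(c(13, 35, 1300),
                 function(v) hl_shift(c(2, 2, v, 10, 9, 8, 11, 12), g))
print(shifts)   # -7 -7 -7
```

Of the 15 pairwise differences, the five that involve `x[3]` sit at the top of the sorted list whether it is 13 or 1300, so the middle (8th) value stays at -7.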

Adding a whole new observation is a different matter: if your original data in the first group was just (2, 2), then adding an additional observation changes the p-value. (This would be the case even if the median were the estimate of location shift.)
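For instance, as a sketch (`x_small` and `g_small` are illustrative names for the data with the third observation dropped):

```r
x <- c(2, 2, 35, 10, 9, 8, 11, 12)
g <- factor(c(1, 1, 1, 2, 2, 2, 2, 2))

# Same data without the third observation: the first group is just (2, 2)
x_small <- x[-3]
g_small <- g[-3]

p_without <- kruskal.test(x_small, g_small)$p.value
p_with    <- kruskal.test(x, g)$p.value
print(c(without = p_without, with = p_with))
```

With seven observations the first group holds only the two lowest ranks; with eight it holds the two lowest and the highest, so the rank sums, and with them the p-value, are different.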
