I am trying to determine which statistical test to use for the following:

I used a rubric to determine students' level of achievement on two different tasks (see table below). I want to know if their achievement is statistically different from task 1 to task 2. Do I use a chi square for this? I didn't think I could because the samples are paired. I can't use a paired t-test because the outcome is categorical. Any help?


#### Best Answer

Let $X$ be the level for Task 1 minus the level for Task 2 for each subject. Then $$X = (-1, -2, -2, -2, -2, -1, 1, 1),$$ ignoring subjects for which no difference was found.

**Sign test:** A one-sided sign test has P-value about 14%. (For a two-sided test, double this P-value.)

```r
pbinom(2, 8, .5)
# [1] 0.1445313
```
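The two-sided version can also be obtained from base R's `binom.test`, which avoids doubling by hand. This is a minimal sketch assuming the eight nonzero differences listed above; the names `n.pos`, `n`, `p.one`, and `p.two` are mine, chosen for illustration.

```r
# Nonzero differences (Task 1 minus Task 2), as listed above
x <- c(-1, -2, -2, -2, -2, -1, 1, 1)
n.pos <- sum(x > 0)   # number of positive differences
n <- length(x)        # number of nonzero differences

# One-sided P-value: the same quantity as pbinom(2, 8, .5)
p.one <- pbinom(n.pos, n, 0.5)

# Two-sided exact sign test
p.two <- binom.test(n.pos, n, p = 0.5)$p.value

c(one.sided = p.one, two.sided = p.two)
```

Because the null distribution is symmetric when the success probability is 1/2, the two-sided value here is exactly twice the one-sided one.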

**Permutation test:** A permutation test can distinguish between differences -1 and -2 and so is more sensitive. This two-sided test has P-value about 12%. [The P-value is simulated, but two subsequent runs with different seeds also gave P-values about 12%.]

```r
set.seed(1117)
x = c(-1, -2, -2, -2, -2, -1, 1, 1)
t.obs = sum(x);  t.obs
# [1] -8
t.prm = replicate(10^5, sum(sample(c(-1,1), 8, replace=TRUE)*x))
mean(abs(t.prm) >= abs(t.obs))  # two-sided P-value
# [1] 0.11746
```
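With only 8 nonzero differences there are just $2^8 = 256$ possible sign assignments, so the simulation can be replaced by a full enumeration if you prefer a P-value that does not depend on a seed. The sketch below is one way to do that in base R; `signs`, `t.all`, and `p.exact` are illustrative names, not part of the code above.

```r
x <- c(-1, -2, -2, -2, -2, -1, 1, 1)
t.obs <- sum(x)

# Every possible assignment of signs (+1 / -1) to the 8 differences: 256 rows
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(x))))

# Total of the sign-switched differences for each assignment
t.all <- as.vector(signs %*% x)

# Exact two-sided P-value: share of totals at least as extreme as observed
p.exact <- mean(abs(t.all) >= abs(t.obs))
p.exact
```

By this count, 30 of the 256 assignments give a total at least as extreme as the observed one, which agrees with the simulated value of about 12%.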

The distribution of totals of the sign-permuted differences is shown below. The combined heights of the bars outside the vertical dotted lines give the two-sided P-value.

**Addendum:** For your manuscript, I suggest putting the data into a different format as in the last four rows below, to show how each student performed on each task and with $X_i$ the differences in performances.

```r
stu = 1:14
t1 = c(1,1,1,1,1, 1,1,1,1,1, 2,2,3,3)
t2 = c(1,1,1,1,1, 2,3,3,3,3, 1,3,2,3)
x = t1 - t2
rbind(stu, t1, t2, x)
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
# stu    1    2    3    4    5    6    7    8    9    10    11    12    13    14
# t1     1    1    1    1    1    1    1    1    1     1     2     2     3     3
# t2     1    1    1    1    1    2    3    3    3     3     1     3     2     3
# x      0    0    0    0    0   -1   -2   -2   -2    -2     1    -1     1     0
```

If you are using the permutation test, argue that the $0$'s provide no information about different levels of performance on the two tasks. Also mention that this is a paired test and that it is two-sided because you had no reason to believe a particular one of the tasks would result in higher scores. You are testing $H_0: \delta = 0$ and $H_a: \delta \ne 0,$ where $\delta$ is the population difference between the two Tasks.

Perhaps mention that the data are far from normal, so a t-test does not seem appropriate, and that there are many ties, so the Wilcoxon test may not be appropriate.

The rationale for the permutation test is as follows: Under the null hypothesis we are assuming that T1 and T2 are not different. If that assumption is true, it should make no difference whether signs of the $X_i$ are randomly switched. As the data stand, we observe $T = \sum_i X_i = -8.$ Then the question is how $T$ changes under random sign-switching. The answer is that sign-switching can make $T$ either larger or smaller (as shown in the histogram). In fact, $|T|$ equals or exceeds the observed value $|T| = 8$ for about 12% of the permuted values, so that $|T| = 8$ is not a sufficiently remarkable result to be called statistically significant.

Of course, whichever test you use in the manuscript, you need to check everything to make sure that the table in your Question matches the student-by-student table in the Addendum. If you are even marginally familiar with R, you can paste the few lines of code into the R 'Console' window and run them to verify the P-value.
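One quick way to do that check is to cross-tabulate the two task vectors and compare the counts with the table in your Question. This is only a sketch: it assumes the `t1` and `t2` vectors from the Addendum are a correct transcription of your data, and the labels `Task1` and `Task2` are names I made up for the display.

```r
# Student-level scores as transcribed in the Addendum (assumed correct)
t1 <- c(1,1,1,1,1, 1,1,1,1,1, 2,2,3,3)
t2 <- c(1,1,1,1,1, 2,3,3,3,3, 1,3,2,3)

# 3 x 3 table of Task 1 level (rows) by Task 2 level (columns)
tab <- table(Task1 = t1, Task2 = t2)
tab

# Nonzero differences used by the sign and permutation tests
x <- t1 - t2
x.nz <- x[x != 0]
x.nz
sum(x.nz)   # observed total T
```

The diagonal of `tab` holds the students with no change between tasks, which are the ones dropped from both tests.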

If you use an 'exact' version of the Wilcoxon signed-rank test, you need to explain that it is not subject to the famous difficulties with ties that would make the traditional implementation of the test problematic here. Otherwise, some reviewer is going to waste time going back and forth about the validity of Wilcoxon.
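If it helps to see what an exact signed-rank test does with ties, here is a base-R sketch that enumerates all sign assignments of the midranks. It rests on two assumptions that I am making for illustration: zero differences are dropped, and tied absolute differences receive midranks. A packaged exact implementation may treat zeros differently, so check its documentation before quoting a number.

```r
t1 <- c(1,1,1,1,1, 1,1,1,1,1, 2,2,3,3)
t2 <- c(1,1,1,1,1, 2,3,3,3,3, 1,3,2,3)
x <- t1 - t2
x <- x[x != 0]                 # assumption: drop zero differences
r <- rank(abs(x))              # midranks for tied absolute differences
v.obs <- sum(r[x > 0])         # observed statistic: sum of positive ranks

# All 2^n ways of marking each difference as positive (1) or negative (0)
pos <- as.matrix(expand.grid(rep(list(0:1), length(x))))
v.all <- as.vector(pos %*% r)  # statistic under every sign assignment

# Two-sided P-value: distance from the centre of the null distribution
v.mid <- sum(r) / 2
p.exact <- mean(abs(v.all - v.mid) >= abs(v.obs - v.mid))
c(V = v.obs, p.two.sided = p.exact)
```

Under these assumptions the observed statistic is 5 and 22 of the 256 assignments are at least as extreme, a P-value of about 0.086, so the conclusion at the 5% level is unchanged.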

Additional R code for the figure, in case you need it:

```r
hist(t.prm, prob=TRUE, breaks=(-13:12)+.5, col="skyblue2",
     main="Permutation Dist'n of Totals")
abline(v=c(-7,7), col="red", lwd=2, lty="dotted")
```