I have 2560 paired observations from an experiment in which participants rated each object in a set twice, at two different points in time. For half of the objects in the set, the value of an attribute A was changed between the two time points; for the other half it was not. Of the objects that were changed in each participant's set, half went from A' to A'' and half from A'' to A' (i.e. all participants experienced both orders). My main hypothesis is that changing this attribute from A' to A'' leads on average to a higher rating, and this is indeed supported by the data. I am also interested in determining whether the magnitude, and perhaps the direction, of this effect depends on the A' rating.

For the purposes of this question, I am considering only those instances where A was changed (1280 pairs of observations). The following GLMM

(A'' rating – A' rating) = participant + order + A' rating

where A' rating is a covariate and participant and order are categorical variables, leads to the conclusion that there is a significant positive correlation between A' rating and effect of changing to A'' and that this correlation is <1, such that objects with a low A' rating have their rating increased by changing to A'' but that objects with a high A' rating actually get rated lower when changed to A''.

I want to test whether this is simply due to regression to the mean. To this end, I have followed Kelly and Price in using Pitman's test of equality of variances for paired samples and would appreciate some feedback on whether I've done the right thing.

This is what I did, following the suggestion of a colleague:

1) calculated the SD of A'' ratings $(SD_1)$ and the SD of A' ratings $(SD_2)$

2) regressed A'' rating on A' rating and recorded the correlation $r$.

3) calculated T as $T=\frac{\sqrt{n-2}\,\left[(SD_1/SD_2)-(SD_2/SD_1)\right]}{2\sqrt{1-r^2}}$

The two-tailed p-value of T (Student's t distribution with 1280 - 2 = 1278 degrees of freedom) is 0.07, i.e. at alpha = 0.05 there is no significant difference between the variances of the two sets of ratings, and thus no effect of A' rating on the rating difference beyond regression to the mean. (We can argue about two-tailed vs. one-tailed p-values later.)
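Steps 1 to 3 can be sketched in a few lines of Python, which may make it easier to check the implementation. This is only an illustration under stated assumptions: the function names `pitman_t` and `two_sided_p_normal` are hypothetical, the data are simulated rather than taken from the experiment, and the p-value uses a normal approximation to the t distribution, which is reasonable only because the degrees of freedom are large (1278 here).

```python
import math
import random
import statistics


def pitman_t(x1, x2):
    """Return (T, df) for the statistic in step 3.

    x1 holds the A'' ratings and x2 the A' ratings, paired by position.
    Assumes both samples vary and are not perfectly correlated (|r| < 1).
    """
    n = len(x1)
    sd1 = statistics.stdev(x1)  # step 1: SD of A'' ratings
    sd2 = statistics.stdev(x2)  # step 1: SD of A' ratings
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    r = cov / (sd1 * sd2)       # step 2: Pearson correlation
    # step 3: the formula from the question
    t = math.sqrt(n - 2) * (sd1 / sd2 - sd2 / sd1) / (2 * math.sqrt(1 - r * r))
    return t, n - 2


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation (large df only)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


# Simulated (not real) ratings, purely to show the call pattern.
random.seed(1)
a_prime = [random.gauss(5, 2) for _ in range(1280)]
a_double_prime = [0.6 * x + random.gauss(2.5, 1) for x in a_prime]

t_stat, df = pitman_t(a_double_prime, a_prime)
print(t_stat, df, two_sided_p_normal(t_stat))
```

For an exact p-value, the normal approximation could be swapped for the survival function of a t distribution with `df` degrees of freedom from a statistics library.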

I now plan to adjust my difference scores to account for this and re-do the GLMM above, as described by Kelly & Price.

If you've got this far through the detail, then firstly well done, and secondly, can you tell me if 1) my colleague's suggestion was sensible and 2) if it was, have I implemented it correctly? I have a couple of concerns/apparent grey areas but I'd be interested to hear what others have to say first.

Thank you.


#### Best Answer

As far as I know, the Pitman test is formulated as:

$$F=\frac{SD_2}{SD_1} \quad\text{with}\quad SD_2 > SD_1$$

$$T=\frac{(F-1)\sqrt{n-2}}{2\sqrt{F(1-r^2)}}$$

with $r$ the correlation between the scores in sample 1 and sample 2. This is not equivalent to the formula you use, which is the one given in the paper. I'm not positive about my formula either; I got it from a course somewhere (alas, no reference…).
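For comparison, here is a short derivation under an assumption that is not stated in the answer: that $F$ is read as the ratio of the sample variances, $F = SD_2^2/SD_1^2$, rather than of the standard deviations. Under that reading,

$$\frac{F-1}{2\sqrt{F}} = \frac{1}{2}\left(\sqrt{F}-\frac{1}{\sqrt{F}}\right) = \frac{1}{2}\left(\frac{SD_2}{SD_1}-\frac{SD_1}{SD_2}\right)$$

so that

$$T=\frac{(F-1)\sqrt{n-2}}{2\sqrt{F(1-r^2)}} = \frac{\sqrt{n-2}\,\left[(SD_2/SD_1)-(SD_1/SD_2)\right]}{2\sqrt{1-r^2}}$$

which matches the formula in the question up to the labelling of the two samples (and hence the sign of $T$). With $F$ defined as a ratio of SDs, as written in this answer, the identity does not hold, so the two expressions agree only if the variance-ratio reading is the intended one.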

Apart from that, it might be interesting to look at an alternative approach to dealing with regression to the mean. I found the tutorial paper by Barnett et al. on regression to the mean very enlightening.

Now let's get back for a moment to the two-sided versus one-sided p-values. Regardless of the formula you use, the sign of T depends only on the order of the SDs. (In fact, as I know the Pitman test, T is always positive.) Hence the underlying distribution is, as far as I'm concerned, not the t distribution but half the t distribution, meaning you have to put the cutoff at $T_{0.975, df}$, while the related p-value originates from one tail only. This is equivalent to the standard F test for comparing variances.
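One way to write this argument out, as a sketch that assumes the larger SD is always placed so that the observed statistic $t_{obs}$ is non-negative: under the null hypothesis the unordered statistic follows a $t_{df}$ distribution, which is symmetric about zero, so ordering the SDs amounts to taking its absolute value. The p-value is then

$$p = P\left(|t_{df}| \ge t_{obs}\right) = 2\,P\left(t_{df} \ge t_{obs}\right)$$

and rejecting when $t_{obs} > T_{0.975, df}$ gives a test of size

$$P\left(|t_{df}| > T_{0.975, df}\right) = 2 \times 0.025 = 0.05.$$

Numerically this is the same as the usual two-tailed p-value of the unordered statistic, even though only the upper tail of the ordered statistic is ever observed.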
