There are three groups (colA, colB, colC) representing three different measurement methods. The question is whether there is a difference between the three methods. To decide this, we perform experiments and measure the same thing with each method. The output of each method is a label, not a value on a numeric scale. In addition, we do not have a control group, so we do not know which method's output is the right one in a given experiment. I solved the problem in the way described below, but I have become uncertain whether it is a correct solution. I might have oversimplified the problem, in which case the chi-square approach I used tests something other than what we want.

Data:

| | colA | colB | colC |
|---|---|---|---|
| experiment1 | label1 | label1 | label2 |
| experiment2 | label3 | label1 | label3 |
| experiment3 | label4 | label4 | label2 |
| experiment4 | label5 | label4 | label5 |
| ... | | | |

- My first approach, probably oversimplified:

As we do not know which labels are correct, we first regard all the outputs of method A as correct and, for each of the other two columns, count the experiments in which its label matches the label in colA. We then repeat this with method B regarded as correct, and then with method C. This gives the label counts below:

| Column regarded as correct | colA | colB | colC |
|---|---|---|---|
| I. colA | 4 | 2 | 2 |
| II. colB | 2 | 4 | 0 |
| III. colC | 2 | 0 | 4 |
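The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration on the four example experiments; the names `experiments`, `columns` and `agreement_counts` are made up for this sketch and do not come from the original analysis.

```python
# Toy data from the example: one tuple per experiment, ordered (colA, colB, colC).
experiments = [
    ("label1", "label1", "label2"),
    ("label3", "label1", "label3"),
    ("label4", "label4", "label2"),
    ("label5", "label4", "label5"),
]
columns = ["colA", "colB", "colC"]


def agreement_counts(rows, reference):
    """For each column, count the experiments whose label matches the reference column."""
    counts = [0] * len(rows[0])
    for row in rows:
        for j, label in enumerate(row):
            if label == row[reference]:
                counts[j] += 1
    return counts


# Treat each column in turn as the "correct" one.
for ref, name in enumerate(columns):
    counts = agreement_counts(experiments, ref)
    print(name, "regarded as correct:", dict(zip(columns, counts)))
```

Note that the reference column always agrees with itself in every experiment, so its own count is simply the number of experiments.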

Then we compute a **chi-square test for each** case; here II. and III. give the same result. (In the real data the expected counts are greater than 5 in each column; the numbers here are only a small example to highlight the problem.) This way we get three p-values, one per test, from which we select the highest one to be on the safe side and decide whether the deviations between columns are likely to be caused by chance alone.

- My second approach would probably be:

The same data, transformed into a different format:

| | label1 | label2 | label3 | label4 | label5 |
|---|---|---|---|---|---|
| methodA (earlier colA) | 1 | 0 | 1 | 1 | 1 |
| methodB (earlier colB) | 2 | 0 | 0 | 2 | 0 |
| methodC (earlier colC) | 0 | 2 | 1 | 0 | 1 |

Here the expected values for each label would be greater than 5 with the real data. **However, we do not want to test whether the distribution among the labels is caused by chance alone, but whether there is a difference between methodA, methodB and methodC.** Can chi-square be used here? What other tests would you propose to see whether there is a difference between the three methods?
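For reference, the method-by-label table and the usual Pearson chi-square statistic for a contingency table can be computed from the raw data roughly as follows. This is a minimal sketch in plain Python on the four example experiments, not a recommendation of the test itself. The function names are hypothetical, and the code assumes that every method and every label occurs at least once, so that no expected count is zero.

```python
from collections import Counter

# Toy data from the example: one tuple per experiment, ordered (colA, colB, colC).
experiments = [
    ("label1", "label1", "label2"),
    ("label3", "label1", "label3"),
    ("label4", "label4", "label2"),
    ("label5", "label4", "label5"),
]
labels = ["label1", "label2", "label3", "label4", "label5"]


def contingency_table(rows, labels):
    """One row per method (column of the raw data), one column per label."""
    table = []
    for j in range(len(rows[0])):
        counts = Counter(row[j] for row in rows)
        table.append([counts[label] for label in labels])
    return table


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of method and label.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


table = contingency_table(experiments, labels)
stat, dof = chi_square_independence(table)
print(table)
print("chi-square statistic:", stat, "degrees of freedom:", dof)
```

If SciPy is available, a p-value could then be obtained with `scipy.stats.chi2.sf(stat, dof)`. With counts as small as in this toy example the chi-square approximation is not reliable, so the sketch is only meaningful for the real data with larger expected counts.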

**What would be your solution?**


#### Best Answer

I think your second approach would be right, if there is no other identifiable factor that could cause changes in the answers given by each method. (For example, if different methods were implemented by different researchers, or at different times, or using differently calibrated instruments.)

If the answers (i.e., the labels) could depend on something other than the method, then that "something else" could be introduced into the model as an additional factor, and a higher-dimensional contingency table could be fitted using a log-linear model.
