Let's say I have multiple wine experts who rank a set of wines over time.

To put it formally, I have n experts, m wines in the set and c measurements.

Each measurement is a matrix with n rows and m columns; the entry at position (x, y) is a value between 0 and 1 saying what expert x thinks of wine y.

I'd like to measure how good (consistent) their ratings are.

As a simpler example, if I had only one expert and two measurements, I could calculate a simple correlation between the two measurements, and the correlation coefficient would tell me how consistent the expert is on this set of wines.

Two questions:

- What would be the correct way to measure this for multiple measurements from only one expert? (n = 1).
- What would be the correct way to measure the consistency of more experts? (n > 1).


#### Best Answer

Although this comes a long time after the question was asked, I believe it is a clear question and a common problem that deserves a proper answer. I also found a **similar question**, which I answered a few months ago. You can use the `coefficient of variation` (*cv*) or the `coefficient of quartile variation` (*cqv*), with some considerations:

$$CV = \biggl(\frac{\sigma}{\mu}\biggr)\times 100$$ (Albatineh et al., 2014)

$$CQV = \biggl(\frac{Q_3-Q_1}{Q_3+Q_1}\biggr)\times 100$$ (Altunkaynak and Gamgam, 2018)

Here $\sigma$ is the standard deviation, $\mu$ is the mean, and $Q_1$ and $Q_3$ are the first and third quartiles.
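As a rough illustration of the two formulas, here is a minimal Python sketch using only the standard library. It is not the cvcqv package's implementation: the function names `cv` and `cqv` and the `ratings` list are hypothetical, the sample standard deviation is assumed as the estimate of $\sigma$, and the quartiles use linear interpolation between order statistics, which may differ from the quantile definition used by other software.

```python
import statistics


def cv(values):
    """Coefficient of variation in percent: (standard deviation / mean) * 100.

    Assumption: the sample standard deviation estimates sigma.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


def cqv(values):
    """Coefficient of quartile variation in percent: (Q3 - Q1) / (Q3 + Q1) * 100.

    Assumption: quartiles come from linear interpolation between order
    statistics (statistics.quantiles with method="inclusive").
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("CQV is undefined when Q1 + Q3 is zero")
    return (q3 - q1) / (q3 + q1) * 100


# Hypothetical repeated ratings of one wine by one expert (values in [0, 1]).
ratings = [0.70, 0.60, 0.65, 0.68, 0.62]

print(f"cv  = {cv(ratings):.2f}%")
print(f"cqv = {cqv(ratings):.2f}%")
```

Both functions return a percentage, so lower values mean the repeated ratings are closer together.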

Since `cqv` and `cv` are unitless, they are useful for comparing variables with different units. They are also measures of **homogeneity/consistency** (Bonett, 2006; Altunkaynak and Gamgam, 2018). These measures can be calculated efficiently, with *95% confidence intervals* (`CI`), using the recently released cvcqv R package (on CRAN). I have also provided an example `wine.csv` file including **three experts** and **five types** of wine. A small chunk of the data is:

| row | expert | measurement | Wine_1 | Wine_2 | Wine_3 | Wine_4 | Wine_5 |
|---|---|---|---|---|---|---|---|
| 1 | expert_a | 2019-01-01 | 0.70 | 0.60 | 0.30 | 0.10 | 0.80 |
| 2 | expert_a | 2019-01-02 | 0.60 | 0.70 | 0.40 | 0.20 | 0.80 |
| 3 | expert_a | 2019-01-03 | 0.65 | 0.65 | 0.35 | 0.15 | 0.80 |
| 44 | expert_b | 2019-01-04 | 0.90 | 0.10 | 0.90 | 0.10 | 0.90 |
| 45 | expert_b | 2019-01-05 | 0.20 | 0.12 | 0.21 | 0.31 | 0.21 |
| 46 | expert_b | 2019-01-06 | 0.80 | 0.56 | 0.79 | 0.89 | 0.69 |
| 115 | expert_c | 2019-02-04 | 0.43 | 0.24 | 0.15 | 0.68 | 0.92 |
| 116 | expert_c | 2019-02-05 | 0.42 | 0.32 | 0.16 | 0.69 | 0.91 |
| 117 | expert_c | 2019-02-06 | 0.41 | 0.31 | 0.15 | 0.70 | 0.90 |

Because the example contains values with a **non-normal distribution**, `cqv` is the better indicator of the amount of variability (i.e., the higher the `cqv`, the lower the consistency). Therefore, the consistency of each expert is explored using `cqv` with 95% confidence intervals (refer to the vignette for the CI formulas). This figure shows the results:

The `cqv` (95% CI) of the experts' measurements for various wines over time is:

| row | expert | wines | cqv_est | cqv_lower | cqv_upper |
|---|---|---|---|---|---|
| 1 | expert_a | Wine_1 | 5.58 | 3.33 | 6.15 |
| 2 | expert_a | Wine_2 | 3.3 | 2.33 | 4.70 |
| 3 | expert_a | Wine_3 | 6.02 | 4.22 | 8.01 |
| 4 | expert_a | Wine_4 | 12.5 | 7.06 | 18.8 |
| 5 | expert_a | Wine_5 | 1.38 | 0.621 | 2.5 |
| 6 | expert_b | Wine_1 | 70.3 | 47.1 | 75.6 |
| 7 | expert_b | Wine_2 | 66.0 | 52.9 | 69.1 |
| 8 | expert_b | Wine_3 | 58 | 55.3 | 58.4 |
| 9 | expert_b | Wine_4 | 45.8 | 31.2 | 70.8 |
| 10 | expert_b | Wine_5 | 49.9 | 13.3 | 53.6 |
| 11 | expert_c | Wine_1 | 30.1 | 18.3 | 53.7 |
| 12 | expert_c | Wine_2 | 49.6 | 10.7 | 52.3 |
| 13 | expert_c | Wine_3 | 70.9 | 39.2 | 72.4 |
| 14 | expert_c | Wine_4 | 14.5 | 4.74 | 15.9 |
| 15 | expert_c | Wine_5 | 70.7 | 9.61 | 76.0 |
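The per-expert, per-wine point estimates can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses just the rows visible in the data excerpt shown earlier (three measurements per expert), so its output will not reproduce the table above, which was computed on the full `wine.csv` with the cvcqv package. The names `EXCERPT`, `cqv`, and `cqv_by_expert_and_wine` are hypothetical, the quartiles use linear interpolation, and confidence intervals are not computed here.

```python
import statistics

WINES = ["Wine_1", "Wine_2", "Wine_3", "Wine_4", "Wine_5"]

# Rows copied from the data excerpt shown earlier; one inner list per measurement.
EXCERPT = {
    "expert_a": [
        [0.70, 0.60, 0.30, 0.10, 0.80],
        [0.60, 0.70, 0.40, 0.20, 0.80],
        [0.65, 0.65, 0.35, 0.15, 0.80],
    ],
    "expert_b": [
        [0.90, 0.10, 0.90, 0.10, 0.90],
        [0.20, 0.12, 0.21, 0.31, 0.21],
        [0.80, 0.56, 0.79, 0.89, 0.69],
    ],
    "expert_c": [
        [0.43, 0.24, 0.15, 0.68, 0.92],
        [0.42, 0.32, 0.16, 0.69, 0.91],
        [0.41, 0.31, 0.15, 0.70, 0.90],
    ],
}


def cqv(values):
    """Coefficient of quartile variation in percent (linear-interpolation quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1) * 100


def cqv_by_expert_and_wine(data, wines):
    """Return {expert: {wine: cqv}} computed over each expert's repeated measurements."""
    result = {}
    for expert, rows in data.items():
        columns = zip(*rows)  # transpose: one tuple of repeated scores per wine
        result[expert] = {wine: cqv(list(col)) for wine, col in zip(wines, columns)}
    return result


table = cqv_by_expert_and_wine(EXCERPT, WINES)
for expert, per_wine in table.items():
    for wine, value in per_wine.items():
        print(f"{expert} {wine} {value:.1f}")
```

With only three measurements per group the quartiles are very coarse, so this is meant to show the grouping logic rather than to give usable estimates.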

As you can see, only **expert_a** shows consistent measurements for the various wines over time (its Wine_4 estimate, 12.5, is the only one above the threshold), because measurements with large `cqv` or `cv` values (here, higher than 10%) are generally considered unreliable. You can also ignore the wine type and calculate `cqv` for each expert, in which case **expert_a** shows a significantly lower `cqv` than **expert_b** (i.e., non-overlapping *CI*s):

| row | expert | cqv_est | cqv_lower | cqv_upper |
|---|---|---|---|---|
| 1 | expert_a | 35.6 | 32.7 | 54.8 |
| 2 | expert_b | 58.4 | 56.4 | 60.6 |
| 3 | expert_c | 58.5 | 47.5 | 63.0 |
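The non-overlap comparison can be checked mechanically. The sketch below takes the interval bounds from the table above; the `Interval` type and the `overlaps` helper are hypothetical names introduced only for this illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Interval(NamedTuple):
    lower: float
    upper: float


def overlaps(a, b):
    """Return True if the two closed intervals share at least one point."""
    return a.lower <= b.upper and b.lower <= a.upper


# Pooled cqv 95% confidence intervals copied from the table above.
ci = {
    "expert_a": Interval(32.7, 54.8),
    "expert_b": Interval(56.4, 60.6),
    "expert_c": Interval(47.5, 63.0),
}

for first, second in combinations(ci, 2):
    status = "overlap" if overlaps(ci[first], ci[second]) else "do not overlap"
    print(f"{first} vs {second}: intervals {status}")
```

Only the expert_a and expert_b intervals are disjoint. Overlapping intervals, as for expert_a and expert_c, do not by themselves show that two experts are equally consistent; they only mean this informal criterion cannot separate them.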