Let's say I have multiple wine experts who rate a set of wines over time.
To put it formally, I have n experts, m wines in the set and c measurements.
A measurement is a matrix with n rows and m columns; the entry at position (x, y) is a value between 0 and 1 expressing what expert x thinks of wine y.
I'd like to measure how good (consistent) their ratings are.
As a simpler example, if I had only one expert and two measurements, I could calculate a simple correlation between the two measurements, and the correlation coefficient would tell me how consistent the expert is on this set of wines.
Two questions:
- What would be the correct way to measure this for multiple measurements from only one expert (n = 1)?
- What would be the correct way to measure the consistency of several experts (n > 1)?
Best Answer
Although it is a long time after the question, I believe it is a clear question and a common problem that needs to be properly addressed. Also, I found a similar question which I answered a few months ago. You can use the coefficient of variation (cv) or the coefficient of quartile variation (cqv), with some considerations:

$$CV = \biggl(\frac{\sigma}{\mu}\biggr)\times100,$$ (Albatineh et al., 2014)

$$CQV = \biggl(\frac{Q_3-Q_1}{Q_3+Q_1}\biggr)\times100$$ (Altunkaynak and Gamgam, 2018)

Here $\sigma$ is the standard deviation, $\mu$ is the mean, and $Q_1$ and $Q_3$ are the first and third quartiles of the measurements.
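To make the two formulas concrete, here is a minimal Python sketch using only the standard library. The function names are my own, and the quartile rule (`method="inclusive"`) is an assumption: different quantile definitions give slightly different cqv values, so this will not necessarily reproduce the numbers from the cvcqv package.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


def coefficient_of_quartile_variation(values):
    """CQV in percent: (Q3 - Q1) / (Q3 + Q1)."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("CQV is undefined when Q1 + Q3 is zero")
    return (q3 - q1) / (q3 + q1) * 100


# expert_a's three Wine_1 ratings, taken from the data excerpt in this answer.
ratings = [0.70, 0.60, 0.65]
cv = coefficient_of_variation(ratings)
cqv = coefficient_of_quartile_variation(ratings)
print(f"CV  = {cv:.2f}%")
print(f"CQV = {cqv:.2f}%")
```

Both functions return a percentage, so lower values mean more consistent ratings.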
Since cqv and cv are unitless, they are useful for comparing variables with different units. They are also measures of homogeneity/consistency (Bonett, 2006; Altunkaynak and Gamgam, 2018). These measures can be efficiently calculated with 95% confidence intervals (CI) by the recently released cvcqv R package (on CRAN). I have also provided an example wine.csv file including three experts and five types of wine. A small chunk of the data is:

         expert    measurement  Wine_1  Wine_2  Wine_3  Wine_4  Wine_5
      1  expert_a  2019-01-01     0.70    0.60    0.30    0.10    0.80
      2  expert_a  2019-01-02     0.60    0.70    0.40    0.20    0.80
      3  expert_a  2019-01-03     0.65    0.65    0.35    0.15    0.80
     44  expert_b  2019-01-04     0.90    0.10    0.90    0.10    0.90
     45  expert_b  2019-01-05     0.20    0.12    0.21    0.31    0.21
     46  expert_b  2019-01-06     0.80    0.56    0.79    0.89    0.69
    115  expert_c  2019-02-04     0.43    0.24    0.15    0.68    0.92
    116  expert_c  2019-02-05     0.42    0.32    0.16    0.69    0.91
    117  expert_c  2019-02-06     0.41    0.31    0.15    0.70    0.90

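As a rough illustration of the per-expert, per-wine calculation, the sketch below computes cqv for each (expert, wine) pair using only the nine rows of the excerpt above. It is plain Python, not the cvcqv package, and the `cqv` helper and its quartile rule are my own assumptions. With three measurements per expert the values are only illustrative and will not match results computed from the full wine.csv.

```python
import statistics
from collections import defaultdict

# Rows copied from the excerpt: (expert, [Wine_1 .. Wine_5]) per measurement date.
rows = [
    ("expert_a", [0.70, 0.60, 0.30, 0.10, 0.80]),
    ("expert_a", [0.60, 0.70, 0.40, 0.20, 0.80]),
    ("expert_a", [0.65, 0.65, 0.35, 0.15, 0.80]),
    ("expert_b", [0.90, 0.10, 0.90, 0.10, 0.90]),
    ("expert_b", [0.20, 0.12, 0.21, 0.31, 0.21]),
    ("expert_b", [0.80, 0.56, 0.79, 0.89, 0.69]),
    ("expert_c", [0.43, 0.24, 0.15, 0.68, 0.92]),
    ("expert_c", [0.42, 0.32, 0.16, 0.69, 0.91]),
    ("expert_c", [0.41, 0.31, 0.15, 0.70, 0.90]),
]


def cqv(values):
    """Coefficient of quartile variation in percent (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1) * 100


# Collect each expert's measurements (one list of five ratings per date).
by_expert = defaultdict(list)
for expert, ratings in rows:
    by_expert[expert].append(ratings)

cqv_table = {}
for expert, measurements in by_expert.items():
    # zip(*measurements) turns rows (dates) into columns (wines).
    for wine_index, column in enumerate(zip(*measurements), start=1):
        cqv_table[(expert, f"Wine_{wine_index}")] = cqv(column)

for (expert, wine), value in cqv_table.items():
    print(f"{expert}  {wine}  {value:6.2f}")
```

Even on this tiny excerpt, expert_b's ratings of the same wine swing far more between dates than expert_a's, which is the pattern the cqv is meant to capture.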
Because the example contains values with a non-normal distribution, cqv is a better indicator of the amount of variability (i.e., the higher the cqv, the lower the consistency). Therefore, the consistency of each expert is explored by cqv with 95% confidence intervals (refer to the vignette for the CI formulas). This figure shows the results:
The cqv (95% CI) of the experts' measurements for various wines over time is:

        expert    wines   cqv_est  cqv_lower  cqv_upper
     1  expert_a  Wine_1     5.58       3.33       6.15
     2  expert_a  Wine_2      3.3       2.33       4.70
     3  expert_a  Wine_3     6.02       4.22       8.01
     4  expert_a  Wine_4     12.5       7.06       18.8
     5  expert_a  Wine_5     1.38      0.621        2.5
     6  expert_b  Wine_1     70.3       47.1       75.6
     7  expert_b  Wine_2     66.0       52.9       69.1
     8  expert_b  Wine_3       58       55.3       58.4
     9  expert_b  Wine_4     45.8       31.2       70.8
    10  expert_b  Wine_5     49.9       13.3       53.6
    11  expert_c  Wine_1     30.1       18.3       53.7
    12  expert_c  Wine_2     49.6       10.7       52.3
    13  expert_c  Wine_3     70.9       39.2       72.4
    14  expert_c  Wine_4     14.5       4.74       15.9
    15  expert_c  Wine_5     70.7       9.61       76.0

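The intervals in the table come from the cvcqv package, whose CI formulas are described in its vignette. If you just want a feel for how such an interval can be obtained, a generic percentile bootstrap is one option. To be clear, this is a swapped-in technique and not the package's method; the function names, the number of resamples and the seed below are all my own assumptions. The data are expert_a's 15 ratings from the excerpt earlier in this answer, pooled across wines and dates.

```python
import random
import statistics


def cqv(values):
    """Coefficient of quartile variation in percent (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1) * 100


def bootstrap_ci(values, stat, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower = estimates[round(alpha * n_resamples)]
    upper = estimates[round((1 - alpha) * n_resamples) - 1]
    return lower, upper


# expert_a's ratings from the excerpt (3 dates x 5 wines), pooled together.
pooled = [
    0.70, 0.60, 0.30, 0.10, 0.80,
    0.60, 0.70, 0.40, 0.20, 0.80,
    0.65, 0.65, 0.35, 0.15, 0.80,
]

estimate = cqv(pooled)
lower, upper = bootstrap_ci(pooled, cqv)
print(f"cqv = {estimate:.1f}%, bootstrap 95% CI = ({lower:.1f}%, {upper:.1f}%)")
```

With only 15 values the interval will be wide and somewhat rough, so treat it as a sanity check rather than a replacement for the package's intervals.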
As you can see, only expert_a shows consistent measurements for the various wines over time, because measurements with large cqv or cv values (here, higher than 10%) are generally considered unreliable. You can also ignore wine type and calculate cqv for each expert, in which case expert_a shows a significantly lower cqv than expert_b (i.e., non-overlapping CIs):

       expert    cqv_est  cqv_lower  cqv_upper
    1  expert_a     35.6       32.7       54.8
    2  expert_b     58.4       56.4       60.6
    3  expert_c     58.5       47.5       63.0

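The non-overlap comparison can be checked mechanically. The short sketch below takes the interval bounds reported in the table above and tests every pair of experts; the `overlaps` helper is a hypothetical name of mine.

```python
from itertools import combinations

# (cqv_lower, cqv_upper) for each expert, as reported in the table above.
intervals = {
    "expert_a": (32.7, 54.8),
    "expert_b": (56.4, 60.6),
    "expert_c": (47.5, 63.0),
}


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


results = {}
for (name_1, ci_1), (name_2, ci_2) in combinations(intervals.items(), 2):
    results[(name_1, name_2)] = overlaps(ci_1, ci_2)
    verdict = "overlap" if results[(name_1, name_2)] else "do not overlap"
    print(f"{name_1} vs {name_2}: intervals {verdict}")
```

Only the expert_a and expert_b intervals are separated. The expert_c interval overlaps both of the others, so this comparison alone does not distinguish expert_c from either of them. Keep in mind that non-overlap is a conservative rule of thumb: overlapping intervals do not by themselves prove that two cqv values are equal.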