# Solved – How to measure consistency of measurement over time

Let's say I have multiple wine experts who rate a set of wines over time.
Formally, I have n experts, m wines in the set, and c measurements.
Each measurement is a matrix with n rows and m columns; the entry at position (x, y) is a value between 0 and 1 expressing what expert x thinks of wine y.

I'd like to measure how good (consistent) their ratings are.

As a simpler example, if I had only one expert and two measurements, I could calculate the simple correlation between the two measurements, and the correlation coefficient would indicate how consistent the expert is on this set of wines.
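The single-expert, two-measurement case can be sketched in a few lines of Python. This is only an illustration: the helper name `pearson` and the two rating vectors are assumptions made up for the example, not part of the question.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Illustrative ratings of five wines by one expert on two occasions.
first = [0.70, 0.60, 0.30, 0.10, 0.80]
second = [0.60, 0.70, 0.40, 0.20, 0.80]

r = pearson(first, second)
print(round(r, 3))
```

A value of `r` close to 1 would suggest the expert orders and spaces the wines similarly on both occasions.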

Two questions:

1. What would be the correct way to measure this for multiple measurements from only one expert? (n = 1).
2. What would be the correct way to measure the consistency of more experts? (n > 1).

Although this question was asked a long time ago, I believe it is clear and describes a common problem that deserves a proper answer. Also, I found a similar question which I answered a few months ago. You can use the `coefficient of variation` (cv) or the `coefficient of quartile variation` (cqv), with some considerations:

$$CV = \biggl(\frac{\sigma}{\mu}\biggr)\times 100,$$ (Albatineh et al., 2014)

$$CQV = \biggl(\frac{Q_3-Q_1}{Q_3+Q_1}\biggr)\times 100$$ (Altunkaynak and Gamgam, 2018)

where $\sigma$ is the standard deviation, $\mu$ is the mean, and $Q_1$ and $Q_3$ are the first and third quartiles.
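To make the two formulas concrete, here is a minimal Python sketch using only the standard library. It is not the `cvcqv` package: the function names `cv` and `cqv` are chosen for illustration, the sample standard deviation is assumed for $\sigma$, and the quartiles use the `"inclusive"` interpolation rule, which is one of several possible quartile definitions.

```python
import statistics

def cv(values):
    """Coefficient of variation in percent: (sample sd / mean) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

def cqv(values):
    """Coefficient of quartile variation in percent: (Q3 - Q1) / (Q3 + Q1) * 100."""
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("CQV is undefined when Q1 + Q3 is zero")
    return (q3 - q1) / (q3 + q1) * 100

# Three ratings of one wine by one expert on consecutive days.
ratings = [0.70, 0.60, 0.65]

cv_value = cv(ratings)
cqv_value = cqv(ratings)
print(f"cv = {cv_value:.2f}%, cqv = {cqv_value:.2f}%")
```

Because the quartile rule and the choice of sample versus population standard deviation are assumptions here, the numbers may differ slightly from what a dedicated package reports.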

Since `cqv` and `cv` are unitless, they are useful for comparison of variables with different units. They are also measures of homogeneity/consistency (Bonett, 2006) (Altunkaynak and Gamgam, 2018). These measures can be efficiently calculated with 95% confidence intervals (`CI`) by the recently released cvcqv R package (on CRAN). I have also provided an example `wine.csv` file including three experts and five types of wine. A small chunk of data is:

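| row | expert   | measurement | Wine_1 | Wine_2 | Wine_3 | Wine_4 | Wine_5 |
|-----|----------|-------------|--------|--------|--------|--------|--------|
| 1   | expert_a | 2019-01-01  | 0.70   | 0.60   | 0.30   | 0.10   | 0.80   |
| 2   | expert_a | 2019-01-02  | 0.60   | 0.70   | 0.40   | 0.20   | 0.80   |
| 3   | expert_a | 2019-01-03  | 0.65   | 0.65   | 0.35   | 0.15   | 0.80   |
| 44  | expert_b | 2019-01-04  | 0.90   | 0.10   | 0.90   | 0.10   | 0.90   |
| 45  | expert_b | 2019-01-05  | 0.20   | 0.12   | 0.21   | 0.31   | 0.21   |
| 46  | expert_b | 2019-01-06  | 0.80   | 0.56   | 0.79   | 0.89   | 0.69   |
| 115 | expert_c | 2019-02-04  | 0.43   | 0.24   | 0.15   | 0.68   | 0.92   |
| 116 | expert_c | 2019-02-05  | 0.42   | 0.32   | 0.16   | 0.69   | 0.91   |
| 117 | expert_c | 2019-02-06  | 0.41   | 0.31   | 0.15   | 0.70   | 0.90   |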

Because the example contains values with a non-normal distribution, `cqv` is the better indicator of variability here (i.e., the higher the `cqv`, the lower the consistency). Therefore, the consistency of each expert is explored using `cqv` with 95% confidence intervals (refer to the vignette for the CI formulas). This figure shows the results:
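For readers who want a feel for how an interval around `cqv` can be obtained, the sketch below uses a percentile bootstrap. This is a swapped-in, generic technique and not necessarily one of the CI formulas from the vignette; the function names, the fixed seed, and the ratings are all hypothetical and chosen only for illustration.

```python
import random
import statistics

def cqv(values):
    """Coefficient of quartile variation in percent."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("CQV is undefined when Q1 + Q3 is zero")
    return (q3 - q1) / (q3 + q1) * 100

def bootstrap_ci_95(values, stat, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical ratings of one wine by one expert over eight days.
ratings = [0.70, 0.60, 0.65, 0.72, 0.58, 0.66, 0.69, 0.61]

estimate = cqv(ratings)
lower, upper = bootstrap_ci_95(ratings, cqv)
print(f"cqv = {estimate:.2f}% (95% CI {lower:.2f} to {upper:.2f})")
```

With only a handful of measurements the bootstrap interval is rough, so it should be read as an indication of uncertainty rather than a precise bound.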

The `cqv` (95% CI) of the experts' measurements for various wines over time is:

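| row | expert   | wines  | cqv_est | cqv_lower | cqv_upper |
|-----|----------|--------|---------|-----------|-----------|
| 1   | expert_a | Wine_1 | 5.58    | 3.33      | 6.15      |
| 2   | expert_a | Wine_2 | 3.3     | 2.33      | 4.70      |
| 3   | expert_a | Wine_3 | 6.02    | 4.22      | 8.01      |
| 4   | expert_a | Wine_4 | 12.5    | 7.06      | 18.8      |
| 5   | expert_a | Wine_5 | 1.38    | 0.621     | 2.5       |
| 6   | expert_b | Wine_1 | 70.3    | 47.1      | 75.6      |
| 7   | expert_b | Wine_2 | 66.0    | 52.9      | 69.1      |
| 8   | expert_b | Wine_3 | 58      | 55.3      | 58.4      |
| 9   | expert_b | Wine_4 | 45.8    | 31.2      | 70.8      |
| 10  | expert_b | Wine_5 | 49.9    | 13.3      | 53.6      |
| 11  | expert_c | Wine_1 | 30.1    | 18.3      | 53.7      |
| 12  | expert_c | Wine_2 | 49.6    | 10.7      | 52.3      |
| 13  | expert_c | Wine_3 | 70.9    | 39.2      | 72.4      |
| 14  | expert_c | Wine_4 | 14.5    | 4.74      | 15.9      |
| 15  | expert_c | Wine_5 | 70.7    | 9.61      | 76.0      |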

As you can see, only expert_a shows consistent measurements for the various wines over time, because measurements with large `cqv` or `cv` values (here, higher than 10%) are generally considered unreliable; even for expert_a, Wine_4 (12.5%) is slightly above that threshold. You can also ignore the wine type and calculate `cqv` for each expert, in which case expert_a shows a significantly lower `cqv` than expert_b (i.e., their CIs do not overlap):

| row | expert   | cqv_est | cqv_lower | cqv_upper |
|-----|----------|---------|-----------|-----------|
| 1   | expert_a | 35.6    | 32.7      | 54.8      |
| 2   | expert_b | 58.4    | 56.4      | 60.6      |
| 3   | expert_c | 58.5    | 47.5      | 63.0      |
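The two groupings used above (per expert and wine, then per expert with wines pooled) can be sketched in Python as follows. This is an illustration rather than the `cvcqv` workflow: it uses only the nine rows shown in the data excerpt, so its numbers will not match the tables computed from the full `wine.csv`, and the names `ROWS`, `per_wine_cqv`, and `pooled_cqv` are assumptions.

```python
import statistics
from collections import defaultdict

WINES = ["Wine_1", "Wine_2", "Wine_3", "Wine_4", "Wine_5"]

# The nine rows from the data excerpt: (expert, ratings for Wine_1..Wine_5).
ROWS = [
    ("expert_a", [0.70, 0.60, 0.30, 0.10, 0.80]),
    ("expert_a", [0.60, 0.70, 0.40, 0.20, 0.80]),
    ("expert_a", [0.65, 0.65, 0.35, 0.15, 0.80]),
    ("expert_b", [0.90, 0.10, 0.90, 0.10, 0.90]),
    ("expert_b", [0.20, 0.12, 0.21, 0.31, 0.21]),
    ("expert_b", [0.80, 0.56, 0.79, 0.89, 0.69]),
    ("expert_c", [0.43, 0.24, 0.15, 0.68, 0.92]),
    ("expert_c", [0.42, 0.32, 0.16, 0.69, 0.91]),
    ("expert_c", [0.41, 0.31, 0.15, 0.70, 0.90]),
]

def cqv(values):
    """Coefficient of quartile variation in percent."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1) * 100

per_wine = defaultdict(list)  # (expert, wine) -> ratings over time
pooled = defaultdict(list)    # expert -> all ratings, wine type ignored
for expert, ratings in ROWS:
    pooled[expert].extend(ratings)
    for wine, rating in zip(WINES, ratings):
        per_wine[(expert, wine)].append(rating)

per_wine_cqv = {key: cqv(vals) for key, vals in per_wine.items()}
pooled_cqv = {expert: cqv(vals) for expert, vals in pooled.items()}

for (expert, wine), value in sorted(per_wine_cqv.items()):
    print(f"{expert} {wine}: {value:.2f}%")
for expert, value in sorted(pooled_cqv.items()):
    print(f"{expert} (all wines): {value:.2f}%")
```

Note that pooling across wines mixes the spread between wines with the spread over time, so the pooled values are best used for comparing experts with each other rather than as absolute consistency scores.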
