I am planning to run an ANOVA to check for evidence of differences between the group means. As part of this, I will be reporting the CV (sd/mean) to quantify the amount of variation within each group. That started me thinking about quantifying the variation between groups: can I take the mean across all three groups and use the group standard deviation from the ANOVA to calculate a between-groups coefficient of variation?
Best Answer
When dealing with a linear model (as when conducting an ANOVA), the coefficient of variation for the model can be calculated as the root mean square error divided by the grand mean (and then multiplied by 100%).
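Written out as a formula, this might look as follows. The notation is an assumption for illustration: $y_i$ are the observed values, $\hat{y}_i$ the values predicted by the model, $\bar{y}$ the grand mean, and $n$ the number of observations. The divisor $n$ matches the R code used later in this answer, which takes the plain mean of the squared residuals.

$$\mathrm{CV} = 100\% \times \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}}{\bar{y}}$$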
A similar procedure could also be conducted on a single group of values.
But note that when observed values are both positive and negative, dividing by the mean may be of limited utility. In these cases, you might consider other measures of accuracy, like root mean square error.
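As a rough illustration of why dividing by the mean can mislead here, the sketch below uses hypothetical values (not from the question) that straddle zero. The mean ends up close to zero, so the CV becomes very large even though the spread itself is modest.

```r
# Hypothetical values with both signs (made up for illustration)
x = c(-4, -2, 0, 3, 4)

mean_x = mean(x)                      # close to zero (0.2)
rmse_x = sqrt(mean((x - mean_x)^2))   # spread around the mean, in original units

cv_x = rmse_x / mean_x * 100          # inflated because mean_x is near zero

rmse_x
cv_x
```

In a case like this, the root mean square error on its own is probably the more interpretable number.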
The following uses R code, but I think it's all easy enough to follow.
Source, with the caveat that I am the author of this function: https://rdrr.io/cran/rcompanion/man/accuracy.html
Make some toy data, construct a linear model, and conduct an ANOVA:
Treatment = rep(c("A", "B", "C"), each = 5)
Value = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
Treatment
   ### "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "C"
Value
   ### 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
model = lm(Value ~ Treatment)
anova(model)
   ### Analysis of Variance Table
   ###
   ###           Df Sum Sq Mean Sq F value    Pr(>F)
   ### Treatment  2    250   125.0      50 1.513e-06 ***
   ### Residuals 12     30     2.5
The following takes the predicted values from the model (predy) and the observed values (actual), and uses them to calculate the mean square error (mse), the root mean square error (rmse), the root mean square error divided by the grand mean (nrmse), and finally this value multiplied by 100% (cv_prcnt).
actual = Value
predy = predict(model)
mse = mean((actual - predy)^2)
rmse = sqrt(mse)
nrmse = rmse/mean(actual)
cv_prcnt = nrmse * 100
cv_prcnt
   ### 17.68
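One detail worth checking: the mse above divides the sum of squared residuals by n (15), whereas the residual mean square in the ANOVA table (2.5) divides by the residual degrees of freedom (12). The sketch below compares the two on the same toy data. The variable names are made up for illustration, and which divisor is appropriate depends on the convention you want to report.

```r
# Same toy data as in the answer
Treatment = rep(c("A", "B", "C"), each = 5)
Value = 1:15
model = lm(Value ~ Treatment)

# RMSE with divisor n, as in the calculation above
rmse_n = sqrt(mean(residuals(model)^2))

# Residual standard error with divisor df.residual (square root of the
# residual mean square from the ANOVA table)
rmse_df = sigma(model)

cv_n  = rmse_n  / mean(Value) * 100   # about 17.68
cv_df = rmse_df / mean(Value) * 100   # about 19.76

c(cv_n = cv_n, cv_df = cv_df)
```

The two versions converge as the sample size grows relative to the number of model parameters.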
Using this procedure on a single group will yield the same value as the population standard deviation divided by the mean. But note that this will be a different result than if the sample standard deviation is used.
A = c(1,2,3,4,5)
actual = A
predy = mean(A)
mse = mean((actual - predy)^2)
rmse = sqrt(mse)
nrmse_mean = rmse/mean(actual)
cv_prcnt = nrmse_mean * 100
cv_prcnt
   ### 47.14
This is the same result as dividing the population standard deviation by the mean (shown here as a proportion rather than a percentage).
population_sd = sqrt(sum((A - mean(A))^2)/(length(A)))
population_sd / mean(A)
   ### 0.4714
Software will often default to using the sample standard deviation. This will return a different result than the previous procedure.
sd(A)/mean(A)
   ### 0.5270
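The gap between the two results comes only from the divisor in the standard deviation (n versus n - 1), so the two CVs should differ by a factor of sqrt(n / (n - 1)). A minimal check on the same group, with variable names chosen for illustration:

```r
A = c(1, 2, 3, 4, 5)
n = length(A)

# CV based on the population standard deviation (divisor n)
cv_pop = sqrt(mean((A - mean(A))^2)) / mean(A)

# CV based on the sample standard deviation (divisor n - 1), as sd() computes it
cv_sample = sd(A) / mean(A)

# The sample version equals the population version times sqrt(n / (n - 1))
all.equal(cv_sample, cv_pop * sqrt(n / (n - 1)))
```

With only five observations the factor is about 1.118, which is why the difference is noticeable here. It shrinks quickly for larger groups.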
For R users, there is a function that will calculate CV for several types of models. (With the caveat that I am the author of this function.)
if(!require(rcompanion)){install.packages("rcompanion")}
library(rcompanion)
accuracy(list(model))
   ### $Fit.criteria
   ###   NRMSE.mean CV.prcnt
   ###        0.177     17.7