I have a simple dataset from a within-subject design. Each participant provided a verbal description of 3 stimuli. The descriptions were coded so that they consist of objects, each belonging to 1 of 3 classes. I am interested in the proportion of these objects across the classes. I want to know whether some classes occurred more often than others and whether some stimuli elicited more occurrences than others. So we have the following dataset of counts:
Participant  stimulus  class_a  class_b  class_c
1            stim1     8        0        3
1            stim2     7        9        5
1            stim3     1        2        4
2            stim1     3        4        6
...
The simplest way to analyse this data would be a repeated-measures ANOVA. However, I would have to run it 3 times, once for each 'class' as the dependent variable, and that would not tell me anything about the relations between the classes.
Another solution might be MANOVA. But I am a bit confused whether it can be used when there is no between-subject independent variable (like an experimental condition). And how does it deal with the within-subject factor? I searched for R tutorials on that, but they either miss the repeated-measures component or deal with more than 1 experimental condition.
One more idea that I had was to build a mixed-effect model. Since mixed-effect models require a single output variable though, I would need to transform the data to the following format:
Participant  stimulus  class  response
1            stim1     a      8
1            stim1     b      0
1            stim1     c      3
1            stim2     a      7
...
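For reference, the reshaping described above could be sketched in base R as follows. The data frame name `wide` and the four example rows are only illustrative (they are the rows shown earlier); `reshape()` is just one of several ways to do this.

```r
# Illustrative wide-format data: the four rows shown in the question
wide <- data.frame(
  Participant = c(1, 1, 1, 2),
  stimulus    = c("stim1", "stim2", "stim3", "stim1"),
  class_a     = c(8, 7, 1, 3),
  class_b     = c(0, 9, 2, 4),
  class_c     = c(3, 5, 4, 6)
)

# Stack the three class columns into a single 'response' column,
# with 'class' recording which column each count came from
long <- reshape(
  wide,
  direction = "long",
  varying   = c("class_a", "class_b", "class_c"),
  v.names   = "response",
  timevar   = "class",
  times     = c("a", "b", "c"),
  idvar     = c("Participant", "stimulus")
)

# Sort so that rows are grouped by participant and stimulus
long <- long[order(long$Participant, long$stimulus, long$class), ]
rownames(long) <- NULL
long
```

Each participant-by-stimulus row becomes three rows, one per class.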
My concern is that any model I construct from such a dataset will treat both 'stimulus' and 'class' as independent variables (predictors). In other words, they will both sit on the right-hand side of the formula: response ~ stimulus*class + (1|Participant).
At the same time, mixed models seem to offer the highest power, require fewer assumptions, and provide the most flexibility. Can they be used for this multivariate analysis? Or can multiple models be constructed (one for each class) and somehow related to each other to show the differences across classes?
Your choice of analysis depends on what you want to focus on. It seems like you have a count dependent variable. In general, counts are better addressed with log-linear models (such as Poisson regression). However, if you want to analyse the data as if they were normally distributed, then:
1) If your focus is the comparison of stimuli and you have doubts about treating the class counts as one variable, a MANOVA approach is probably superior.
A within-subject MANOVA would be adequate to test whether the stimuli result in a statistically significant difference in class counts. Given your first data frame, the R syntax would be:
manova(cbind(class_a, class_b, class_c) ~ stimulus + Error(Participant))
This can be your omnibus test for differences in stimuli.
If you want to compare specific class counts across stimuli, then you have to do post-hoc tests (treat the count for a single class as the dependent variable and do, for example, paired t-tests with the Bonferroni correction).
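A minimal sketch of such post-hoc tests for one class, using `pairwise.t.test()` from base R. The data here are simulated purely for illustration (the data frame `d`, the sample size, and the Poisson means are all assumptions), and the rows must be in the same participant order within each stimulus for the pairing to be correct.

```r
set.seed(1)
n <- 12  # hypothetical number of participants

# Simulated counts for one class; participants appear in the same
# order (1..n) within every stimulus, which the paired test relies on
d <- data.frame(
  Participant = rep(seq_len(n), times = 3),
  stimulus    = factor(rep(c("stim1", "stim2", "stim3"), each = n)),
  class_a     = rpois(3 * n, lambda = rep(c(3, 5, 7), each = n))
)

# All pairwise paired t-tests between stimuli, Bonferroni-adjusted
res <- pairwise.t.test(
  d$class_a, d$stimulus,
  p.adjust.method = "bonferroni",
  paired = TRUE
)
res$p.value  # lower-triangular matrix of adjusted p-values
```

This would be repeated for `class_b` and `class_c`; note that the Bonferroni adjustment above only covers the comparisons within one class, so a stricter correction across all classes may be appropriate.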
2) If your focus is on comparing counts across classes and stimuli, then use mixed modelling. One thing that worries me here is the error structure, because the class counts within a description may depend on one another.
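Combining this with the log-linear suggestion at the top, one possible sketch is a Poisson mixed model fitted with `glmer()` from the lme4 package (assumed to be installed). The data below are simulated only to make the example self-contained, and the random-intercept structure is taken from the formula in the question; it does not address the interdependence of class counts mentioned above.

```r
library(lme4)  # assumption: the lme4 package is available

set.seed(42)
n <- 20  # hypothetical number of participants

# Long-format design: every participant x stimulus x class combination
long <- expand.grid(
  Participant = factor(seq_len(n)),
  stimulus    = factor(c("stim1", "stim2", "stim3")),
  class       = factor(c("a", "b", "c"))
)

# Simulated counts with a participant-level random intercept (illustrative only)
subj <- rnorm(n, sd = 0.3)
long$response <- rpois(
  nrow(long),
  lambda = exp(1 + subj[as.integer(long$Participant)])
)

# Poisson GLMM: fixed effects for stimulus, class and their interaction,
# plus a random intercept per participant
fit <- glmer(
  response ~ stimulus * class + (1 | Participant),
  data   = long,
  family = poisson
)
summary(fit)
```

The `stimulus:class` interaction terms are the part that speaks to whether the distribution over classes differs between stimuli.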