I've read a lot of threads on Stack Exchange but haven't found exactly what I'm looking for. Everyone seems to have a slightly different problem.
First, let's have a look at my data:
- 120 Users
- 80 Items per User
- Between-subjects factor with 4 levels
- Within-subject factor with 4 levels
- Binary response variable
Now, usually one would perform a logistic regression. However, there are several issues with that:
- Do categorical predictors have to be dummy coded?
- How do I deal with the within-subject factor? With an lmm?
- How would you plot this data?
I think (correct me if I'm wrong) that one assumption of binary logistic regression is independence of errors, and that this assumption is violated (again, I'm not sure) by the within-subject factor. I'm therefore trying to fit a mixed-effects model to my data. Now here begins the actual problem:
First: I know that I want to model the between-subjects and within-subject factors as fixed effects.
However: which random effects should I implement?
- Random intercept for each subject, as it can be assumed that subjects differ in their a priori knowledge of the items (it's a performance test).
- Random intercept for the within-subject factor, as this is the reason for performing lmm in the first place?
- My data is actually not nested, so there is no sense in creating a random effect like "school" or "county" as in all the examples.
- Other suggestions?
Okay, my suggestion is to assume random intercepts for subjects (the first option):
lmm3 <- glmer(y ~ between * within + (1|user), data, family = binomial(link = "logit"))
But first I would have to calculate ICC1 and ICC2 to support the use of lmm.
For ICC1 I use the null model:
lmm0 <- glmer(y ~ (1|user), data, family = binomial(link = "logit"))
tau2 <- as.numeric(lme4::VarCorr(lmm0)$user)  # random-intercept variance for user
icc1 <- tau2/(tau2+pi^2/3)
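As far as I understand it, this calculation corresponds to the latent-variable formulation of the ICC for a random-intercept logistic model, where the level-1 residual is assumed to follow a standard logistic distribution with variance π²/3. This is one common definition, not the only one. Written out, with τ² denoting the random-intercept variance for user:

```latex
\mathrm{ICC}_1 = \frac{\tau^2}{\tau^2 + \pi^2/3},
\qquad \frac{\pi^2}{3} \approx 3.29
```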
Now, again, two questions arise:
- How do I calculate ICC2 for the logistic model? I know that there is a function for linear mixed models, but this is not the case here.
- However, my ICC1 is only 0.03760069, so it seems that the model above doesn't make a lot of sense. What kind of model should I try then?
Thank you very much for your input. If you need more specific information, I would be willing to prepare some data for you. I know that this is a rather theoretical issue, so I'm looking forward to a discussion.
Since you have 120 users and 80 items per user, your model would have to treat user and item as random grouping factors.
Are the 80 items the same for each of your users? If yes, the random grouping factors user and item will be fully crossed, in which case the glmer syntax would include terms like
(1|user) + (1|item)
See https://nlp.stanford.edu/manning/courses/ling289/GLMM.pdf for more ideas on how to proceed.
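To make the crossed specification concrete, here is a minimal sketch on simulated data. Everything about the data is an assumption made for illustration: the names `dat`, `u_user` and `u_item`, the random-intercept standard deviations, and the design in which each item belongs to exactly one level of the within-subject factor. Only the formula with `(1 | user) + (1 | item)` is the point.

```r
library(lme4)

set.seed(1)  # arbitrary seed so the simulated data are reproducible

n_user <- 120
n_item <- 80

# Fully crossed layout: every user answers every item (120 * 80 = 9600 rows).
dat <- expand.grid(user = seq_len(n_user), item = seq_len(n_item))

# Hypothetical design: each user is in one of 4 between-subject groups,
# and each item is in one of 4 within-subject conditions.
dat$between <- factor(paste0("b", (dat$user - 1) %% 4 + 1))
dat$within  <- factor(paste0("w", (dat$item - 1) %% 4 + 1))

# Made-up random intercepts on the logit scale (no true fixed effects).
u_user <- rnorm(n_user, sd = 0.4)
u_item <- rnorm(n_item, sd = 0.8)
eta    <- u_user[dat$user] + u_item[dat$item]
dat$y  <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

# Grouping variables should be factors before fitting.
dat$user <- factor(dat$user)
dat$item <- factor(dat$item)

# Crossed random intercepts for user and item.
fit <- glmer(
  y ~ between * within + (1 | user) + (1 | item),
  data = dat, family = binomial(link = "logit")
)

# Estimated variance components for both grouping factors.
print(VarCorr(fit))
```

With real data you would replace the simulation by your own data frame and compare the `user` and `item` variance components; a small user variance alongside a larger item variance would suggest that most of the clustering comes from the items.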