I have a quantitative variable (Dim: which is actually factor scores) and 2 categorical variables (Category + Region) in my dataset. I am interested in knowing if the variance explained by each categorical variable (and their interaction) is significant. Additionally I also want to know how much overall variance is explained by the model (i.e. the R squared value). After consulting materials about this topic I came to know that following 2 commands in R can fulfill my requirements.
my_lm <- lm(Dim4 ~ Category*Region, data=df) summary(my_lm) Anova(my_lm, type = "III")
I'm also interested in reporting whether group comparisons within each category are significantly different from each other. For this purpose, TukeyHSD has been mentioned in the reference material.
TukeyHSD(aov(my_lm), ordered=TRUE)
This command outputs the whole range of group comparisons for example:
$Category diff lwr upr p adj Tweets-Conversations 0.6843803 -0.279052730 1.647813 0.2961601 Comments-Conversations 1.6552203 0.665253423 2.645187 0.0000540 $Region diff lwr upr p adj US-PK 1.146865 0.7036846 1.590046 5e-07 $`Category:Region` diff lwr upr p adj Conversations:US-Tweets:PK 0.04138269 -1.46974284 1.552508 1.0000000 Conversations:PK-Tweets:PK 0.30818926 -1.31628692 1.932665 0.9998612 Comments:PK-Tweets:PK 1.20670619 -0.35032424 2.763737 0.2931398 Tweets:US-Tweets:PK 1.78097509 0.24027735 3.321673 0.0097448
While I am interested in all comparisons of Category and Region individually, in the interaction subgroups I am only interested in certain comparisons, e.g. Tweets:US-Tweets-PK, Conversations:US-Conversations-PK.
I have 2 questions regarding reporting the results of these tests. Firstly,How should I report the results of linear regression model (e.g. in case of Anova there would be F-statistic, p value and degrees of freedom like this One-way ANOVA: F(9, 3085) = 70.23, p < .001;). But I am looking at my data in terms of linear regression and want to report r squared + significance of each variable and their interaction as well. Does that involve reporting degrees of freedom and F statistic?
Secondly, as mentioned above I am only interested in certain group comparisons when it comes to interaction effects, can I just report the selected group comparisons? Another relevant question is if I'm not interested in all comparisons, can I limit the output of post hoc test to my desired group comparisons only?
Best Answer
You could take a look at the emmeans (estimated marginal means) package in R. It provides a lot of flexibility for these kinds of purposes. Since it appears you do have an interaction of consequence, I recommend against doing main-effects comparisons of either factor, averaged over the other. Instead, get a good display of how the interaction plays out, e.g. an interaction plot:
require("emmeans") emmip(my_lm, Category ~ Region)
and display the EMMs and comparisons of one factor's levels separately by the other:
emm = emmeans(my_lm, ~ Category * Region) emm pairs(emm, by = "Category") pairs(emm, by = "Region")
If you do not want all pairwise comparisons even with a by
variable, there are several other built-in contrast types, or you can specify your own. For example, to compare consecutive levels of Category
for each Region
, do
contrast(emm, "consec", by = "Region")
There are a lot of examples and discussion in the several vignettes in the package. Start with vignette("basics")
Similar Posts:
- Solved – Reporting linear regression with Post-hoc comparisons
- Solved – Reporting linear regression with Post-hoc comparisons
- Solved – Should anova(model, test=”Chisq”) not produce a test statistic as well as a P value
- Solved – Interpretation of interaction term in R lm(): .L .Q
- Solved – Difference between Within-Subject Contrasts and Pairwise Comparisons in a Repeated Measures ANOVA