I am not really confident in interpreting the ANOVA table of a GAM model. I understand how it can be used to compare models (see for instance this question), but I am interested in interpreting it for a single model.
    library( mgcv )
    set.seed( 1 )
    RawData <- data.frame( y = rbinom( 1000, 1, 0.5 ), x1 = rnorm( 1000 ),
                           x2 = as.factor( rbinom( 1000, 1, 0.5 ) ), x3 = rnorm( 1000 ),
                           x4 = as.factor( rbinom( 1000, 1, 0.5 ) ) )
    fit <- gam( y ~ s( x1 ) + x2 + s( x3, by = x2 ) + x4, data = RawData,
                family = nb( link = log ) )
    anova( fit )

    Family: Negative Binomial(251657.167)
    Link function: log

    Formula:
    y ~ s(x1) + x2 + s(x3, by = x2) + x4

    Parametric Terms:
       df Chi.sq p-value
    x2  1  1.775   0.183
    x4  1  0.796   0.372

    Approximate significance of smooth terms:
                edf Ref.df Chi.sq p-value
    s(x1)     1.000  1.000  0.047   0.828
    s(x3):x20 1.000  1.000  0.078   0.779
    s(x3):x21 1.000  1.001  0.188   0.665
In particular, I'd be interested in the following:
- Can chi.sq values be given an "explained variance" interpretation (or similar), i.e. can they be used to measure variable importance, just like for a usual linear model?
- Can the chi.sq values of the smooth and parametric terms be handled similarly?
- What to do with interactions? (As `x3` in the example: `x3` appears on two lines, `x2` appears in those, and as a parametric term in addition.)
It's probably best to take a look at the mgcv help file ?anova.gam in R, but in answer to the specific questions:
1. The parametric chi.sq test statistics are just like their linear model equivalents, but the test statistic used for the smooths is different, and doesn't have an explained variance interpretation. I would not try to use them directly to measure variable importance. For details see http://opus.bath.ac.uk/32382/1/spv3.pdf.
2. No, as explained above.
3. I would fit the model with and without the interaction and compare (but probably by AIC).
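The with/without comparison for the interaction could be sketched roughly as follows. This is only an illustration under some assumptions that are not in the original post: the names `dat`, `fit_int`, `fit_noint` and `aic_tab` are made up here, a binomial family is used because the simulated `y` is 0/1 (the question used `nb`), and `method = "ML"` is chosen because the two models differ in their unpenalized parts.

```r
library(mgcv)

# Same simulated data as in the question (object names here are illustrative)
set.seed(1)
n <- 1000
dat <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = rnorm(n),
  x2 = as.factor(rbinom(n, 1, 0.5)),
  x3 = rnorm(n),
  x4 = as.factor(rbinom(n, 1, 0.5))
)

# Model WITH the interaction: a separate smooth of x3 for each level of x2
fit_int <- gam(y ~ s(x1) + x2 + s(x3, by = x2) + x4,
               data = dat, family = binomial, method = "ML")

# Model WITHOUT the interaction: one common smooth of x3
fit_noint <- gam(y ~ s(x1) + x2 + s(x3) + x4,
                 data = dat, family = binomial, method = "ML")

# Lower AIC indicates the preferred model
aic_tab <- AIC(fit_int, fit_noint)
print(aic_tab)
```

Since the data here are pure noise, no large difference in AIC should be expected; with real data the same two-fit pattern applies, swapping in your own formula and family.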