Solved – GAM model selection

I am fitting GAMs (in R using mgcv) to count data and am uncertain about how to select among competing models. In my model specification I am using "ts" as my smoothing basis and a negative binomial distribution. As I understand it, the "ts" basis adds an extra penalty to the smooth and can thereby shrink some terms out of the model completely during fitting.

The modeling strategy I employed is as follows.

I first fit a global model (i.e. with all my independent variables of interest), e.g. N~s(x1)+s(x2)+s(x3)+s(x4). When I found that the term x3 had an EDF practically equal to 0 and that x4 was not significant at the 0.05 level (but with EDF>1), I fitted a second model without these two terms, i.e. N~s(x1)+s(x2).

What does it mean when I compare the two models and find that the nested model (now with all terms significant and non-zero EDFs) has a lower deviance explained? Also, is this a defensible approach to identifying important predictor variables?

Any advice will be greatly appreciated.

If you are using an extra penalty on each term, you can just fit the model and you are done (from the point of view of selection). The point of these penalties is to allow shrinkage of the perfectly smooth functions in the spline basis expansion as well as the wiggly functions, so that a whole term can be shrunk to zero. The results of the model fit account for this selection/shrinkage. If you remove the insignificant terms and then refit, the inference results (say in summary() output) will not account for the "effect" of the previous selection.

Assuming you have a well-chosen set of covariates and can fit the full model (a model with a smooth of each covariate plus any interactions you want) you should probably just work with the resulting fit of the shrunken full model.
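As a rough illustration of fitting the shrunken full model, here is a minimal sketch in R. The data are simulated purely for the example (x1 and x2 drive the response, x3 and x4 do not), and the variable names, sample size and choice of method = "REML" are my assumptions rather than anything taken from the question.

```r
library(mgcv)

set.seed(1)
n <- 400

# Hypothetical simulated data: only x1 and x2 affect the mean count
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
eta <- 1 + sin(2 * pi * dat$x1) + 2 * dat$x2^2
dat$N <- rnbinom(n, mu = exp(eta), size = 2)

# Full model: shrinkage thin plate splines (bs = "ts") for every covariate,
# negative binomial family with theta estimated as part of the fit
m_full <- gam(N ~ s(x1, bs = "ts") + s(x2, bs = "ts") +
                  s(x3, bs = "ts") + s(x4, bs = "ts"),
              family = nb(), data = dat, method = "REML")

# Per-term table: EDF, reference df, test statistic, approximate p-value
s_tab <- summary(m_full)$s.table
print(s_tab)

# EDF per smooth; values close to 0 indicate a term that was shrunk out
edf <- s_tab[, "edf"]
```

Terms whose EDF is close to zero contribute essentially nothing to the fitted values, so there is no need to drop them by hand and refit.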

If a term is using effectively 0 degrees of freedom it is having no effect on the fit/predictions at all. For the non-significant terms that have positive EDFs, by keeping them in you are effectively stating that these covariates have a small but non-zero effect. If you remove these terms as you suggest, you are saying explicitly that the effect is zero.

In short, don't fit the reduced model; work with the full model to which shrinkage was applied.

The deviance explained of the reduced model can be lower as it has fewer terms with which to explain variation in the response. It's a bit like the $R^2$ of a model increasing as you add covariates.
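To make the analogy concrete: as far as I know, the deviance explained that mgcv reports is the proportion of the null deviance accounted for by the model,

$$
\text{deviance explained} = \frac{D_{\text{null}} - D_{\text{model}}}{D_{\text{null}}},
$$

where $D_{\text{null}}$ is the deviance of the intercept-only model and $D_{\text{model}}$ is the deviance of the fitted model (both symbols are my notation). Dropping terms generally leaves $D_{\text{model}}$ the same or larger, so the ratio tends to fall. With a negative binomial family the comparison is only approximate, because the estimated dispersion parameter can differ between the two fits.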
