I'm working on a predictive cost model where the patient's age (an integer quantity measured in years) is one of the predictor variables. A strong nonlinear relationship between age and the risk of a hospital stay is evident in the data.
I'm considering a penalized regression smoothing spline for patient age. According to The Elements of Statistical Learning (Hastie et al., 2009, p. 151), the optimal knot placement for a smoothing spline is one knot at each unique value of the predictor – here, patient age.
Given that I'm retaining age as an integer, is the penalized smoothing spline equivalent to running a ridge regression or lasso with 101 distinct age indicator variables, one per age value found in the dataset (minus one for the reference level)? Overparametrization is then avoided because the coefficients on the age indicators are shrunk toward zero.
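To make the proposal concrete, here is a rough sketch of what I have in mind, on simulated data (the names `age` and `cost` and the cost-generating formula are made-up stand-ins for my actual claims data):

```r
## Sketch of the indicator-variable idea on simulated data.
library(glmnet)

set.seed(1)
n    <- 2000
age  <- sample(0:100, n, replace = TRUE)                  # 101 distinct integer ages
cost <- 500 + 30 * pmax(age - 50, 0) + rnorm(n, sd = 200) # nonlinear in age

## One indicator per distinct age, dropping the first level as the reference.
X <- model.matrix(~ factor(age))[, -1]

## alpha = 0 is ridge; alpha = 1 would be the lasso.
fit <- cv.glmnet(X, cost, alpha = 0)
coef(fit, s = "lambda.min")  # shrunken coefficient per age indicator
```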
Best Answer
Great question. I believe the answer to the question you ask – "is the penalized smoothing spline equivalent to running a ridge regression or lasso" – is yes, at least in the ridge sense. There are a number of sources that provide commentary and perspective. One place you may want to start is this PDF link. As the notes put it:
"Fitting a smoothing spline model amounts to performing a form of ridge regression in a basis for natural splines."
If you are looking for some general reading, you might enjoy the paper Penalized Regressions: The Bridge Versus the Lasso (Fu, 1998). It may not settle whether the penalized smoothing spline is exactly equivalent – it offers a more general perspective – but I find it an interesting comparison of techniques: a bridge regression model set against the LASSO and ridge regression.
Another, more tactical place to check is the documentation for R's smooth.spline() function (in the stats package). It hints at the relationship by observing: "with these definitions, where the B-spline basis representation can be stated as $f = Xc$ (i.e., $c$ is the vector of spline coefficients), the penalized log likelihood is $L = (y - f)^T W (y - f) + \lambda c^T \Sigma c$, and hence $c$ is the solution of the (ridge regression) $(X^T W X + \lambda \Sigma) c = X^T W y$."
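For completeness, here is a minimal sketch of that ridge-type solve with $W = I$, using a discrete second-difference penalty $D^T D$ as a stand-in for the true penalty matrix $\Sigma$ (this makes it an Eilers–Marx P-spline, a close cousin of smooth.spline rather than an exact reproduction):

```r
## Normal equations from the quote, (X'WX + lambda * Sigma) c = X'Wy, with
## W = I and Sigma approximated by a second-difference penalty D'D (P-spline).
library(splines)

xs     <- sort(unique(age))
knots  <- c(rep(min(xs), 3), xs, rep(max(xs), 3)) # cubic knot vector, boundary multiplicity 4
X      <- splineDesign(knots, age, ord = 4)       # B-spline basis, one row per observation
D      <- diff(diag(ncol(X)), differences = 2)    # second-difference operator
lambda <- 10                                      # fixed here; chosen by (G)CV in practice

cc   <- solve(crossprod(X) + lambda * crossprod(D), crossprod(X, cost))
fhat <- X %*% cc                                  # fitted values f = X c
```

Plotting `fhat` against `age` alongside the `smooth.spline` fit gives a quick visual comparison of the two approaches.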