Solved – Why does glmnet use coordinate descent for Ridge regression

If I understand it correctly, glmnet uses cyclical coordinate descent not only for lasso and elastics nets, but also for Ridge regression.

Why does it use this algorithm, which sometimes gives slightly inaccurate results, when there is in fact an easy closed form solution available?

Thank you very much in advance!

I think this is due to speed. Cyclical coordinate descent does not find the exact solution in finite time, but it is faster, not only for a grid of $lambda$'s but also for a single $lambda$.

Consider the task of solving ridge regression for a single $lambda$, with a data matrix of size $n times p$. I believe the optimal runtime for exact ridge regression is $O(n^2p)$ if $n < p$ and $O(np^2)$ if $n > p$. See Murphy, Machine Learning, section 7.5.2 for a reference.

With the cyclical coordinate descent algorithm, "a complete cycle through all $p$ variables costs $O(pN)$ operations" (p. 6, Friedman et al. 2010, One can then specify a number of cycles $c$ with $c ll min(n, p)$ to get a faster big-Oh runtime for a single $lambda$. For solving over many $lambda$'s, the glmnet method should yield further improvement using warm starts.

Similar Posts:

Rate this post

Leave a Comment