After doing some research, my impression is that the hard part is this: L2-regularized problems are often solved by gradient descent, while L1-regularized problems are often solved by coordinate descent.

But which algorithm should I use when L1 and L2 come up in the same loss function?

My specific problem is to factorize a known matrix $R$ into three components:

$ R = P^T A Q $

in which L1 regularization is applied to $A$ for sparsity, while L2 regularization is applied to $P$ and $Q$ to prevent overfitting.
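For concreteness, one common way to write such an objective (the squared-error fit and the weights $\lambda_1, \lambda_2$ are my assumptions, not given in the question) is:

$ \min_{P, A, Q}\ \tfrac{1}{2}\|R - P^T A Q\|_F^2 + \lambda_1 \|A\|_1 + \lambda_2 \left( \|P\|_F^2 + \|Q\|_F^2 \right) $

where $\|A\|_1 = \sum_{ij} |A_{ij}|$ promotes sparsity in $A$ and the Frobenius-norm penalties shrink $P$ and $Q$.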

Listing out some algorithms is enough. More detailed explanations are also appreciated.


#### Best Answer

Proximal gradient methods are natural here; the prox operator for $L_1$ is soft-thresholding, which moves each entry a constant amount toward zero (truncating at zero), while the prox for squared $L_2$ multiplies each entry by a constant shrinkage factor.
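A minimal NumPy sketch of one proximal gradient step, assuming a squared-error fit $\tfrac{1}{2}\|P^T A Q - R\|_F^2$ plus the two penalties; the weights `lam1`, `lam2` and step size `lr` are hypothetical names, not from the question:

```python
import numpy as np

def soft_threshold(X, t):
    # Prox of t * ||X||_1: pull each entry a constant t toward zero
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def l2_prox(X, t):
    # Prox of t * ||X||_F^2: shrink every entry by the factor 1 / (1 + 2t)
    return X / (1.0 + 2.0 * t)

def prox_grad_step(R, P, A, Q, lam1, lam2, lr):
    # Gradient step on the smooth part f = 0.5 * ||P^T A Q - R||_F^2,
    # then apply the prox of each block's own regularizer.
    E = P.T @ A @ Q - R                                  # residual
    A_new = soft_threshold(A - lr * (P @ E @ Q.T), lr * lam1)
    P_new = l2_prox(P - lr * (A @ Q @ E.T), lr * lam2)
    Q_new = l2_prox(Q - lr * (A.T @ P @ E), lr * lam2)
    return P_new, A_new, Q_new
```

With a small enough step size, repeating this step decreases the full objective; the soft-threshold drives small entries of $A$ exactly to zero, which is where the sparsity comes from.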

Coordinate descent methods will still work fine too. In this case, you'd solve coordinate-wise for the entries of $A$ as if you were doing LASSO, and for the entries of $P$ and $Q$ as if you were doing ridge regression.
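For the LASSO-like part, each entry $A_{ij}$ has a closed-form coordinate update, since $P^T A Q = \sum_{ij} A_{ij} u_i v_j^T$ with $u_i$ the $i$-th column of $P^T$ and $v_j$ the $j$-th row of $Q$. A hedged sketch (the function name and `lam1` are mine), again assuming the squared-error fit:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cd_update_A_entry(R, P, A, Q, i, j, lam1):
    # Exact minimizer over the single coordinate A[i, j], all else fixed.
    u = P[i, :]                            # i-th column of P^T
    v = Q[j, :]                            # j-th row of Q
    B = np.outer(u, v)                     # rank-one contribution of A[i, j]
    Rres = R - P.T @ A @ Q + A[i, j] * B   # residual with this coordinate removed
    rho = np.sum(Rres * B)
    A[i, j] = soft_threshold(rho, lam1) / np.sum(B * B)
    return A
```

Sweeping this update over all $(i, j)$, and alternating with ridge-style least-squares updates for $P$ and $Q$, gives a full coordinate/block descent scheme.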

### Similar Posts:

- Solved – What optimization method does LIBLINEAR use for training L1 regularized logistic regression
- Solved – Why linear and logistic regression coefficients cannot be estimated using same method
- Solved – Computational complexity of the lasso (lars vs coordinate descent)
- Solved – Convex optimization: Is gradient descent faster if a regularizer is added