After doing some research, my impression is that the hard part is this: L2-regularized problems are often solved by gradient descent, while L1-regularized problems are often solved by coordinate descent.

But which algorithm should I use when L1 and L2 come up in the same loss function?

My specific problem is to factorize a known matrix $R$ into three components:

$ R = P^T A Q $

in which L1 regularization is applied to $A$ for sparsity, while L2 regularization is applied to $P$ and $Q$ to prevent overfitting.
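For concreteness, one common way to write such an objective (the squared-error fit and the weights $\lambda_1, \lambda_2$ are my assumptions, not given in the question) is:

$ \min_{P, A, Q}\ \tfrac{1}{2}\|R - P^T A Q\|_F^2 + \lambda_1 \|A\|_1 + \lambda_2 \left( \|P\|_F^2 + \|Q\|_F^2 \right) $

where $\|A\|_1 = \sum_{ij} |A_{ij}|$ promotes sparsity in $A$ and the Frobenius-norm penalties shrink $P$ and $Q$.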

Listing out some algorithms is enough. More detailed explanations are also appreciated.


#### Best Answer

Proximal gradient methods are natural here; the prox operator for $L_1$ is soft-thresholding, which moves each entry a constant amount toward zero (truncating at zero), while the prox for squared $L_2$ multiplies each entry by a constant shrinkage factor.
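A minimal NumPy sketch of one proximal gradient step, assuming a squared-error fit $\tfrac{1}{2}\|P^T A Q - R\|_F^2$ plus the two penalties; the weights `lam1`, `lam2` and step size `lr` are hypothetical names, not from the question:

```python
import numpy as np

def soft_threshold(X, t):
    # Prox of t * ||X||_1: pull each entry a constant t toward zero
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def l2_prox(X, t):
    # Prox of t * ||X||_F^2: shrink every entry by the factor 1 / (1 + 2t)
    return X / (1.0 + 2.0 * t)

def prox_grad_step(R, P, A, Q, lam1, lam2, lr):
    # Gradient step on the smooth part f = 0.5 * ||P^T A Q - R||_F^2,
    # then apply the prox of each block's own regularizer.
    E = P.T @ A @ Q - R                                  # residual
    A_new = soft_threshold(A - lr * (P @ E @ Q.T), lr * lam1)
    P_new = l2_prox(P - lr * (A @ Q @ E.T), lr * lam2)
    Q_new = l2_prox(Q - lr * (A.T @ P @ E), lr * lam2)
    return P_new, A_new, Q_new
```

With a small enough step size, repeating this step decreases the full objective; the soft-threshold drives small entries of $A$ exactly to zero, which is where the sparsity comes from.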

Coordinate descent methods will still work fine too. In this case, you'd solve coordinate-wise for the entries of $A$ as if you were doing LASSO, and for the entries of $P$ and $Q$ as if you were doing ridge regression.
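For the LASSO-like part, each entry $A_{ij}$ has a closed-form coordinate update, since $P^T A Q = \sum_{ij} A_{ij} u_i v_j^T$ with $u_i$ the $i$-th column of $P^T$ and $v_j$ the $j$-th row of $Q$. A hedged sketch (the function name and `lam1` are mine), again assuming the squared-error fit:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cd_update_A_entry(R, P, A, Q, i, j, lam1):
    # Exact minimizer over the single coordinate A[i, j], all else fixed.
    u = P[i, :]                            # i-th column of P^T
    v = Q[j, :]                            # j-th row of Q
    B = np.outer(u, v)                     # rank-one contribution of A[i, j]
    Rres = R - P.T @ A @ Q + A[i, j] * B   # residual with this coordinate removed
    rho = np.sum(Rres * B)
    A[i, j] = soft_threshold(rho, lam1) / np.sum(B * B)
    return A
```

Sweeping this update over all $(i, j)$, and alternating with ridge-style least-squares updates for $P$ and $Q$, gives a full coordinate/block descent scheme.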

### Similar Posts:

- Solved – What optimization method does LIBLINEAR use for training L1 regularized logistic regression
- Solved – Why linear and logistic regression coefficients cannot be estimated using same method
- Solved – Computational complexity of the lasso (lars vs coordinate descent)
- Solved – Convex optimization: Is gradient descent faster if a regularizer is added