Optimization with both L1 and L2 regularization

After doing some research, I gather that the hard part is this: an L2-regularized problem is typically solved by gradient descent, while an L1-regularized problem is typically solved by coordinate descent, since the L1 penalty is non-differentiable at zero and plain gradient descent does not apply directly.

But which algorithm should I use when L1 and L2 come up in the same loss function?

My specific problem is to factorize a known matrix $R$ into three components:

$ R = P^TAQ $

where L1 regularization is applied to $A$ to induce sparsity, and L2 regularization is applied to $P$ and $Q$ to prevent overfitting.
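
Concretely, the objective is of the form (with $\lambda_1, \lambda_2 > 0$ as regularization weights, $\|\cdot\|_1$ the entrywise L1 norm, and $\|\cdot\|_F$ the Frobenius norm):

$$ \min_{P, A, Q} \; \tfrac{1}{2} \| R - P^T A Q \|_F^2 + \lambda_1 \| A \|_1 + \tfrac{\lambda_2}{2} \left( \| P \|_F^2 + \| Q \|_F^2 \right) $$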

Listing some suitable algorithms is enough; more detailed explanations are also appreciated.

Proximal gradient methods are natural here: the proximal operator of the $L_1$ penalty is soft-thresholding, which moves each entry a constant amount towards zero (clipping at zero), while the proximal operator of the squared $L_2$ penalty shrinks each entry by a constant multiplicative factor.
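
Here is a minimal NumPy sketch of one proximal-gradient step on the objective above (the function names, the step size `eta`, and the simultaneous block update are my own choices; in practice you would typically alternate over blocks with per-block step sizes, PALM-style, since the factorization is nonconvex):

```python
import numpy as np

def soft_threshold(X, t):
    """Prox of t * ||.||_1: move each entry t towards zero, clipping at zero."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def prox_grad_step(R, P, A, Q, lam1, lam2, eta):
    """One proximal-gradient step on
    0.5*||R - P^T A Q||_F^2 + lam1*||A||_1 + (lam2/2)*(||P||_F^2 + ||Q||_F^2)."""
    E = P.T @ A @ Q - R            # residual of the smooth part
    # Gradients of 0.5*||E||_F^2 with respect to each factor
    gP = A @ Q @ E.T
    gA = P @ E @ Q.T
    gQ = A.T @ P @ E
    # Gradient step, then the prox of each block's penalty
    A = soft_threshold(A - eta * gA, eta * lam1)   # L1 prox: soft-thresholding
    P = (P - eta * gP) / (1.0 + eta * lam2)        # L2 prox: multiplicative shrinkage
    Q = (Q - eta * gQ) / (1.0 + eta * lam2)
    return P, A, Q

# Illustrative usage (shapes and hyperparameters are arbitrary)
rng = np.random.default_rng(0)
R = rng.standard_normal((30, 40))
P = rng.standard_normal((5, 30))
A = rng.standard_normal((5, 6))
Q = rng.standard_normal((6, 40))
for _ in range(200):
    P, A, Q = prox_grad_step(R, P, A, Q, lam1=0.1, lam2=0.1, eta=1e-3)
```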

Coordinate descent methods will still work fine too. In this case, you would solve coordinate-wise for the entries of $A$ exactly as in the LASSO (a soft-thresholded univariate update) and for the entries of $P$ and $Q$ as in ridge regression (a closed-form univariate update); a sketch of both updates follows.
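
A sketch of the two coordinate updates, under the same objective as above (the helper names and the entry-at-a-time scheme are my own; a real solver would cache the residual rather than recompute it, and the update for $Q$ mirrors the one for $P$):

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding: the closed-form 1-D lasso solution."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def update_A_entry(R, P, A, Q, i, j, lam1):
    """Lasso-style coordinate update of A[i, j], all other entries fixed."""
    # P^T A Q depends on A[i, j] through the rank-one matrix outer(P[i], Q[j]).
    X = np.outer(P[i, :], Q[j, :])
    r = R - P.T @ A @ Q + A[i, j] * X   # residual with A[i, j]'s contribution removed
    A[i, j] = soft_threshold(np.sum(X * r), lam1) / np.sum(X * X)

def update_P_entry(R, P, A, Q, k, m, lam2):
    """Ridge-style coordinate update of P[k, m], all other entries fixed."""
    # Only row m of P^T A Q depends on P[k, m], through row k of A @ Q.
    x = (A @ Q)[k, :]
    r = R[m, :] - (P.T @ A @ Q)[m, :] + P[k, m] * x
    P[k, m] = (x @ r) / (x @ x + lam2)
```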
