Bayesian interpretation of linear regression with simultaneous L1 and L2 regularization (aka elastic net)

It's well known that linear regression with an $l^2$ penalty is equivalent to finding the MAP estimate given a Gaussian prior on the coefficients. Similarly, using an $l^1$ penalty is equivalent to using a Laplace distribution as the prior.
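To make the equivalence concrete, here is a short derivation under assumed notation that is not in the original post: a Gaussian likelihood $y \mid \beta \sim N(X\beta, \sigma^2 I)$ with known $\sigma^2$, a prior scale $\tau$ for the Gaussian case, and a Laplace scale $b$ for the $l^1$ case.

$$
\begin{aligned}
\text{Gaussian prior } \beta_j \sim N(0, \tau^2):\quad
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2 + \frac{1}{2\tau^2}\lVert \beta\rVert_2^2 + \text{const},\\
\hat\beta_{\text{MAP}} &= \arg\min_\beta \; \lVert y - X\beta\rVert_2^2 + \lambda \lVert \beta\rVert_2^2, \qquad \lambda = \frac{\sigma^2}{\tau^2};\\[6pt]
\text{Laplace prior } p(\beta_j) = \tfrac{1}{2b} e^{-\lvert\beta_j\rvert / b}:\quad
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2 + \frac{1}{b}\lVert \beta\rVert_1 + \text{const},\\
\hat\beta_{\text{MAP}} &= \arg\min_\beta \; \lVert y - X\beta\rVert_2^2 + \lambda \lVert \beta\rVert_1, \qquad \lambda = \frac{2\sigma^2}{b}.
\end{aligned}
$$

In both cases the penalty is just the negative log prior, rescaled by $2\sigma^2$.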

It's not uncommon to use some weighted combination of $l^1$ and $l^2$ regularization. Can we say that this is equivalent to some prior distribution over the coefficients (intuitively, it seems that it must be)? Can we give this distribution a nice analytic form (maybe a mixture of Gaussian and Laplacian)? If not, why not?

Ben's comment is likely sufficient, but I provide some more references, one of which predates the paper Ben referenced.

A Bayesian elastic net representation was proposed by Kyung et al. in their Section 3.1. Although the prior for the regression coefficient $\beta$ was correct, the authors wrote down the mixture representation incorrectly.
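For intuition about what such a prior looks like: a penalty $\lambda_1 \lVert\beta\rVert_1 + \lambda_2 \lVert\beta\rVert_2^2$ corresponds to a prior density proportional to $\exp(-\lambda_1 \lVert\beta\rVert_1 - \lambda_2 \lVert\beta\rVert_2^2)$, which is a product of a Laplace kernel and a Gaussian kernel rather than a mixture of the two densities. The sketch below is only a sanity check, not the model from either paper (which also conditions on the noise variance). It assumes a single coefficient, one observation `z` with unit noise variance, and hypothetical function names; it compares the grid-search MAP estimate with the familiar closed-form elastic net solution (soft-thresholding followed by shrinkage).

```python
import math


def neg_log_posterior(b, z, lam1, lam2):
    """Negative log posterior (up to a constant) for one coefficient b.

    Assumes z | b ~ N(b, 1) and a prior proportional to
    exp(-lam1 * |b| - lam2 * b**2).
    """
    return 0.5 * (z - b) ** 2 + lam1 * abs(b) + lam2 * b ** 2


def elastic_net_closed_form(z, lam1, lam2):
    """Minimizer of neg_log_posterior: soft-threshold z, then shrink."""
    soft = math.copysign(max(abs(z) - lam1, 0.0), z)
    return soft / (1.0 + 2.0 * lam2)


def map_by_grid(z, lam1, lam2, lo=-5.0, hi=5.0, steps=200001):
    """Brute-force MAP estimate on an evenly spaced grid over [lo, hi]."""
    best_b, best_val = lo, float("inf")
    for i in range(steps):
        b = lo + (hi - lo) * i / (steps - 1)
        val = neg_log_posterior(b, z, lam1, lam2)
        if val < best_val:
            best_b, best_val = b, val
    return best_b


if __name__ == "__main__":
    for z in (2.0, 0.3, -2.0):
        print(z, elastic_net_closed_form(z, 0.5, 0.25), map_by_grid(z, 0.5, 0.25))
```

The two estimates agree up to the grid resolution, including the case where the $l^1$ part sets the coefficient exactly to zero.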

A corrected Bayesian model for the elastic net was recently proposed by Roy and Chakraborty (their Equation 6). The authors also go on to present an appropriate Gibbs sampler to sample from the posterior distribution, and show that the Gibbs sampler converges to the stationary distribution at a geometric rate. For this reason, these references might turn out to be useful, in addition to the Hans paper.
