I have a simple question about the equivalence of loss minimization and likelihood maximization for logistic regression.
Say are given some data $(x_i,y_i) text{ for } ~i = 1,ldots,N$ where $x_i in mathbb{R}^d$ and $y_i in {-1,1}$.
In logistic regression, we fit the coefficients of a linear classification model $w in mathbb{R}^d$ by minimizing the logistic loss function. This results in the optimization problem:
$$ min_w sum_{i=1}^N log(1+exp(-y_iw^Tx_i)) $$
Let $w^*$ be the optimal set of coefficients obtained from this problem.
I am wondering: can we recover $w^*$ by solving a maximum likelihood problem? If so, what is the exact maximum likelihood problem that we would need to solve? And what would be the corresponding likelihood function for $w$?
Best Answer
It is equivalent to the maximum likelihood approach. The different appearance results from the different coding for $y_i$ (which is arbitrary).
Keeping in mind that $y_i in {-1,1}$, and denoting
$$Lambda(z) = [1+exp(-z)]^{-1}$$
we have that
$$min_w sum_{i=1}^N log[1+exp(-y_iw^Tx_i)] = max_w sum_{i=1}^N log Lambda(y_iw^Tx_i)$$
$$=max_w Big{sum_{y_i=1} log Lambda(w^Tx_i) + sum_{y_i=-1} log [1-Lambda(w^Tx_i)]Big} tag{1}$$
In the usual maximum likelihood approach we would have use the label, $z_i in {0,1}$, ($0$ instead of $-1$), and we would have assumed that
$$P(z_i =1 mid x_i) = Lambda(w^Tx_i)$$
With an independent sample, we would have arrived at the log-likelihood
$$=max_w Big{sum_{i=1}^N z_ilog Lambda(w^Tx_i) + sum_{i=1}^N (1-z_i)log [1-Lambda(w^Tx_i)]Big} tag{2}$$
Now note that
$$sum_{y_i=1} log Lambda(w^Tx_i) = sum_{i=1}^N z_ilog Lambda(w^Tx_i)$$
and
$$sum_{y_i=-1} log [1-Lambda(w^Tx_i)] = sum_{i=1}^N(1-z_i) log [1-Lambda(w^Tx_i)]$$
so $(1)$ and $(2)$ are equivalent.
Similar Posts:
- Solved – Can we express logistic loss minimization as a maximum likelihood problem
- Solved – Can we express logistic loss minimization as a maximum likelihood problem
- Solved – Loss function for generalized linear models
- Solved – LASSO for general likelihood
- Solved – Exponential distribution: Log-Likelihood and Maximum Likelihood estimator