# Can we express logistic loss minimization as a maximum likelihood problem?

I have a simple question about the equivalence of loss minimization and likelihood maximization for logistic regression.

Say we are given some data \$(x_i, y_i) \text{ for } i = 1, \ldots, N\$ where \$x_i \in \mathbb{R}^d\$ and \$y_i \in \{-1, 1\}\$.

In logistic regression, we fit the coefficients of a linear classification model \$w in mathbb{R}^d\$ by minimizing the logistic loss function. This results in the optimization problem:

\$\$ \min_w \sum_{i=1}^N \log(1 + \exp(-y_i w^T x_i)) \$\$
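For concreteness, the minimization above can be carried out numerically. The following is a minimal NumPy sketch (not part of the original question) that fits \$w\$ by plain gradient descent on the logistic loss; the toy data, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

# Hypothetical toy data: N = 100 points in R^2 with labels y_i in {-1, 1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X @ np.array([1.5, -2.0]) + rng.normal(size=100) > 0, 1, -1)

def logistic_loss(w, X, y):
    # sum_i log(1 + exp(-y_i w^T x_i)), computed stably via logaddexp
    return np.sum(np.logaddexp(0.0, -y * (X @ w)))

def logistic_loss_grad(w, X, y):
    # gradient: -sum_i y_i x_i * Lambda(-y_i w^T x_i),
    # where Lambda is the logistic function
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))
    return -(X.T @ (y * s))

# Plain gradient descent; step size chosen by hand for this toy problem
w = np.zeros(2)
for _ in range(500):
    w -= 0.01 * logistic_loss_grad(w, X, y)
```

Any other unconstrained convex optimizer (Newton's method, L-BFGS) would reach the same \$w^*\$, since the objective is convex in \$w\$.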

Let \$w^*\$ be the optimal set of coefficients obtained from this problem.

I am wondering: can we recover \$w^*\$ by solving a maximum likelihood problem? If so, what is the exact maximum likelihood problem that we would need to solve? And what would be the corresponding likelihood function for \$w\$?


#### Best Answer

It is equivalent to the maximum likelihood approach. The different appearance results from the different coding for \$y_i\$ (which is arbitrary).

Keeping in mind that \$y_i in {-1,1}\$, and denoting

\$\$\Lambda(z) = [1 + \exp(-z)]^{-1}\$\$

we have that

\$\$\min_w \sum_{i=1}^N \log[1 + \exp(-y_i w^T x_i)] = \max_w \sum_{i=1}^N \log \Lambda(y_i w^T x_i)\$\$

\$\$= \max_w \Big\{\sum_{y_i=1} \log \Lambda(w^T x_i) + \sum_{y_i=-1} \log [1 - \Lambda(w^T x_i)]\Big\} \tag{1}\$\$

where the last step splits the sum by label and uses the identity \$\Lambda(-z) = 1 - \Lambda(z)\$ for the terms with \$y_i = -1\$.

In the usual maximum likelihood approach we would have used the labels \$z_i \in \{0, 1\}\$ (\$0\$ instead of \$-1\$), and we would have assumed that

\$\$P(z_i = 1 \mid x_i) = \Lambda(w^T x_i)\$\$

With an independent sample, we would have arrived at the log-likelihood maximization problem

\$\$\max_w \Big\{\sum_{i=1}^N z_i \log \Lambda(w^T x_i) + \sum_{i=1}^N (1 - z_i) \log [1 - \Lambda(w^T x_i)]\Big\} \tag{2}\$\$

Now note that

\$\$\sum_{y_i=1} \log \Lambda(w^T x_i) = \sum_{i=1}^N z_i \log \Lambda(w^T x_i)\$\$

and

\$\$\sum_{y_i=-1} \log [1 - \Lambda(w^T x_i)] = \sum_{i=1}^N (1 - z_i) \log [1 - \Lambda(w^T x_i)]\$\$

so \$(1)\$ and \$(2)\$ are equivalent.
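As a numerical sanity check (not part of the original argument), the following NumPy sketch evaluates objectives \$(1)\$ and \$(2)\$ at an arbitrary \$w\$ and confirms they agree; the data and coefficient vector are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.choice([-1, 1], size=50)   # labels coded as in the question
z = (y + 1) // 2                   # recoded labels in {0, 1}
w = rng.normal(size=3)             # any coefficient vector, not just w*

def Lam(t):
    # Lambda(z) = [1 + exp(-z)]^{-1}
    return 1.0 / (1.0 + np.exp(-t))

# Objective (1): sum_i log Lambda(y_i w^T x_i)
obj1 = np.sum(np.log(Lam(y * (X @ w))))

# Objective (2): Bernoulli log-likelihood with 0/1 labels z_i
p = Lam(X @ w)
obj2 = np.sum(z * np.log(p) + (1 - z) * np.log(1 - p))
```

Since the two objectives coincide at every \$w\$, not just at the optimum, they have the same maximizer \$w^*\$.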
