# MLE for Linear Regression, Student-t distributed error

Say I have $n$ pairs $(X_i, Y_i)$ and the relationship $$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i,$$ where the error terms $\epsilon_i$ are iid from a Student t-distribution with constant (known) degrees of freedom, say $k \in \mathbb{N}$ degrees of freedom, where $k > 2$. If I wanted to compute the MLE for $\beta_0, \beta_1$, how do I relate the given distribution to the parameters of interest?

Sorry if this is trivial, but I searched around and haven't really seen an example like this.


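The MLE is obtained by maximising the log-likelihood function, so the first thing you will want to do is have a look at this function. The parameters enter through the errors: since $\epsilon_i = y_i - \beta_0 - \beta_1 x_i$, each observation contributes the Student t-density with known degrees-of-freedom $k \in \mathbb{N}$, $f_k(\epsilon) \propto (1 + \epsilon^2/k)^{-(k+1)/2}$, evaluated at its residual. Taking logs and dropping the additive constant and the positive factor $(k+1)/2$, neither of which changes the maximiser, you can write the log-likelihood as: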

$$\ell_{\mathbf{x},\mathbf{y}}(\beta_0, \beta_1) = -\sum_{i=1}^n \ln \Big( k + (y_i - \beta_0 - \beta_1 x_i)^2 \Big).$$
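As a check on the step from the t-density to this expression, here is a minimal Python sketch. It assumes standard (unit-scale) t errors as in the question; the function names and data are illustrative only. `full_loglik` uses the exact Student-t log-density via `math.lgamma`, and `reduced_loglik` is the expression above. After rescaling by $(k+1)/2$ their difference does not depend on $\beta_0, \beta_1$, so both have the same maximiser.

```python
import math

def reduced_loglik(b0, b1, x, y, k):
    """The expression above: -sum ln(k + r_i^2), with r_i = y_i - b0 - b1 * x_i."""
    return -sum(math.log(k + (yi - b0 - b1 * xi) ** 2) for xi, yi in zip(x, y))

def full_loglik(b0, b1, x, y, k):
    """Exact log-likelihood when the errors are standard Student-t with k d.o.f."""
    # Log of the normalising constant Gamma((k+1)/2) / (sqrt(k*pi) * Gamma(k/2)).
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    total = 0.0
    for xi, yi in zip(x, y):
        r = yi - b0 - b1 * xi
        total += log_c - (k + 1) / 2 * math.log1p(r * r / k)
    return total

# full_loglik = constant + (k + 1) / 2 * reduced_loglik, so the gap printed below
# is the same for every (b0, b1). The data are made up for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.2, 7.1, 30.0]
k = 3
for b0, b1 in [(0.0, 0.0), (1.0, 2.0), (0.5, -1.0)]:
    print(full_loglik(b0, b1, x, y, k) - (k + 1) / 2 * reduced_loglik(b0, b1, x, y, k))
```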

The residuals $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ under the MLE minimise $\sum_{i=1}^n \ln(k + r_i^2)$. As $k \rightarrow \infty$ we have $\ln(k + r_i^2) = \ln(k) + \ln(1 + r_i^2/k) \approx \ln(k) + r_i^2/k$, so in the limit the residuals minimise $\sum_{i=1}^n r_i^2$, which gives the standard OLS solution (the MLE for normally distributed errors). Use of the t-distribution effectively dampens the effect of large residuals through the logarithmic transformation above, so the MLE is more tolerant of a few large residuals than in the normal case.
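Both claims can be made concrete with a few lines of Python (a sketch; the function names are illustrative, not from the answer). Rescaling the per-observation term, $k\,(\ln(k + r^2) - \ln k) = k \ln(1 + r^2/k)$ increases towards $r^2$ as $k$ grows. For the dampening, the derivative of $\ln(k + r^2)$ in $r$ is $2r/(k + r^2)$, which is bounded by $1/\sqrt{k}$ (attained at $|r| = \sqrt{k}$), whereas the OLS term $r^2$ has the unbounded derivative $2r$.

```python
import math

def t_loss(r, k):
    """Per-observation term ln(k + r^2) that the t MLE minimises."""
    return math.log(k + r * r)

def scaled_excess(r, k):
    """k * (ln(k + r^2) - ln k) = k * ln(1 + r^2 / k); tends to r^2 as k grows."""
    return k * math.log1p(r * r / k)

def t_influence(r, k):
    """Derivative of ln(k + r^2) in r: 2r / (k + r^2), largest (1/sqrt(k)) at r = sqrt(k)."""
    return 2 * r / (k + r * r)

# Limit k -> infinity: the t criterion behaves like the OLS criterion r^2.
r = 3.0
for k in (3, 30, 300, 3000):
    print(k, scaled_excess(r, k))  # increases towards r^2 = 9

# Dampening: with k = 3, a residual of 100 adds only ln(3 + 100^2) to the t criterion
# versus 100^2 to the OLS criterion, and its gradient contribution is tiny.
print(t_loss(100.0, 3), 100.0 ** 2)
print(t_influence(100.0, 3), 2 * 100.0)
```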

Finding the MLE: The score equations below are nonlinear in $\beta_0, \beta_1$ and in general have no closed-form solution, so the MLE is obtained by numerical maximisation of the log-likelihood. The gradient of the log-likelihood is given by the partial derivatives:

\begin{equation} \begin{aligned}
\frac{\partial \ell_{\mathbf{x},\mathbf{y}}}{\partial \beta_0}(\beta_0, \beta_1) &= \sum_{i=1}^n \frac{2 (y_i - \beta_0 - \beta_1 x_i)}{k + (y_i - \beta_0 - \beta_1 x_i)^2}, \\[6pt]
\frac{\partial \ell_{\mathbf{x},\mathbf{y}}}{\partial \beta_1}(\beta_0, \beta_1) &= \sum_{i=1}^n \frac{2 x_i (y_i - \beta_0 - \beta_1 x_i)}{k + (y_i - \beta_0 - \beta_1 x_i)^2}.
\end{aligned} \end{equation}
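A quick way to confirm these partial derivatives is to compare them with central finite differences of the log-likelihood. The sketch below does that in plain Python; `gradient`, `numerical_gradient`, the step size `h` and the data are illustrative assumptions.

```python
import math

def reduced_loglik(b0, b1, x, y, k):
    """-sum ln(k + r_i^2), the log-likelihood up to constants, as in the answer."""
    return -sum(math.log(k + (yi - b0 - b1 * xi) ** 2) for xi, yi in zip(x, y))

def gradient(b0, b1, x, y, k):
    """The two partial derivatives above, evaluated at (b0, b1)."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        r = yi - b0 - b1 * xi
        psi = 2 * r / (k + r * r)  # shared factor in both partial derivatives
        g0 += psi
        g1 += xi * psi
    return g0, g1

def numerical_gradient(b0, b1, x, y, k, h=1e-6):
    """Central finite differences, used only to sanity-check the analytic formulas."""
    def f(a, b):
        return reduced_loglik(a, b, x, y, k)
    return ((f(b0 + h, b1) - f(b0 - h, b1)) / (2 * h),
            (f(b0, b1 + h) - f(b0, b1 - h)) / (2 * h))

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.2, 7.1, 30.0]
print(gradient(0.7, 1.9, x, y, 3))
print(numerical_gradient(0.7, 1.9, x, y, 3))  # should closely agree with the line above
```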

Setting both partial derivatives to zero gives the score equations:

\begin{equation} \begin{aligned}
0 &= \sum_{i=1}^n \frac{2 (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)}{k + (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}, \\[6pt]
0 &= \sum_{i=1}^n \frac{2 x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)}{k + (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}.
\end{aligned} \end{equation}
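One way to read these equations (a standard rearrangement, not part of the original answer): write $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ and define weights $w_i = 1/(k + r_i^2)$. Dividing each score equation by 2 gives

$$\begin{aligned}
\sum_{i=1}^n w_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) &= 0, \\[6pt]
\sum_{i=1}^n w_i x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) &= 0,
\end{aligned}$$

which are the normal equations of weighted least squares with weights $w_i$. For fixed weights their solution is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}{\sum_{i=1}^n w_i (x_i - \bar{x}_w)^2}, \qquad \hat{\beta}_0 = \bar{y}_w - \hat{\beta}_1 \bar{x}_w, \qquad \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}, \quad \bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i}.$$

Because the weights themselves depend on $\hat{\beta}_0, \hat{\beta}_1$, this is a fixed-point characterisation rather than a closed form, but it shows directly that observations with large residuals receive small weight, which is the dampening described earlier.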

These equations can be solved numerically using iterative techniques such as Newton-Raphson or gradient-ascent methods, or more elaborate optimisation methods. The exact sampling distribution of the MLE is complicated, but under standard regularity conditions the MLE is asymptotically normal for large samples, with covariance matrix approximated by the inverse of the Fisher information.
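A minimal sketch of one such solver is iteratively reweighted least squares (IRLS). This is a swapped-in technique, not the Newton-Raphson or gradient methods named above: with weights $w_i = 1/(k + r_i^2)$ the score equations coincide with weighted least-squares normal equations, so the code alternates between updating the weights and solving a weighted least-squares problem, starting from OLS. It also computes large-sample standard errors from the expected Fisher information $\frac{k+1}{k+3}\sum_i \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix}$ of the full log-likelihood, which assumes the error scale is exactly 1 as in the question. The function names, tolerance, and simulated data are illustrative assumptions.

```python
import math
import random

def wls(x, y, w):
    """Weighted least-squares intercept and slope for weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def fit_t_regression(x, y, k, tol=1e-10, max_iter=1000):
    """IRLS for the t-error MLE: re-solve the weighted normal equations
    with w_i = 1 / (k + r_i^2) until the estimates stop changing."""
    b0, b1 = wls(x, y, [1.0] * len(x))  # start from OLS (equal weights)
    for _ in range(max_iter):
        w = [1.0 / (k + (yi - b0 - b1 * xi) ** 2) for xi, yi in zip(x, y)]
        new_b0, new_b1 = wls(x, y, w)
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1

def asymptotic_se(x, k):
    """Standard errors from the inverse expected Fisher information
    (k + 1) / (k + 3) * sum [[1, x_i], [x_i, x_i^2]], assuming unit error scale."""
    c = (k + 1) / (k + 3)
    n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    return math.sqrt(sxx / (c * det)), math.sqrt(n / (c * det))

# Usage on simulated data (true beta_0 = 1, beta_1 = 2 and k = 3 are arbitrary choices).
random.seed(1)
k = 3
x = [random.uniform(0, 10) for _ in range(500)]
# A t(k) draw is Z / sqrt(V / k) with Z ~ N(0, 1) and V ~ chi-square(k) = Gamma(k/2, scale 2).
eps = [random.gauss(0, 1) / math.sqrt(random.gammavariate(k / 2, 2.0) / k) for _ in x]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, eps)]

b0_hat, b1_hat = fit_t_regression(x, y, k)
se0, se1 = asymptotic_se(x, k)
print(f"beta_0 = {b0_hat:.3f} (se {se0:.3f}), beta_1 = {b1_hat:.3f} (se {se1:.3f})")
```

IRLS of this kind is an EM-type algorithm, so each step should not decrease the log-likelihood. Newton-Raphson typically needs fewer iterations but can misbehave where the log-likelihood is not concave, which can happen when some residuals exceed $\sqrt{k}$ in magnitude.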
