Say I have $n$ pairs $(X_i, Y_i)$ and the relationship $$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i,$$ where the error terms $\epsilon_i$ are IID from a Student's t-distribution with constant (known) degrees of freedom (say $k \in \mathbb{N}$ degrees of freedom, where $k > 2$). If I wanted to compute the MLE for $\beta_0, \beta_1$, how do I relate the given distribution to the parameters of interest?

Sorry if this is trivial, but I searched around and haven't really seen an example like this.


#### Best Answer

The MLE is obtained by maximising the log-likelihood function, so the first thing you will want to do is have a look at this function. Using the density function for the Student's t-distribution with **known** degrees of freedom $k \in \mathbb{N}$, and discarding additive constants and positive scaling factors (which do not affect the maximiser), you can write the log-likelihood as:

$$\ell_{\mathbf{x},\mathbf{y}}(\beta_0, \beta_1) = - \sum_{i=1}^n \ln \Big( k + (y_i - \beta_0 - \beta_1 x_i)^2 \Big).$$
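To see where this form comes from, note that the Student's t density for each error term is proportional to $(1 + \epsilon^2/k)^{-(k+1)/2}$. Writing $r_i = y_i - \beta_0 - \beta_1 x_i$, the full log-likelihood reduces as follows:

```latex
\ell_{\mathbf{x},\mathbf{y}}(\beta_0,\beta_1)
  = c_n - \frac{k+1}{2} \sum_{i=1}^n \ln\!\left(1 + \frac{r_i^2}{k}\right)
  = c_n' - \frac{k+1}{2} \sum_{i=1}^n \ln\!\left(k + r_i^2\right),
```

where $c_n$ and $c_n'$ collect the terms not involving $\beta_0, \beta_1$, and the positive factor $(k+1)/2$ can be dropped without changing the maximiser.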

The residuals $r_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$ under the MLE minimise $\sum_{i=1}^n \ln ( k + r_i^2 )$. As $k \rightarrow \infty$ we have $\ln ( k + r_i^2 ) = \ln(k) + \ln(1+r_i^2/k) \approx \ln(k) + r_i^2/k$, so in the limit the residuals minimise $\sum_{i=1}^n r_i^2$, which is the standard OLS solution for normally distributed errors. Use of the t-distribution effectively dampens the effect of large residuals, through the above logarithmic transformation, so the MLE is more tolerant of a few large residuals than in the normal case.
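As a quick numerical illustration of this damping (a minimal sketch; the function name `nll_term` and the choice $k = 5$ are purely illustrative), compare how a tenfold-larger residual inflates the squared-error loss versus the logarithmic t-based loss:

```python
import math

def nll_term(r, k):
    """Per-observation negative log-likelihood contribution under t errors,
    up to constants: ln(k + r^2)."""
    return math.log(k + r**2)

k = 5
small, large = 1.0, 10.0

# Under squared loss, a residual ten times larger contributes 100x more:
print(large**2 / small**2)  # 100.0

# Under the logarithmic (t-based) loss, the inflation is far milder:
print(nll_term(large, k) / nll_term(small, k))
```

The second ratio is well under 3, which is exactly the tolerance of large residuals described above.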

**Finding the MLE:** The MLE can be obtained via numerical maximisation of the log-likelihood using ordinary calculus techniques. The gradient of the log-likelihood is given by the partial derivatives:

$$\begin{aligned} \frac{\partial \ell_{\mathbf{x},\mathbf{y}}}{\partial \beta_0}(\beta_0, \beta_1) &= \sum_{i=1}^n \frac{2 (y_i - \beta_0 - \beta_1 x_i)}{k+(y_i - \beta_0 - \beta_1 x_i)^2}, \\[6pt] \frac{\partial \ell_{\mathbf{x},\mathbf{y}}}{\partial \beta_1}(\beta_0, \beta_1) &= \sum_{i=1}^n \frac{2 x_i (y_i - \beta_0 - \beta_1 x_i)}{k+(y_i - \beta_0 - \beta_1 x_i)^2}. \end{aligned}$$

This leads to the score equations:

$$\begin{aligned} 0 &= \sum_{i=1}^n \frac{2 (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)}{k+(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}, \\[6pt] 0 &= \sum_{i=1}^n \frac{2 x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)}{k+(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}. \end{aligned}$$

These score equations can be solved numerically using iterative techniques such as Newton-Raphson or gradient ascent, among other methods. The exact distribution of the MLE will be complicated, but by standard large-sample theory the MLE is asymptotically normally distributed.
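For concreteness, here is a minimal sketch of such an iterative solver in plain Python, using gradient ascent on the score equations above. The function name `t_mle`, the learning rate, the iteration count, and the simulated data (with hypothetical true values $\beta_0 = 1$, $\beta_1 = 2$ and, for simplicity, normal rather than t-distributed noise) are all illustrative assumptions, not part of the question:

```python
import random

def t_mle(x, y, k, lr=0.005, iters=20000):
    """Maximise the simplified log-likelihood -sum ln(k + r_i^2) by
    gradient ascent, where r_i = y_i - b0 - b1 * x_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Gradients of the log-likelihood (the score equations above):
        g0 = sum(2 * ri / (k + ri**2) for ri in r)
        g1 = sum(2 * xi * ri / (k + ri**2) for xi, ri in zip(x, r))
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Simulated data with hypothetical true values beta0 = 1, beta1 = 2.
random.seed(0)
x = [i / 10 for i in range(50)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]

b0_hat, b1_hat = t_mle(x, y, k=5)
print(b0_hat, b1_hat)  # estimates should land near (1, 2)
```

In practice a general-purpose optimiser (e.g. Newton-type methods with the analytic gradient supplied) would converge in far fewer iterations than this fixed-step sketch, but the score equations being solved are the same.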
