Solved – How to Estimate the Error Term in a Heteroscedastic Model with Regression Through the Origin

Suppose we have a no-intercept model, $$y_i=\beta x_i+e_i$$

where $e_i \sim N(0,\, \sigma^2 x_i^h)$, so $e_i$ is equal in distribution to $e_{0i}\, x_i^{h/2}$, where $e_{0i} \sim N(0,\, \sigma^2)$.

I know that in order to obtain the BLUE (Best Linear Unbiased Estimator) of the $\beta$ coefficient, I should set $w_i = x_i^{-h/2}$, so that $Var(w_i e_i) = x_i^{-h}\,\sigma^2 x_i^{h} = \sigma^2$.

$h$ can be estimated by calculating the marginal variances at each level of $x$ and then fitting the model $\log(\hat{e}_i^2)=\log(\sigma^2)+h\log(x_i)$ to the squared OLS residuals, which yields $\hat{h}$, so that $w_i = x_i^{-\hat{h}/2}$.
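For concreteness, here is a minimal R sketch of that two-step estimate on simulated data; the true values ($\beta=2$, $h=1$, $\sigma=0.5$), the restriction to positive $x_i$, and the variable names are illustrative choices, not something taken from the original question.

    set.seed(1)
    n     <- 500
    beta  <- 2; h <- 1; sigma <- 0.5            # true values, assumed for illustration
    x     <- runif(n, 1, 10)                    # positive x, so x^h is well defined
    y     <- beta * x + rnorm(n, sd = sigma * x^(h / 2))

    # Step 1: OLS through the origin and its residuals
    b0    <- sum(x * y) / sum(x^2)
    e.hat <- y - b0 * x

    # Step 2: regress log squared residuals on log(x); the slope estimates h
    fit.h <- lm(log(e.hat^2) ~ log(x))
    h.hat <- unname(coef(fit.h)[2])
    w     <- x^(-h.hat / 2)                     # estimated weights
    h.hat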

This then gives the model $$y_i w_i = \beta x_i w_i + e_{0i},$$ which gives $\hat{\beta}=\frac{\sum_{i=1}^{n}w_i^2 x_i y_i}{\sum_{i=1}^{n}w_i^2 x_i^2}$.
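Continuing the same sketch (reusing `x`, `y`, and `w` from above), the weighted estimate can be computed either from the closed-form expression or with `lm()` using `weights = w^2`, since R's weighted least squares minimizes $\sum_i w_i^2 (y_i-\beta x_i)^2$ with that argument; the two should agree.

    # WLS slope from the closed-form expression above
    b.wls   <- sum(w^2 * x * y) / sum(w^2 * x^2)

    # Same estimate via lm(); the 'weights' argument multiplies each squared residual by w_i^2
    fit.wls <- lm(y ~ 0 + x, weights = w^2)
    c(formula = b.wls, lm = unname(coef(fit.wls)))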

I am looking to find $Var(\hat{\beta})=\frac{\sum_{i=1}^{n}w_i^2 x_i^2}{(\sum_{i=1}^{n}w_i^2 x_i^2)^2}\,Var(w_i y_i)=\sigma^2\,\frac{\sum_{i=1}^{n}w_i^2 x_i^2}{(\sum_{i=1}^{n}w_i^2 x_i^2)^2}=\frac{\sigma^2}{\sum_{i=1}^{n}w_i^2 x_i^2}$, but I'm not completely sure how to correctly estimate $\sigma^2$.

I am estimating $\sigma^2$ using the usual $s^2$ formula with the transformed data, $\frac{1}{n-1}\sum_{i=1}^{n}(w_i y_i -\hat{\beta}\,w_i x_i)^2$. Is this correct? Should I be dividing by $n-2$ because I've estimated two parameters ($\beta$ and $h$)?
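Still within the illustrative sketch above, the question's estimate of $\sigma^2$ and the implied standard error of $\hat{\beta}$ could be computed as follows; both divisors are shown, since the $n-1$ versus $n-2$ choice is exactly what is being asked.

    r       <- w * y - b.wls * w * x           # residuals of the transformed model
    s2.n1   <- sum(r^2) / (n - 1)              # divisor n - 1, as in the question
    s2.n2   <- sum(r^2) / (n - 2)              # divisor n - 2, charging a df for beta and for h
    se.beta <- sqrt(s2.n1 / sum(w^2 * x^2))    # uses Var(beta.hat) = sigma^2 / sum(w_i^2 x_i^2)
    c(s2.n1 = s2.n1, s2.n2 = s2.n2, se.beta = se.beta)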

First off, I assume you mean $e_i \sim N(0,\sigma^2|x_i|^h)$ or that $x_i$ is always positive; otherwise you will get negative and/or complex-valued variances.

In practice, when we know heteroskedasticity exists but are unaware of its form, we may use the two-step method of feasible generalized least squares (FGLS). However, FGLS may be inconsistent. So, to @whuber's point, given that you know something about the heteroskedasticity structure, you can construct a consistent maximum likelihood estimator (MLE) for your parameters. The likelihood function has the form $$ L=\prod_{i=1}^{n} \frac{1}{\sigma |x_i|^{h/2}}\,\phi\bigg(\frac{y_i-x_i\beta}{\sigma |x_i|^{h/2}}\bigg), $$ where $\phi$ is the standard normal pdf.
Your optimal parameter estimates may then be found by maximizing the log likelihood, i.e. $$ (\hat\beta,\hat\sigma,\hat h)=\arg\max_{\beta,\sigma,h}\sum_{i=1}^{n}\bigg[-\ln\big(\sigma |x_i|^{h/2}\big)+\ln\phi\bigg(\frac{y_i-x_i\beta}{\sigma |x_i|^{h/2}}\bigg)\bigg], $$ which simplifies to $$ (\hat\beta,\hat\sigma,\hat h)=\arg\max_{\beta,\sigma,h}\bigg\{-\frac{n}{2}\ln(2\pi)-\sum_{i=1}^{n}\ln\big(\sigma |x_i|^{h/2}\big)-\frac{1}{2}\sum_{i=1}^{n}\frac{(y_i-x_i\beta)^2}{\sigma^2|x_i|^h}\bigg\}. $$

This is extremely useful, as you can construct a hypothesis test for homoskedasticity, $$ H_0: h=0 \qquad H_a: h\neq 0, $$ where the null is homoskedasticity (a likelihood-ratio version of this test is sketched after the R code below).

$Var(\hat\beta)$ may be estimated using the Fisher information equality: take the negative inverse of the Hessian of the above log likelihood, evaluated at $(\hat\beta, \hat\sigma, \hat h)$. These estimates may be obtained using built-in optimization and maximum likelihood routines in R, Matlab, Python, etc. Below is an R example.

    n   <- 1e3
    x   <- rnorm(n)
    y   <- 2 * x + rnorm(n, sd = .1)
    a.x <- abs(x)

    # OLS through the origin, used as a starting point
    b.ols   <- solve(t(x) %*% x) %*% t(x) %*% y
    e       <- y - x %*% b.ols
    var.ols <- sum(e^2) / (n - 1)

    # Negative log likelihood; pars = (beta, h, sigma), with sigma squared inside to keep the variance positive
    negative.log.like <- function(pars) {
      b       <- pars[1]
      h       <- pars[2]
      sig.sqr <- pars[3]^2
      e       <- y - x * b
      s       <- sqrt(sig.sqr) * a.x^(h / 2)
      e.adj   <- e / s
      -sum(log(1 / s) + dnorm(e.adj, log = TRUE))
    }

    init   <- c(b.ols, 0, sqrt(var.ols))
    result <- optim(init, negative.log.like, hessian = TRUE)

    # estimates
    result$par
    # standard errors
    sqrt(diag(solve(result$hessian)))
    # maximum log likelihood
    -result$value
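As one possible follow-up, the homoskedasticity test above can be run as a likelihood-ratio test by refitting with $h$ fixed at $0$ and comparing the two maximized log likelihoods. The sketch below reuses `negative.log.like`, `b.ols`, `var.ols`, and `result` from the example; the restricted wrapper `negative.log.like.h0` is a helper introduced here purely for illustration.

    # Restricted fit: h fixed at 0 (homoskedastic model), optimizing over beta and sigma only
    negative.log.like.h0 <- function(pars) negative.log.like(c(pars[1], 0, pars[2]))
    result.h0 <- optim(c(b.ols, sqrt(var.ols)), negative.log.like.h0)

    # LR statistic: 2 * (max logLik unrestricted - max logLik restricted)
    lr    <- 2 * ((-result$value) - (-result.h0$value))
    p.val <- pchisq(lr, df = 1, lower.tail = FALSE)
    c(LR = lr, p.value = p.val)

Under $H_0$ the statistic is asymptotically $\chi^2$ with one degree of freedom, matching the single restriction $h=0$.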
