# Solved – How to Estimate the Error Term in a Heteroscedastic Model with Regression Through the Origin

Suppose we have a NO INTERCEPT model, \$\$y_i=\beta x_i+e_i\$\$

where \$e_i \sim N(0,\sigma^2 x_i^h)\$, so \$e_i\$ is equal in distribution to \$e_{0i} x_i^{\frac{h}{2}}\$, where \$e_{0i} \sim N(0,\sigma^2)\$.

I know that, in order to obtain the BLUE (Best Linear Unbiased Estimator) of the \$\beta\$ coefficient, we set \$w_i = x_i^{-\frac{h}{2}}\$.

\$h\$ can be estimated by calculating the marginal variances at each level of \$x\$. Regressing \$\log(\hat{e}_i^2)\$ on \$\log(x_i)\$ in the model \$\log(\hat{e}_i^2)=\log(\sigma^2)+h\log(x_i)\$ then yields \$\hat{h}\$, so \$w_i = x_i^{-\frac{\hat{h}}{2}}\$.
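For concreteness, here is a minimal R sketch of that two-step estimate; the simulated values of \$\beta\$, \$\sigma\$, and \$h\$, and all variable names, are illustrative assumptions rather than part of the question:

```r
# Simulate from the model with assumed demo values: beta=2, sigma=0.5, h=1.5
set.seed(1)
n <- 500
x <- runif(n, 1, 10)                  # positive x, so x^h is well defined
y <- 2*x + rnorm(n, sd = 0.5*x^(1.5/2))

# Step 1: OLS through the origin, raw residuals
e.hat <- resid(lm(y ~ 0 + x))

# Step 2: slope of log squared residuals on log(x) estimates h
h.hat <- unname(coef(lm(log(e.hat^2) ~ log(x)))[2])
w <- x^(-h.hat/2)                     # estimated weights
```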

This then gives the model \$\$y_i w_i = \beta x_i w_i + e_{0i},\$\$ which gives \$\hat{\beta}=\frac{\sum_{i=1}^{n}w_i^2 x_i y_i}{\sum_{i=1}^{n}w_i^2 x_i^2}\$.
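Continuing the sketch above, the closed-form \$\hat{\beta}\$ agrees with what `lm` produces when the weights enter as \$w_i^2\$ (R's `weights` argument multiplies the squared residuals):

```r
# Closed-form WLS slope through the origin
b.wls <- sum(w^2 * x * y) / sum(w^2 * x^2)

# Same estimate via lm(); weights w_i^2 reproduce the transformed model
fit.wls <- lm(y ~ 0 + x, weights = w^2)
all.equal(b.wls, unname(coef(fit.wls)))   # TRUE
```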

I am looking to find \$Var(\hat{\beta})=\frac{\sum_{i=1}^{n}w_i^2 x_i^2}{(\sum_{i=1}^{n}w_i^2 x_i^2)^2}Var(w_i y_i)=\sigma^2 \frac{\sum_{i=1}^{n}w_i^2 x_i^2}{(\sum_{i=1}^{n}w_i^2 x_i^2)^2}=\frac{\sigma^2}{\sum_{i=1}^{n}w_i^2 x_i^2}\$, but I'm not completely sure how to correctly estimate \$\sigma^2\$.

I am estimating \$\sigma^2\$ using the usual \$s^2\$ formula with the transformed data, \$\frac{1}{n-1}\sum_{i=1}^{n}(w_i y_i -\hat{\beta}w_i x_i)^2\$. Is this correct? Should I be dividing by \$n-2\$ because I've estimated TWO parameters (\$\beta\$ and \$h\$)?
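In code, the quantities under discussion look like this (continuing the sketch above; since the \$n-1\$ versus \$n-2\$ divisor is exactly what is being asked, both candidates are shown):

```r
# Residuals of the transformed (approximately homoskedastic) model
r <- w*y - b.wls*w*x

# Candidate estimates of sigma^2 with different degrees-of-freedom corrections
s2.n1 <- sum(r^2)/(n - 1)
s2.n2 <- sum(r^2)/(n - 2)

# Plug-in variance and standard error of the WLS slope: sigma^2 / sum(w_i^2 x_i^2)
se.b <- sqrt(s2.n1 / sum(w^2 * x^2))
```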


First off, I assume you mean \$e_i \sim N(0,\sigma^2|x_i|^h)\$, or that \$x_i\$ is always positive; otherwise you will get negative and/or complex-valued variances.

In practice, when we know heteroskedasticity exists but are unaware of its form, we may use the two-step method of feasible generalized least squares (FGLS). However, FGLS may be inconsistent. So, to @whuber's point, given that you know something about the heteroskedasticity structure, you can construct a consistent maximum likelihood estimator (MLE) for your parameters. The likelihood function would have the form \$\$L=\prod_{i=1}^{n} \frac{1}{\sigma |x_i|^{h/2}}\phi\bigg(\frac{y_i-x_i'\beta}{\sigma |x_i|^{h/2}}\bigg),\$\$ where \$\phi\$ is the standard normal pdf.
Your optimal parameter estimates may then be found by maximizing the log likelihood, i.e. \$\$(\hat\beta,\hat\sigma,\hat h)=\arg\max_{\beta,\sigma,h}\bigg\{\sum_{i=1}^{n}-\ln\big(\sigma |x_i|^{h/2}\big)+\ln\bigg(\phi\bigg(\frac{y_i-x_i'\beta}{\sigma |x_i|^{h/2}}\bigg)\bigg)\bigg\},\$\$ which simplifies to \$\$(\hat\beta,\hat\sigma,\hat h)=\arg\max_{\beta,\sigma,h}\bigg\{-\frac{n}{2}\ln(2\pi)-\sum_{i=1}^{n}\bigg(\ln\big(\sigma |x_i|^{h/2}\big)+\frac{(y_i-x_i'\beta)^2}{2\sigma^2|x_i|^h}\bigg)\bigg\}.\$\$

This is extremely useful, as you can construct a hypothesis test for homoskedasticity: \$\$H_0: h=0\$\$ \$\$H_a: h\neq 0,\$\$ where the null is homoskedasticity. (A likelihood-ratio version of this test is sketched after the R example below.)

\$Var(\hat\beta)\$ may be estimated using the Fisher information equality: the negative inverse of the Hessian of the above log likelihood, evaluated at \$(\hat\beta, \hat\sigma, \hat h)\$. These estimates may be obtained using built-in optimization and maximum likelihood routines in R, Matlab, Python, etc. Below is an R example.

```r
n=1e3
x=rnorm(n)
y=2*x+rnorm(n,sd=.1)      # simulated data (homoskedastic, so h should be near 0)
a.x=abs(x)

# OLS fit for starting values
b.ols=solve(t(x)%*%x)%*%t(x)%*%y
e=y-x%*%b.ols
var.ols=sum(e^2)/(n-1)

# negative log likelihood of the heteroskedastic normal model
negative.log.like=function(pars) {
    b=pars[1]
    h=pars[2]
    sig.sqr=pars[3]^2
    e=(y-x*b)
    s=sqrt(sig.sqr)*a.x^(h/2)    # sd of e_i: sigma*|x_i|^(h/2)
    e.adj=e/s
    OUT=(-sum(log(1/s)+dnorm(e.adj,log=TRUE)))
    return(OUT)
}

init=c(b.ols,0,sqrt(var.ols))
result=optim(init,negative.log.like,hessian=TRUE)

# estimates
result$par
# standard errors
sqrt(diag(solve(result$hessian)))
# maximum log likelihood
-result$value
```
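Building on that example, the \$H_0: h=0\$ test mentioned above can be run either as a Wald test from those standard errors or as a likelihood-ratio test by refitting with \$h\$ fixed at zero; the restricted-fit code below is a sketch, not part of the original answer:

```r
# Restricted fit: impose h = 0 (homoskedastic normal model)
negative.log.like.h0 <- function(pars) negative.log.like(c(pars[1], 0, pars[2]))
result.h0 <- optim(c(b.ols, sqrt(var.ols)), negative.log.like.h0)

# Likelihood ratio: 2*(logLik_full - logLik_restricted) ~ chi-squared(1) under H0
LR <- 2*(result.h0$value - result$value)
pchisq(LR, df = 1, lower.tail = FALSE)            # p-value

# Wald alternative: h.hat / se(h.hat) from the Hessian
result$par[2] / sqrt(diag(solve(result$hessian)))[2]
```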
