Solved – Finding the Confidence Interval with Linear Extrapolation

Suppose I have some slightly messy data with an r squared of 0.9, so a line fits it pretty well.

If I were to extrapolate based on the slope and intercept of the fit, I would expect my y values to be pretty close while the x values are close to the range of the data, but as my x values got farther out, I would expect more uncertainty.

What would be the best way to find how wide the confidence interval of the y values is?

In other words, where is the band that 95% of scenarios matching the same slope, intercept and r squared will pass through?

I'm assuming you are talking about an interval for the observations, rather than for the regression line.

Given x, the outcome y is assumed to be normally distributed:

$$ y \vert x \sim \mathcal{N}(\hat{\beta}_0 + \hat{\beta}_1 x, \hat{\sigma}^2) $$

Here, $\hat{\sigma}^2$ has been estimated from the data. This thread seems to discuss the computation of both prediction and confidence intervals quite well.
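For reference, this is the standard textbook form of the 95% prediction interval at a new point $x_0$ in simple linear regression. The symbols $n$ (sample size), $\bar{x}$ (sample mean of the x), $x_0$ and $t_{n-2,\,0.975}$ (t quantile) do not appear in the text above and are introduced here only for illustration:

$$ \hat{y}_0 \pm t_{n-2,\,0.975}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}, \qquad \hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 $$

Dropping the leading 1 under the square root gives the confidence interval for the regression line. The $(x_0 - \bar{x})^2$ term is what makes both intervals widen as $x_0$ moves away from the data.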

In R, it is easy to get prediction intervals:

    library(tidyverse)

    # Simulate noisy linear data
    x = rnorm(100)
    xpred = seq(-3, 3, 0.01)
    y = 2*x + 1 + rnorm(length(x), 0, 2)

    model = lm(y ~ x)

    # 95% prediction interval at each xpred (columns: fit, lwr, upr)
    ypred = predict(model, data.frame(x = xpred), interval = 'prediction') %>%
      as.data.frame()

    d = tibble(xpred = xpred) %>% bind_cols(ypred)

    d %>%
      ggplot(aes(xpred, fit)) +
      geom_line() +
      geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.5) +
      geom_point(data = tibble(x, y), aes(x, y))

Yielding

[Figure: fitted line with a shaded 95% prediction band, overlaid on the simulated data points]

If instead you want a confidence interval for the regression line, then the variance of the fitted value $\hat{y}$ at a given x is

$$ \operatorname{Var}(\hat{y}) = \operatorname{Var}(\hat{\beta}_0) + x^2 \operatorname{Var}(\hat{\beta}_1) + 2x \operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = \mathbf{x}^T \Sigma \mathbf{x} $$

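Here, $\mathbf{x} = [1, x]^T$ and $\Sigma$ is the estimated covariance matrix of $(\hat{\beta}_0, \hat{\beta}_1)$. Using this, we can apply the standard confidence interval formula. Obtaining confidence intervals in R is the same procedure, except now we pass interval = "confidence" to the predict function. This yields

[Figure: fitted line with a shaded 95% confidence band for the regression line, overlaid on the simulated data points]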
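The variance formula can be checked by hand. The following is a minimal sketch, not part of the original answer: it assumes the same simulated data as the example above, an arbitrary seed, and an arbitrary evaluation point `x0`. It builds the interval from `vcov(model)` and compares it with what `predict` returns.

```r
# Simulate data as in the example above (the seed is an arbitrary choice).
set.seed(1)
x <- rnorm(100)
y <- 2 * x + 1 + rnorm(length(x), 0, 2)
model <- lm(y ~ x)

x0 <- 2.5                  # illustrative point at which to evaluate the line
xvec <- c(1, x0)           # design vector [1, x]
Sigma <- vcov(model)       # estimated covariance of (beta0_hat, beta1_hat)

fit <- sum(coef(model) * xvec)                     # fitted value at x0
se_fit <- sqrt(drop(t(xvec) %*% Sigma %*% xvec))   # sqrt(x^T Sigma x)
tcrit <- qt(0.975, df = df.residual(model))        # t quantile for a 95% interval

manual <- c(fit = fit, lwr = fit - tcrit * se_fit, upr = fit + tcrit * se_fit)
builtin <- predict(model, data.frame(x = x0), interval = "confidence")

print(manual)
print(builtin)
```

The two printed rows should agree up to floating-point error, since `predict` uses the same standard error of the fitted value.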

Note that the precision is greatest near the sample mean of the x. As you extrapolate farther from it, the uncertainty increases, as evidenced by the widening of the confidence interval.
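To see the widening numerically rather than on a plot, here is a small sketch under the same assumptions as the example above (simulated data, arbitrary seed). The distances 0, 1, 3 and 10 from the sample mean are illustrative choices, not values from the original answer.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + 1 + rnorm(length(x), 0, 2)
model <- lm(y ~ x)

# Evaluate at the sample mean of x and at points increasingly far from it.
d <- c(0, 1, 3, 10)
newdata <- data.frame(x = mean(x) + d)

conf_int <- predict(model, newdata, interval = "confidence")
pred_int <- predict(model, newdata, interval = "prediction")

# Width of each 95% interval (upper bound minus lower bound).
widths <- data.frame(
  distance_from_mean = d,
  ci_width = conf_int[, "upr"] - conf_int[, "lwr"],
  pi_width = pred_int[, "upr"] - pred_int[, "lwr"]
)
print(widths)
```

Both widths grow with the distance from the mean, and the prediction interval is always the wider of the two because it also includes the noise of a single observation.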
