Suppose I have some slightly messy data with an r squared of 0.9, so a line fits it pretty well.
If I were to extrapolate based on the slope and intercept of the fit, I would expect my y values to be pretty close while the x values are close to the range of the data, but as my x values got farther out, I would expect more uncertainty.
What would be the best way to find how big the confidence interval for the y values is?
That is, the band that 95% of scenarios matching the same slope, intercept and r squared would pass through?
Best Answer
I'm assuming you are talking about an interval for the observations, rather than for the regression line.
Given the x, the outcome y is assumed to be normally distributed like so
$$ y \vert x \sim \mathcal{N}(\hat{\beta}_0 + \hat{\beta}_1 x, \hat{\sigma}^2) $$
Here, $\hat{\sigma}^2$ has been estimated from the data. This thread seems to discuss the computation of both prediction and confidence intervals quite well.
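For reference, the textbook closed form of the 95% prediction interval at a new point $x_0$ in simple linear regression can be written as below. The symbols $n$ (sample size), $\bar{x}$ (sample mean of the x), $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $t_{n-2,\,0.975}$ (Student t quantile) are notation introduced here for illustration. Unlike the plug-in normal distribution above, this form also accounts for the uncertainty in $\hat{\beta}_0$ and $\hat{\beta}_1$.

$$ \hat{\beta}_0 + \hat{\beta}_1 x_0 \;\pm\; t_{n-2,\,0.975}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} $$

The $(x_0 - \bar{x})^2$ term is what makes the interval widen as $x_0$ moves away from the data.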
In R, it is easy to get prediction intervals:

    library(tidyverse)

    # Simulate data from y = 2x + 1 plus Gaussian noise
    x = rnorm(100)
    xpred = seq(-3, 3, 0.01)
    y = 2*x + 1 + rnorm(length(x), 0, 2)

    # Fit the line and get prediction intervals on a grid of x values
    model = lm(y ~ x)
    ypred = predict(model, list(x = xpred), interval = 'predict') %>%
      as.data.frame()

    # Plot the fitted line, the prediction band and the raw data
    d = tibble(xpred = xpred) %>% bind_cols(ypred)
    d %>%
      ggplot(aes(xpred, fit)) +
      geom_line() +
      geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.5) +
      geom_point(data = tibble(x, y), aes(x, y))
[Figure: fitted line with a shaded 95% prediction band, overlaid on the data points]
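To see what the prediction interval is made of, here is a minimal sketch that rebuilds it by hand from the usual simple-regression formula and compares it with predict(). The seed, the evaluation points x0 and the helper names (sigma_hat, sxx, se_pred, tcrit) are assumptions made for illustration, not part of the original answer.

```r
# Hand-computed 95% prediction interval, checked against predict().
# Simulated data; the seed is arbitrary and only there for reproducibility.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + 1 + rnorm(length(x), 0, 2)
model <- lm(y ~ x)

x0 <- c(0, 3, 10)                      # inside, at the edge of, and far outside the data
n <- length(x)
sigma_hat <- summary(model)$sigma      # residual standard error
sxx <- sum((x - mean(x))^2)

fit <- coef(model)[1] + coef(model)[2] * x0
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)
tcrit <- qt(0.975, df = n - 2)         # t quantile with n - 2 degrees of freedom

manual <- cbind(fit = fit,
                lwr = fit - tcrit * se_pred,
                upr = fit + tcrit * se_pred)
builtin <- predict(model, data.frame(x = x0), interval = "prediction")

# The two should agree up to floating-point error
stopifnot(isTRUE(all.equal(unname(manual), unname(builtin))))

# Interval width at each x0; it grows as x0 moves away from mean(x)
manual[, "upr"] - manual[, "lwr"]
```

The leading 1 under the square root is the noise of a single new observation, which is why the prediction band never shrinks to zero width no matter how much data you collect.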
If instead you want a confidence interval for the regression line, then the variance of the fitted value $\hat{y}$ at a given x is
$$ \operatorname{Var}(\hat{y}) = \operatorname{Var}(\hat{\beta}_0) + x^2 \operatorname{Var}(\hat{\beta}_1) + 2x \operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = \mathbf{x}^T \Sigma \mathbf{x} $$
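Here, $\mathbf{x} = [1, x]$ and $\Sigma$ is the covariance matrix of the coefficient estimates. Using this, we can apply the standard confidence interval formula. Obtaining confidence intervals in R is the same procedure, except now we pass interval="conf" to the predict function. This yields
[Figure: fitted line with a shaded 95% confidence band]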
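The quadratic form can also be evaluated directly. The sketch below takes the coefficient covariance matrix from vcov(), computes the variance of the fitted value for a few x values, and checks the resulting interval against predict(). The seed, the points in x0 and the names X0, var_fit and tcrit are illustrative assumptions.

```r
# Confidence interval for the regression line via x^T Sigma x.
# Simulated data; the seed is arbitrary and only there for reproducibility.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + 1 + rnorm(length(x), 0, 2)
model <- lm(y ~ x)

x0 <- c(0, 3, 10)
X0 <- cbind(1, x0)                        # each row is the vector [1, x]
Sigma <- vcov(model)                      # estimated covariance of the coefficient estimates

var_fit <- rowSums((X0 %*% Sigma) * X0)   # x^T Sigma x, one value per row of X0
fit <- drop(X0 %*% coef(model))
tcrit <- qt(0.975, df = df.residual(model))

manual <- cbind(fit = fit,
                lwr = fit - tcrit * sqrt(var_fit),
                upr = fit + tcrit * sqrt(var_fit))
builtin <- predict(model, data.frame(x = x0), interval = "confidence")

# The two should agree up to floating-point error
stopifnot(isTRUE(all.equal(unname(manual), unname(builtin))))
```

Using rowSums((X0 %*% Sigma) * X0) avoids forming the full matrix X0 %*% Sigma %*% t(X0) when only its diagonal is needed.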
Note that the precision is greatest near the sample mean of the x. As you extrapolate more and more, the uncertainty increases, as evidenced by the widening of the confidence interval.
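One way to see why is to substitute the standard simple-regression expressions for the coefficient variances and covariance into the variance of the fitted value. Here $n$, $\bar{x}$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$ are notation introduced for this sketch, using $\operatorname{Var}(\hat{\beta}_1) = \hat{\sigma}^2 / S_{xx}$, $\operatorname{Var}(\hat{\beta}_0) = \hat{\sigma}^2 (1/n + \bar{x}^2 / S_{xx})$ and $\operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\hat{\sigma}^2 \bar{x} / S_{xx}$:

$$ \operatorname{Var}(\hat{y}) = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2 - 2x\bar{x} + x^2}{S_{xx}} \right) = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}} \right) $$

The variance is smallest at $x = \bar{x}$ and grows quadratically with the distance from it, so the half-width of the interval grows roughly linearly once you are far outside the data.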