As an assumption of linear regression, the normality of the error distribution is sometimes wrongly "extended" to, or interpreted as, a requirement that y or x themselves be normally distributed.

Is it possible to construct a scenario/dataset where X and Y are non-normal but the error term is normal, so that the resulting linear regression estimates are valid?


#### Best Answer

Expanding on Hong Ooi's comment with an image. Here is an image of a dataset in which none of the marginal distributions are normal but the residuals still are, so the assumptions of linear regression still hold:

The image was generated by the following R code:

```r
library(psych)
x <- rbinom(100, 1, 0.3)
y <- rnorm(length(x), 5 + x * 5, 1)
scatter.hist(x, y, correl=F, density=F, ellipse=F, xlab="x", ylab="y")
```
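To check the claim numerically rather than visually, the same simulation can be sketched in Python (an assumed translation of the R code above, using numpy/scipy rather than anything from the original answer): x is binary, so its marginal is far from normal, and y's marginal is a two-component mixture, yet the residuals from the fitted line pass a normality test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# x is Bernoulli(0.3): its marginal is as non-normal as it gets
x = rng.binomial(1, 0.3, size=1000).astype(float)
# y given x is normal, but y's marginal is a mixture of N(5,1) and N(10,1)
y = rng.normal(5 + 5 * x, 1)

# Ordinary least squares fit of y = b0 + b1 * x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Shapiro-Wilk: the marginal of y is clearly non-normal,
# while the residuals are consistent with normality
print("y marginal p-value:   ", stats.shapiro(y).pvalue)
print("residuals p-value:    ", stats.shapiro(residuals).pvalue)
print("estimated coefficients:", beta)
```

The estimated intercept and slope land near the true values (5 and 5), which is the point of the answer: validity of the estimates depends on the conditional (error) distribution, not the marginals.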

### Similar Posts:

- Solved – Calculating the confidence interval for simple linear regression coefficient estimates
- Solved – What to do first when there are violations of assumption in Simple Regression?