I have R code that runs the lm function and gets the summary.
What is the meaning of multiple R squared?
And is there any relationship between multiple R squared and the correlation coefficient?
Then how can we explain the relationship between two variables using multiple R squared?
Best Answer
For a least squares regression that includes an intercept, the following four quantities are equal. Besides multiple R squared itself, this gives three different ways to think about R squared:
- multiple R squared
- the variance of the fitted values divided by the variance of the dependent variable (i.e. the regression "explains" that proportion of the variance of the dependent variable)
- the square of the correlation between the fitted values and the dependent variable
- the proportional improvement in residual sum of squares from the null model to the fitted model. (The null model has the same dependent variable but only an intercept with no predictors.)
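Why these coincide can be sketched with the usual sum of squares decomposition. Write $y_i$ for the dependent variable, $\hat y_i$ for the fitted values, $e_i = y_i - \hat y_i$ for the residuals and $\bar y$ for the mean. The derivation assumes an intercept is present, so the residuals sum to zero and are uncorrelated with the fitted values. The null model's fitted values are all $\bar y$, so its residual sum of squares is the total sum of squares $\mathrm{TSS}$:

```latex
\begin{aligned}
\mathrm{TSS} &= \sum_i (y_i - \bar y)^2 = \mathrm{ESS} + \mathrm{RSS}, \qquad
\mathrm{ESS} = \sum_i (\hat y_i - \bar y)^2, \qquad
\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2 \\
R^2 &= \frac{\mathrm{ESS}}{\mathrm{TSS}}
     = \frac{\operatorname{var}(\hat y)}{\operatorname{var}(y)}
     = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \\
\operatorname{cor}(\hat y, y)^2 &= \frac{\operatorname{cov}(\hat y, \hat y + e)^2}{\operatorname{var}(\hat y)\,\operatorname{var}(y)}
     = \frac{\operatorname{var}(\hat y)^2}{\operatorname{var}(\hat y)\,\operatorname{var}(y)}
     = \frac{\operatorname{var}(\hat y)}{\operatorname{var}(y)}
\end{aligned}
```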
We show an R example where X1 and X2 are the predictors, Y is the dependent variable, fm is the regression object and fitted(fm) are the fitted values. (The fitted values are the values of Y predicted by the right hand side of the regression equation.) fm_null is the regression object of the null regression model (as described above) and deviance(fm) refers to the residual sum of squares of the indicated model.
```r
X1 <- c(1, 1, 1, 1, 2, 2)
X2 <- c(1, 2, 3, 1, 2, 3)
Y <- c(1, 2, 3, 4, 5, 6)

fm <- lm(Y ~ X1 + X2)  # regression of Y on X1 and X2
fm_null <- lm(Y ~ 1)   # null model

# these are the same
var(fitted(fm)) / var(Y)  # ratio of variances
## [1] 0.7032967

cor(fitted(fm), Y)^2  # squared correlation
## [1] 0.7032967

summary(fm)$r.squared  # multiple R squared
## [1] 0.7032967

1 - deviance(fm) / deviance(fm_null)  # improvement in residual sum of squares
## [1] 0.7032967
```
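The equalities rely on the model having an intercept, which makes the residuals orthogonal to the fitted values so that the total sum of squares splits into an explained part and a residual part. The sketch below reuses the toy data above; the names `tss`, `ess`, `rss` and `fm_noint` are introduced here for illustration only. It checks that decomposition, then fits a hypothetical comparison model without an intercept, for which `summary()` measures R squared around 0 rather than around `mean(Y)`.

```r
# Toy data from the example above
X1 <- c(1, 1, 1, 1, 2, 2)
X2 <- c(1, 2, 3, 1, 2, 3)
Y  <- c(1, 2, 3, 4, 5, 6)
fm <- lm(Y ~ X1 + X2)

# Total, explained and residual sums of squares
tss <- sum((Y - mean(Y))^2)
ess <- sum((fitted(fm) - mean(Y))^2)
rss <- sum(resid(fm)^2)

# With an intercept the residuals are orthogonal to the fitted values,
# so TSS = ESS + RSS and multiple R squared = ESS / TSS
all.equal(sum(resid(fm) * fitted(fm)), 0)
all.equal(tss, ess + rss)
all.equal(ess / tss, summary(fm)$r.squared)

# Hypothetical comparison: the same predictors but no intercept.
# summary.lm then compares against 0 instead of mean(Y), so its
# R squared no longer equals the squared correlation
fm_noint <- lm(Y ~ X1 + X2 - 1)
summary(fm_noint)$r.squared
cor(fitted(fm_noint), Y)^2
```

The three `all.equal` calls return `TRUE`; the last two lines give roughly 0.94 and 0.70, so the "same quantity" reading of R squared should not be applied to models fitted without an intercept.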
Single Predictor
We can repeat this with the data used in the question, which has the special feature that there is only a single predictor:
```r
library(MASS)  # for the birthwt data

fm <- lm(bwt ~ lwt, birthwt)
fm_null <- lm(bwt ~ 1, birthwt)

var(fitted(fm)) / var(birthwt$bwt)  # ratio of variances
## [1] 0.03449685

cor(fitted(fm), birthwt$bwt)^2  # squared correlation
## [1] 0.03449685

summary(fm)$r.squared  # multiple R squared
## [1] 0.03449685

1 - deviance(fm) / deviance(fm_null)  # improvement in residual sum of squares
## [1] 0.03449685
```
In the special case of a single predictor we have additional equalities which we can add to the above list:
- the square of the correlation between the dependent variable and that predictor equals each of the above.
- the coefficient of the predictor times the ratio of the standard deviation of the predictor to that of the dependent variable equals the correlation, so the square of that product equals multiple R squared.
- if we standardize the dependent variable and the predictor to each have mean 0 and standard deviation 1, then the coefficient of the predictor equals the correlation between the dependent variable and the predictor, so its square equals multiple R squared.
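These single-predictor identities follow from the least squares slope formula. Writing $x$ for the predictor, $\hat\beta$ for its fitted coefficient and $r$ for the correlation between $x$ and $y$ (standardizing sets both standard deviations to 1, which gives $\hat\beta = r$ directly):

```latex
\begin{aligned}
\hat\beta &= \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad
r = \operatorname{cor}(x, y) = \frac{\operatorname{cov}(x, y)}{\operatorname{sd}(x)\,\operatorname{sd}(y)} \\
\hat\beta\,\frac{\operatorname{sd}(x)}{\operatorname{sd}(y)} &= \frac{\operatorname{cov}(x, y)}{\operatorname{sd}(x)\,\operatorname{sd}(y)} = r \\
\operatorname{cor}(\hat y, y)^2 &= \operatorname{cor}(\hat\alpha + \hat\beta x,\, y)^2 = \operatorname{cor}(x, y)^2 = r^2
\end{aligned}
```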
Thus we have these three additional ways to view R squared (6 in total) in the case of a single predictor.
```r
cor(birthwt$lwt, birthwt$bwt)^2  # squared correlation
## [1] 0.03449685

(coef(fm)[[2]] * sd(birthwt$lwt) / sd(birthwt$bwt))^2  # rescaled slope, squared
## [1] 0.03449685

fm0 <- lm(bwt ~ lwt, as.data.frame(scale(birthwt)))  # standardized variables
coef(fm0)[[2]]^2
## [1] 0.03449685
```
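The same single-predictor relations can be checked without relying on `lm` for the slope, by computing it from its closed form `cov(x, y) / var(x)`. This is a minimal sketch that swaps in R's built-in `cars` data (`speed` as predictor, `dist` as dependent variable) so it needs no extra package; the names `fm_cars`, `fm_std`, `zx` and `zy` are illustrative.

```r
# Built-in cars data: speed is the predictor, dist the dependent variable
x <- cars$speed
y <- cars$dist
fm_cars <- lm(dist ~ speed, data = cars)

# Least squares slope from its closed form
b <- cov(x, y) / var(x)
all.equal(b, coef(fm_cars)[[2]])

# Rescaling the slope by sd(x) / sd(y) gives the correlation ...
r <- b * sd(x) / sd(y)
all.equal(r, cor(x, y))

# ... and its square is the multiple R squared
all.equal(r^2, summary(fm_cars)$r.squared)

# Standardize both variables by hand: the slope then equals the correlation
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
fm_std <- lm(zy ~ zx)
all.equal(coef(fm_std)[[2]], r)
```

Each `all.equal` call should return `TRUE`; standardizing by hand rather than with `scale()` keeps the variables as plain numeric vectors.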