Solved – the main difference between multiple R-squared and correlation coefficient

I have R code that runs the lm function and gets the summary:

[screenshot of the model summary output]

What is the meaning of multiple R-squared?
Is there any relationship between multiple R-squared and the correlation coefficient?
And how can we describe the relationship between two variables using multiple R-squared?

In general, the following four quantities are equal, giving three different ways to think about multiple R-squared:

  • multiple R squared
  • the variance of the fitted values divided by the variance of the dependent variable (i.e. the regression "explains" that proportion of the variance of the dependent variable)
  • the square of the correlation between the fitted values and the dependent variable
  • the proportional improvement in residual sum of squares from the null model to the fitted model. (The null model has the same dependent variable but only an intercept with no predictors.)

We show an R example where X1 and X2 are the predictors, Y is the dependent variable, fm is the regression object and fitted(fm) are the fitted values. (The fitted values are the values of Y predicted by the right hand side of the regression equation.) fm_null is the regression object of the null regression model (as described above) and deviance(fm) refers to the residual sum of squares of the indicated model.

X1 <- c(1, 1, 1, 1, 2, 2)
X2 <- c(1, 2, 3, 1, 2, 3)
Y <- c(1, 2, 3, 4, 5, 6)

fm <- lm(Y ~ X1 + X2)  # regression of Y on X1 and X2
fm_null <- lm(Y ~ 1)   # null model

# these are all the same

var(fitted(fm)) / var(Y)  # ratio of variances
## [1] 0.7032967

cor(fitted(fm), Y)^2  # squared correlation
## [1] 0.7032967

summary(fm)$r.squared  # multiple R squared
## [1] 0.7032967

1 - deviance(fm) / deviance(fm_null)  # improvement in residual sum of squares
## [1] 0.7032967
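As a quick sanity check (a small sketch reusing the objects above, not part of the original example), deviance() applied to an lm fit is just the residual sum of squares, and the null model's deviance is the total sum of squares about the mean:

sum(resid(fm)^2)      # residual sum of squares; should match deviance(fm)
sum((Y - mean(Y))^2)  # total sum of squares; should match deviance(fm_null)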

Single Predictor

We can repeat this with the data used in the question, which has the special feature that there is only a single predictor:

library(MASS)

fm <- lm(bwt ~ lwt, birthwt)
fm_null <- lm(bwt ~ 1, birthwt)

var(fitted(fm)) / var(birthwt$bwt)  # ratio of variances
## [1] 0.03449685

cor(fitted(fm), birthwt$bwt)^2  # squared correlation
## [1] 0.03449685

summary(fm)$r.squared  # multiple R squared
## [1] 0.03449685

1 - deviance(fm) / deviance(fm_null)  # improvement in residual sum of squares
## [1] 0.03449685
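As an aside, one reason R-squared behaves like a squared correlation is that, like correlation, it is unchanged by linear rescaling of the variables. A small illustrative sketch (the conversion of lwt, the mother's weight in pounds, to kilograms is our own addition, not part of the original example):

fm_kg <- lm(bwt ~ I(lwt * 0.4536), birthwt)  # predictor rescaled to kilograms
summary(fm_kg)$r.squared  # unchanged; should again be 0.03449685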

In the special case of a single predictor we have additional equalities which we can add to the above list:

  • the square of the correlation between the dependent variable and that predictor equals each of the above.

  • the coefficient of the predictor times the ratio of the standard deviation of the predictor to that of the dependent variable equals the correlation, so the square of that product equals multiple R-squared.

  • if we standardize the dependent variable and the predictor to each have mean 0 and standard deviation 1, then the coefficient of the predictor equals the correlation between the dependent variable and the predictor, so its square equals multiple R-squared.

Thus, in the case of a single predictor, we have three additional ways to view R-squared (six in total).

cor(birthwt$lwt, birthwt$bwt)^2  # squared correlation with the predictor
## [1] 0.03449685

(coef(fm)[[2]] * sd(birthwt$lwt) / sd(birthwt$bwt))^2  # squared (slope times sd ratio)
## [1] 0.03449685

fm0 <- lm(bwt ~ lwt, as.data.frame(scale(birthwt)))  # standardized variables
coef(fm0)[[2]]^2  # squared standardized coefficient
## [1] 0.03449685
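Finally, to connect back to the correlation coefficient asked about in the question: with a single predictor, the correlation equals the square root of multiple R-squared taken with the sign of the slope. A minimal check, reusing fm from above:

sign(coef(fm)[[2]]) * sqrt(summary(fm)$r.squared)  # signed square root of R squared
cor(birthwt$lwt, birthwt$bwt)  # the correlation itself; the two values agree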
