Comparing 2 regression models

I have 2 continuous outcome (dependent) variables, A and B, and 1 independent variable (a biomarker), and all three are highly correlated. I would like to compare the outcome variables in relation to the biomarker and assess whether the biomarker explains more of the variance in A or in B.

Is there a way to statistically compare the $R^2$ values of the 2 models, for example?

I was also thinking of doing a Wald test after a regression using the biomarker as the outcome and A and B as the independent variables, but the units of A and B are different.

I work in Stata, R and SPSS; any type of code would really help.


You can just directly compare the $R^2$s. If you want to see how sensitive the $R^2$s are to the amount of data selected, you could do something like Mr. Masterov suggested above. My instinct would be to repeatedly select a random 25% of the observations, fit both models on each subset, and build up a vector of $R^2$s for each of your dependent variables.

Because I am so magnanimous, I've written you a little example. Should give you an idea of what I have in mind. Just tested in R, should work.

    # Make some fake data
    N <- 100000 # our observation count
    vBioMarker <- rnorm(N)

    # Now let's make A and B from known generating processes.
    # Note how I make the 'residual' for B have twice the standard deviation
    # (four times the variance) of A; the third argument of rnorm() is the sd.
    # This should make the 'true' R^2 different for the two.
    A <- 0.3 * vBioMarker + rnorm(N, 0, 1)
    B <- 0.3 * vBioMarker + rnorm(N, 0, 2)
    modelData <- data.frame(vBioMarker, A, B)

    tempAMod <- lm(A ~ vBioMarker, data = modelData)
    # check out the R^2
    summary(tempAMod)$r.squared

    # Now generate an estimated sampling dist of the R^2.
    # I'm going to use a loop here; it is slow, you could speed it up with
    # one of the apply functions probably.
    vRsqForA <- NULL
    vRsqForB <- NULL
    for (lSample in 1:10000) {
      # make a vector with 25% of the obs numbers randomly selected
      vTempSubset <- sample(1:N, 0.25 * N)
      # fit the temp model for A
      tempModA <- lm(A ~ vBioMarker,
                     data = modelData,
                     subset = vTempSubset)
      # fit the temp model for B
      tempModB <- lm(B ~ vBioMarker,
                     data = modelData,
                     subset = vTempSubset)
      # Book the R^2s
      vRsqForA <- c(vRsqForA, summary(tempModA)$r.squared)
      vRsqForB <- c(vRsqForB, summary(tempModB)$r.squared)
    } # for each sample run

    # some quick and dirty histograms, I suggest plotting both histograms on the
    # same graph using the methods in ggplot2, can't recall how to do that off
    # the top of my head
    hist(vRsqForA, breaks = 100)
    hist(vRsqForB, breaks = 100)
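For putting both histograms on the same graph, here is a rough sketch in base R rather than ggplot2 (that swap is my choice, not something from the example above). It regenerates the same kind of fake data so it runs on its own, but with a smaller `N` and fewer replicates (`nReps`, a name I made up) purely to keep it fast. Because each subset is used for both fits, it also looks at the per-subset difference in $R^2$ (`vRsqDiff`, also hypothetical). Treat the quantiles as a sensitivity check, not a formal test.

```r
# Self-contained sketch. N and nReps are assumptions picked for speed,
# not the values used in the example above.
set.seed(1)
N <- 2000
nReps <- 500

vBioMarker <- rnorm(N)
A <- 0.3 * vBioMarker + rnorm(N, 0, 1)
B <- 0.3 * vBioMarker + rnorm(N, 0, 2)
modelData <- data.frame(vBioMarker, A, B)

# Preallocate the result vectors instead of growing them in the loop
vRsqForA <- numeric(nReps)
vRsqForB <- numeric(nReps)

for (lSample in seq_len(nReps)) {
  # same 25% subset for both models, so the two R^2s are paired
  vTempSubset <- sample(seq_len(N), 0.25 * N)
  tempModA <- lm(A ~ vBioMarker, data = modelData, subset = vTempSubset)
  tempModB <- lm(B ~ vBioMarker, data = modelData, subset = vTempSubset)
  vRsqForA[lSample] <- summary(tempModA)$r.squared
  vRsqForB[lSample] <- summary(tempModB)$r.squared
}

# Per-subset difference in R^2 and a rough spread for it
vRsqDiff <- vRsqForA - vRsqForB
quantile(vRsqDiff, c(0.025, 0.5, 0.975))

# Overlay the two histograms using shared breaks and transparent fills
vBreaks <- seq(0, max(vRsqForA, vRsqForB) * 1.05, length.out = 60)
hist(vRsqForA, breaks = vBreaks, col = rgb(0, 0, 1, 0.4),
     main = "Subsample R^2 for A and B", xlab = "R^2")
hist(vRsqForB, breaks = vBreaks, col = rgb(1, 0, 0, 0.4), add = TRUE)
legend("topright", legend = c("A", "B"),
       fill = c(rgb(0, 0, 1, 0.4), rgb(1, 0, 0, 0.4)))
```

If the two histograms barely overlap and the quantiles of `vRsqDiff` sit well away from zero, that suggests the biomarker explains more of one outcome than the other in your data.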
