I have 2 continuous outcome (dependent) variables, A and B, and 1 independent variable (a biomarker), all of which are highly correlated. I would like to compare the outcome variables in relation to the biomarker and assess whether the biomarker explains more of A or of B.
Is there a way to statistically compare the $R^2$ values of the 2 models, for example?
I was also thinking of doing a Wald test after a regression with the biomarker as the response and A and B as predictors, but A and B are measured in different units.
I work in Stata, R and SPSS, so code in any of these would really help.
Thanks!
Best Answer
You can just directly compare the $R^2$s. If you want to see how sensitive the $R^2$s are to the amount of data selected, you could do something like Mr. Masterov suggested above. My instinct would be to randomly select 25% of the observations many times and fit both models on each subsample, building up a vector of $R^2$s for each of your dependent variables.
Because I am so magnanimous, I've written you a little example. Should give you an idea of what I have in mind. Just tested in R, should work.

    # Make some fake data
    N <- 100000  # our observation count
    vBioMarker <- rnorm(N)

    # Now let's make A and B from known generating processes
    # Note how I make the 'residual' for B have twice the standard deviation
    # (four times the variance) of the one for A;
    # this should make the 'true' R^2 different for the two
    A <- 0.3 * vBioMarker + rnorm(N, 0, 1)
    B <- 0.3 * vBioMarker + rnorm(N, 0, 2)
    modelData <- data.frame(vBioMarker, A, B)

    tempAMod <- lm(A ~ vBioMarker, data = modelData)
    # check out the R^2
    summary(tempAMod)$r.squared

    # Now generate an estimated sampling dist of the R^2
    # I'm going to use a loop here, it is slow, you could speed it up with
    # one of the apply functions probably.
    vRsqForA <- NULL
    vRsqForB <- NULL
    for (lSample in 1:10000) {
      # make a vector with 25% of the obs numbers randomly selected
      vTempSubset <- sample(1:N, 0.25 * N)
      # fit the temp model for A
      tempModA <- lm(A ~ vBioMarker, data = modelData, subset = vTempSubset)
      # fit the temp model for B
      tempModB <- lm(B ~ vBioMarker, data = modelData, subset = vTempSubset)
      # Book the R^2s
      vRsqForA <- c(vRsqForA, summary(tempModA)$r.squared)
      vRsqForB <- c(vRsqForB, summary(tempModB)$r.squared)
    }  # for each sample run

    # some quick and dirty histograms, I suggest plotting both histograms on the
    # same graph using the methods in ggplot2, can't recall how to do that off
    # the top of my head
    hist(vRsqForA, breaks = 100)
    hist(vRsqForB, breaks = 100)
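If you want a single number to look at rather than two histograms, one option is to bootstrap the difference in $R^2$ directly. Note that this swaps the 25% subsampling above for an ordinary bootstrap (resampling all N rows with replacement), and fits both models on the same resample so the difference is paired. It is only a rough sketch under assumptions that are mine, not part of the example above: the helper `rsq`, the smaller `N`, the seed and the number of resamples `nBoot` are all made up for illustration.

```r
# Sketch: percentile bootstrap interval for R^2(A) - R^2(B).
# Assumed for illustration: smaller N so it runs in seconds, arbitrary seed.
set.seed(42)
N <- 2000
vBioMarker <- rnorm(N)
A <- 0.3 * vBioMarker + rnorm(N, 0, 1)
B <- 0.3 * vBioMarker + rnorm(N, 0, 2)
modelData <- data.frame(vBioMarker, A, B)

# Hypothetical helper: R^2 from regressing the named outcome on vBioMarker
rsq <- function(outcome, d) {
  f <- reformulate("vBioMarker", response = outcome)
  summary(lm(f, data = d))$r.squared
}

nBoot <- 1000                  # number of bootstrap resamples (assumed)
vRsqDiff <- numeric(nBoot)     # preallocate instead of growing with c()
for (b in seq_len(nBoot)) {
  # resample row numbers with replacement; same rows are used for A and B
  idx <- sample.int(N, N, replace = TRUE)
  d <- modelData[idx, ]
  vRsqDiff[b] <- rsq("A", d) - rsq("B", d)
}

# 95% percentile interval for the difference in R^2
ciDiff <- quantile(vRsqDiff, c(0.025, 0.975))
print(ciDiff)
```

If the interval sits clearly away from zero, that suggests the biomarker explains more of one outcome than the other in this sample. Treat it as a rough check rather than a formal test.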