I currently have a data frame with 98 observations and 107 variables. All of the variables are numeric except one, which is binary (yes or no). My goal is to determine which variable and/or correlation gives the greatest separation between the yes and no samples. I have been using the `pairs()` function to do this, but I can only look at a few variables at a time. Is there a way to determine which variable or correlation discriminates best between yes and no?

To clarify: my table has 98 observations and 107 variables, and a scatterplot matrix drawn with `pairs()` cannot fit all of the variables.

I have used this function:

`pairs(x[70:80], pch=21, bg=c("red","green")[unclass(x$outcome)])`


#### Best Answer

When you have **multiple variables** and you are looking for the variable(s) **that best discriminate between groups** (the "yes" and "no" samples in this case), one tool for this is **MANOVA**.

```r
# Suppose we have a data.frame with 7 variables and one grouping column.
# Only v7 actually differs between the two groups.
my.data <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                      v4 = rnorm(100), v5 = rnorm(100), v6 = rnorm(100),
                      v7 = c(rnorm(50), rnorm(50) + 20),
                      response = factor(rep(c("yes", "no"), each = 50)))

# Run MANOVA
my.mnv <- manova(cbind(v1, v2, v3, v4, v5, v6, v7) ~ response, data = my.data)

# Look at the per-variable p-values (if a p-value is < 0.05, that variable
# differs significantly between "yes" and "no")
summary.aov(my.mnv)

# Plot; response is a factor, so unclass() gives the integer codes 1 and 2
pairs(my.data[c("v1", "v2", "v3", "v4", "v5", "v6", "v7")], pch = 22,
      bg = c("red", "yellow")[unclass(my.data$response)])
```

**It is not good practice to draw conclusions about statistical significance just from looking at a plot** (although it is still necessary to look at it). In your case, with 107 variables, the `pairs()` plot will be very chaotic.
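For a data frame of the size in the question, one possible way to keep things manageable is to screen every variable on its own, rank the variables by p-value, and plot only the top few. The sketch below is an illustration under assumptions, not a definitive recipe: the simulated `x`, the shift added to `v7`, the choice of a Welch t-test with Holm correction, and the cutoff of four plotted variables are all made up for the example. Only the column name `outcome` is borrowed from the question's `pairs()` call.

```r
# Hypothetical stand-in for the asker's data: 98 rows, 106 numeric
# predictors and a two-level outcome. Only v7 is shifted between groups.
set.seed(1)
n <- 98
x <- as.data.frame(matrix(rnorm(n * 106), nrow = n))
names(x) <- paste0("v", 1:106)
x$outcome <- factor(rep(c("yes", "no"), each = n / 2))
x$v7 <- x$v7 + ifelse(x$outcome == "yes", 3, 0)

# One Welch t-test per numeric column, comparing "yes" against "no".
predictors <- setdiff(names(x), "outcome")
p.raw <- sapply(predictors, function(v) t.test(x[[v]] ~ x$outcome)$p.value)

# Adjust for testing 106 variables at once (Holm's method), then rank.
screen <- data.frame(variable = predictors,
                     p.raw = p.raw,
                     p.holm = p.adjust(p.raw, method = "holm"),
                     stringsAsFactors = FALSE)
screen <- screen[order(screen$p.raw), ]
head(screen, 5)

# Plot only the top-ranked variables so that pairs() stays readable.
top <- head(screen$variable, 4)
pairs(x[top], pch = 21, bg = c("red", "green")[as.integer(x$outcome)])
```

Ranking on adjusted p-values matters here because, with more than a hundred tests, a handful of unadjusted p-values below 0.05 are expected by chance alone. Note that this screens variables one at a time, so it will not detect a pair of variables that separates the groups only in combination.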