# How to identify which variable (out of many) discriminates between groups

I currently have a data frame with 98 observations and 107 variables. All of the variables are numeric except one, which is binary (yes or no). My goal is to determine which variable and/or pairwise correlation gives the greatest separation between the yes and no samples. I have been using the `pairs()` function to do this, but it can only show a few variables at a time. Is there a way to determine which variable (or correlation) best discriminates between yes and no?

To clarify: my table has 98 observations and 107 variables, but a scatterplot matrix made with `pairs()` cannot fit all of the variables.

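I have been using this call:

```r
# Scatterplot matrix of columns 70-80; pch = 21 draws filled circles,
# coloured red/green by the levels of outcome
pairs(x[70:80], pch = 21,
      bg = c("red", "green")[as.integer(factor(x$outcome))])
```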

```r
# Suppose we have a data.frame with 7 numeric variables and one grouping factor;
# only v7 differs between the two groups
set.seed(1)
my.data <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                      v4 = rnorm(100), v5 = rnorm(100), v6 = rnorm(100),
                      v7 = c(rnorm(50), rnorm(50) + 20),
                      response = factor(rep(c("yes", "no"), each = 50)))

# Run MANOVA with all seven variables as responses
my.mnv <- manova(cbind(v1, v2, v3, v4, v5, v6, v7) ~ response, data = my.data)

# Look at the p-values: summary.aov() gives a separate one-way ANOVA for each
# variable. If a variable's p-value is < 0.05, it significantly discriminates
# between "yes" and "no"
summary.aov(my.mnv)

# Plot, filling points by group
pairs(my.data[c("v1", "v2", "v3", "v4", "v5", "v6", "v7")], pch = 22,
      bg = c("red", "yellow")[as.integer(my.data$response)])
```
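With 107 variables, reading the `summary.aov()` output one table at a time gets tedious, and on average about 5% of variables that do not differ between the groups would still fall below 0.05 by chance. A minimal sketch (an addition, not part of the original answer) that fits all responses at once, collects the per-variable p-values into one vector, adjusts them with `p.adjust()` using the Benjamini-Hochberg method (`"holm"` or `"bonferroni"` are stricter alternatives), and sorts the result. It regenerates the simulated data from above; `vars`, `p.values` and `ranking` are illustrative names.

```r
# Simulated data as in the answer: only v7 differs between "yes" and "no"
set.seed(1)
my.data <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                      v4 = rnorm(100), v5 = rnorm(100), v6 = rnorm(100),
                      v7 = c(rnorm(50), rnorm(50) + 20),
                      response = factor(rep(c("yes", "no"), each = 50)))
vars <- paste0("v", 1:7)

# Fit all responses at once; as.matrix() avoids typing every name in cbind()
my.mnv <- manova(as.matrix(my.data[vars]) ~ response, data = my.data)

# summary.aov() returns one ANOVA table per response; row 1 is the `response` term
p.values <- sapply(summary.aov(my.mnv), function(tab) tab[["Pr(>F)"]][1])
names(p.values) <- vars

# Adjust for testing many variables at once (Benjamini-Hochberg false discovery rate)
ranking <- data.frame(variable = vars,
                      p = p.values,
                      p.adj = p.adjust(p.values, method = "BH"))
ranking <- ranking[order(ranking$p), ]
print(ranking, row.names = FALSE)
```

On this simulated data, `v7` comes out on top with an adjusted p-value far below 0.05. On real data, the same code works with `vars` set to your 106 numeric column names.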
It is not a good idea to draw conclusions about statistical significance from the plot alone (although you should still look at it). In your case, with 107 variables, a `pairs()` plot of everything would be far too cluttered to read.
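Since a `pairs()` plot of all 107 variables is unreadable, one option is to screen the variables first and plot only the few strongest candidates. A minimal sketch on simulated data shaped like the question (98 rows, 106 numeric columns plus a yes/no factor named `outcome`; the sizes, the column names `V1`...`V106` and the planted signal in `V5` are all assumptions). It swaps in a Welch two-sample `t.test()` per variable as the screen, which for two groups plays the same role as the per-variable ANOVA but does not assume equal variances; `top.n` is an illustrative parameter.

```r
# Simulated stand-in for the question's data (sizes and names are assumptions)
set.seed(42)
x <- as.data.frame(matrix(rnorm(98 * 106), nrow = 98))
x$outcome <- factor(sample(c("yes", "no"), 98, replace = TRUE))
x$V5 <- x$V5 + 3 * (x$outcome == "yes")   # plant one discriminating variable

numeric.vars <- setdiff(names(x), "outcome")

# Welch t-test p-value for each numeric variable, yes vs. no
p.values <- vapply(numeric.vars,
                   function(v) t.test(x[[v]] ~ x$outcome)$p.value,
                   numeric(1))

# Keep only the few most discriminating variables for the scatterplot matrix
top.n <- 4
top <- names(sort(p.values))[seq_len(top.n)]
pairs(x[top], pch = 21, bg = c("red", "green")[as.integer(x$outcome)])
```

Passing `var.equal = TRUE` to `t.test()` gives the same p-value as a one-way ANOVA with two groups, if you prefer to stay consistent with the `summary.aov()` results.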