Solved – How to illustrate correlation in binary classification

I ran a correlation in a binary classification problem between one variable and the prediction to determine whether a linear correlation exists. The code:

    from scipy.stats import pearsonr

    a = [340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380]  # feature
    b = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]  # class

    print(pearsonr(a, b))

There seems to be a high correlation (r above 0.75), and the p-value is highly significant as well. I then tried to create a scatterplot with matplotlib to examine the data, but obviously a simple scatterplot (x-axis = class, y-axis = feature) doesn't show much.

@xan Thanks for the response!

I was trying to say that, as I understand it, Pearson correlation can only illustrate a linear relationship. I was wondering if there is any way of visualising a non-linear correlation. The first graph is boring as the data is perfectly split. The problem I'm examining is a document classification issue, and the variable is document length. I want to verify that a relationship exists (which I did) and verify that document length isn't part of the model (normalised tf-idf). I like your idea of the fitted LR; perhaps that would be an interesting plot to show the correlation!

I don't follow all of your question, but it sounds like you're looking for a plot that is often used for the opposite problem: seeing how a class predicts a feature. JMP uses a means diamond plot by default for that question:

[Image: means diamond plot of the feature by class]

The width is proportional to the counts, and the heights reflect the confidence intervals.
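
For readers working in Python rather than JMP, the numbers behind such a plot (per-class count, mean, and confidence interval) can be sketched as below. This is only a rough sketch: the helper `mean_ci` is a name made up here, and it assumes a t-based 95% interval using each class's own standard deviation, which may not match how JMP computes its diamonds.

```python
import numpy as np
from scipy import stats

a = np.array([340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380])  # feature
b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # class


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval.

    Hypothetical helper for this sketch; it uses the sample's own
    standard deviation (ddof=1), not a pooled estimate.
    """
    n = len(values)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)
    half_width = stats.t.ppf(0.5 + level / 2, df=n - 1) * sem
    return mean, mean - half_width, mean + half_width


# One (mean, lower, upper) triple per class label.
summary = {int(c): mean_ci(a[b == c]) for c in np.unique(b)}

for c, (m, lo, hi) in summary.items():
    n = int(np.sum(b == c))
    print(f"class {c}: n={n}, mean={m:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

The counts would set the diamond widths and the interval bounds would set the heights; drawing them (for example with matplotlib error bars) is left out here.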

To show how the probability changes with the feature, you can plot the fitted logistic regression with points jittered to show density (vertical position is meaningless except for being above or below the curve).

[Image: fitted logistic regression curve with jittered points]
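
A plot along these lines could be sketched in Python as follows. Several things here are assumptions of the sketch rather than anything JMP does: the feature is standardized before fitting, a small ridge penalty (`ridge=1.0`, chosen arbitrarily) keeps the slope finite because the example data are perfectly separated, and the jitter convention (class 1 below the curve, class 0 above) is just one possible choice. The function names are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

a = np.array([340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380], dtype=float)  # feature
b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # class

# Standardize the feature so the optimizer works on a well-scaled problem.
mu, sigma = a.mean(), a.std()
z = (a - mu) / sigma


def penalized_nll(params, ridge=1.0):
    """Logistic negative log-likelihood plus an L2 penalty on the slope.

    The penalty is an assumption of this sketch: with perfectly separated
    classes the unpenalized maximum-likelihood slope grows without bound.
    """
    intercept, slope = params
    eta = intercept + slope * z
    # log(1 + exp(eta)) - y * eta, computed stably with logaddexp
    nll = np.sum(np.logaddexp(0.0, eta) - b * eta)
    return nll + 0.5 * ridge * slope ** 2


fit = minimize(penalized_nll, x0=np.zeros(2))
intercept, slope = fit.x


def predict_proba(x):
    """Fitted P(class = 1) at feature value(s) x on the original scale."""
    return expit(intercept + slope * (np.asarray(x, dtype=float) - mu) / sigma)


def jitter_heights(x, y, seed=0):
    """Random plotting heights: class 1 below the curve, class 0 above it."""
    rng = np.random.default_rng(seed)
    p = predict_proba(x)
    return np.where(y == 1, rng.uniform(0.0, p), rng.uniform(p, 1.0))


def plot_fit():
    """Draw the fitted curve with jittered points (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the fit runs without it

    grid = np.linspace(a.min(), a.max(), 200)
    plt.plot(grid, predict_proba(grid))
    plt.scatter(a, jitter_heights(a, b), c=b)
    plt.xlabel("feature")
    plt.ylabel("fitted P(class = 1)")
    plt.show()


print(f"intercept={intercept:.2f}, slope={slope:.2f} (standardized feature)")
```

Calling `plot_fit()` draws the figure. The penalty strength changes how steep the curve looks, so the picture should be read qualitatively.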

It's pretty boring for your data, which makes a perfect split, but if I add in a couple of misclassifications, it gets more useful.

[Image: the same logistic regression plot with a couple of misclassifications added]
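
The "perfect split" can also be checked directly. The sketch below tests whether a single threshold on the feature separates the classes, first on the question's data and then after appending two made-up points. Those extra points are invented here purely to create overlap and are not the ones used for the plot above; `is_separated` is likewise a hypothetical helper name.

```python
import numpy as np

a = np.array([340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380])  # feature
b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # class


def is_separated(x, y):
    """True if one threshold on x splits the two classes perfectly."""
    x0, x1 = x[y == 0], x[y == 1]
    return bool(x0.max() < x1.min() or x1.max() < x0.min())


print(is_separated(a, b))  # True: class 0 tops out at 100, class 1 starts at 180

# Two hypothetical points, invented only to create overlap between classes.
a_noisy = np.append(a, [250, 70])
b_noisy = np.append(b, [0, 1])

print(is_separated(a_noisy, b_noisy))  # False: the classes now overlap
```

When the check returns True, an unpenalized logistic fit has no finite maximum-likelihood slope, which is one reason the curve for the original data looks like a step.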
