In a binary classification problem, I computed the correlation between one variable and the prediction to determine whether a linear relationship exists. The code:
    from scipy.stats import pearsonr

    a = [340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380]  # feature
    b = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]  # class

    print(pearsonr(a, b))
The correlation appears to be high (r above 0.75) and the p-value is highly significant as well. I then tried to create a scatterplot with matplotlib to examine the data, but a simple scatterplot (x-axis = class, y-axis = feature) obviously doesn't show much.
@xan Thanks for the response!
I was trying to say that, as I understand it, Pearson correlation can only capture a linear relationship. I was wondering if there is any way of visualising a non-linear correlation. The first graph is boring because the data is perfectly split. The problem I'm examining is a document classification task, and the variable is document length. I want to verify that a relationship exists (which I did) and to verify that document length isn't part of the model (normalised tf-idf). I like your idea of the fitted logistic regression; perhaps that would be an interesting plot to show the correlation!
Best Answer
I don't follow all of your question, but it sounds like you're looking for a plot that is often used for the opposite problem: seeing how a class predicts a feature. JMP uses a means diamond plot by default for that question:
The width of each diamond is proportional to the class count, and the height reflects the confidence interval of the class mean.
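If you don't have JMP, the numbers behind such a plot are easy to compute yourself. The following is only a rough sketch using the data from the question: it computes the per-class count, mean, and a t-based 95% confidence interval. The helper name `class_mean_ci` and the choice of a t interval are my own assumptions, not necessarily what JMP does internally.

```python
import numpy as np
from scipy import stats

a = np.array([340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380], dtype=float)  # feature
b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # class


def class_mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a t-based confidence interval.

    Hypothetical helper for illustration; assumes roughly normal data.
    """
    n = len(values)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    half_width = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    return mean, mean - half_width, mean + half_width


# One (mean, lower, upper) triple per class label.
summary = {int(label): class_mean_ci(a[b == label]) for label in np.unique(b)}

for label, (mean, lower, upper) in summary.items():
    count = int(np.sum(b == label))
    print(f"class {label}: n={count}, mean={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

With this data the two intervals do not overlap, which is the same message the diamonds convey graphically.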
To show how the probability changes with the feature, you can plot the fitted logistic regression with the points jittered to show density (the vertical position is meaningless except for being above or below the curve).
It's pretty boring for your data, which makes a perfect split, but if I add in a couple of misclassifications, it gets more useful.
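A plot along these lines can be approximated in Python. This is a minimal sketch rather than a reproduction of the JMP output, and several things in it are my assumptions: the logistic regression is fitted by minimising the negative log-likelihood with `scipy.optimize.minimize`, a small ridge penalty (`penalty=0.1`) is added only so the slope stays finite on perfectly separated data, and class 1 points are jittered below the curve while class 0 points go above it.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

a = np.array([340, 180, 50, 30, 100, 300, 195, 20, 60, 80, 380], dtype=float)  # feature
b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # class

# Standardise the feature so the optimiser is well conditioned.
mu, sigma = a.mean(), a.std()
z = (a - mu) / sigma


def neg_log_likelihood(params, penalty=0.1):
    """Penalised negative log-likelihood of a one-feature logistic model.

    The penalty value is an arbitrary assumption; it keeps the slope finite
    when the classes are perfectly separated.
    """
    intercept, slope = params
    logits = intercept + slope * z
    # log(1 + exp(logits)) computed stably via logaddexp
    nll = np.sum(np.logaddexp(0.0, logits) - b * logits)
    return nll + penalty * slope ** 2


fit = minimize(neg_log_likelihood, x0=np.zeros(2))
intercept, slope = fit.x


def predict_proba(x):
    """Fitted P(class = 1) for feature values on the original scale."""
    z_new = (np.asarray(x, dtype=float) - mu) / sigma
    return 1.0 / (1.0 + np.exp(-(intercept + slope * z_new)))


# Jitter: class 1 points fall somewhere below the curve, class 0 points above it.
rng = np.random.default_rng(0)
p_at_points = predict_proba(a)
u = rng.uniform(0.05, 0.95, size=len(a))
y_jitter = np.where(b == 1, u * p_at_points, p_at_points + u * (1.0 - p_at_points))

grid = np.linspace(a.min(), a.max(), 200)
fig, ax = plt.subplots()
ax.plot(grid, predict_proba(grid), color="black", label="fitted P(class = 1)")
ax.scatter(a, y_jitter, c=b, cmap="coolwarm")
ax.set_xlabel("feature")
ax.set_ylabel("probability")
ax.legend()
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the figure.
```

For your document-length question, `a` would be the document lengths and `b` the class labels; a flat curve would suggest that length carries little information about the class.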