Solved – Calculate correlation for discrete-like values from two columns of DataFrame in Pandas

Here's the code snippet:

    import pandas as pd

    df = pd.DataFrame(data=[1, 1, 2, 2, 3, 3, 3], columns=list('A'))

    def m(x):
        # Map each value of A to its value in B; -1 marks anything unexpected.
        if x == 1:
            return 2
        if x == 2:
            return 3
        if x == 3:
            return 1
        return -1

    df['B'] = df['A'].map(m)
    print(df.head(n=10))

       A  B
    0  1  2
    1  1  2
    2  2  3
    3  2  3
    4  3  1
    5  3  1
    6  3  1

As we can see, column B is created by mapping the values of column A, so I expected a correlation of 1, but none of the results below gives that. Could anyone give me an idea of how to calculate the correlation of discrete data in two columns? Many thanks!

    df['A'].cov(df['B'])
    -0.47619047619047611

    df['A'].corr(df['B'], method='spearman')
    -0.68000000000000016

    df['A'].corr(df['B'], method='kendall')
    -0.50000000000000011

    df['A'].corr(df['B'])
    -0.58823529411764708

There is nothing wrong with your calculation. However, your mapping is not linear, so the correlation between your variables is neither 1 nor -1.

I suggest mapping 3 to 4 instead of 1 and computing the correlation again. You should then get a correlation of 1.

As another test, mapping 1 to 3, 2 to 2, and 3 to 1 should produce a correlation of -1.
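The two tests suggested above can be checked with a short sketch like the following. It reuses the data from the question; the variable names (original, increasing, decreasing) and the use of dictionaries with Series.map instead of the function m are just choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 3]})

# Mapping from the question: neither linear nor monotonic.
original = df["A"].map({1: 2, 2: 3, 3: 1})
# Suggested change (3 -> 4 instead of 3 -> 1), which is B = A + 1.
increasing = df["A"].map({1: 2, 2: 3, 3: 4})
# Reversed mapping, which is B = 4 - A.
decreasing = df["A"].map({1: 3, 2: 2, 3: 1})

# Series.corr defaults to the Pearson coefficient.
corr_original = df["A"].corr(original)
corr_increasing = df["A"].corr(increasing)
corr_decreasing = df["A"].corr(decreasing)

print(corr_original, corr_increasing, corr_decreasing)
```

Only the two linear mappings reach the extremes; the original mapping stays at the value reported in the question, about -0.588.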

Please also note that correlation only measures how strongly the variables are linearly related (Pearson) or monotonically related (Spearman, Kendall). If they are related by a deterministic mapping that is neither linear nor monotonic, as in your example, the correlation can be far from 1 or -1.
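A more extreme case of the same effect, as a minimal sketch: the data here is made up for illustration and is not from the question. y is fully determined by x, yet the Pearson correlation is zero because the relationship is symmetric rather than linear.

```python
import pandas as pd

# Illustrative data: y is a deterministic function of x.
x = pd.Series([-2, -1, 0, 1, 2])
y = x ** 2

# Pearson correlation (the default for Series.corr).
pearson = x.corr(y)
print(pearson)
```

So a correlation near zero does not mean the columns are unrelated, only that the relationship is not a linear one.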
