# Solved – Using Pearson’s correlation coefficient for probability

With a sample of more than 400 observations, I got a Pearson correlation coefficient of .25. How am I supposed to break this down into a probability or a percentage? Or rather, how can I explain my findings in layman's terms?

I should give some more information. We have two different tests. The first has one question, "How satisfied are you?", scored from 1 to 4, with 4 being the most satisfied and 1 the least. The other test has 18 questions and is an internal review. It consists of questions such as: did the customer service rep use the customer's name, did the rep give the correct technical answer, and did the rep teach the customer how to fix the problem themselves, if applicable.

The first test is given to the customer; the second, with its 18 questions, is filled out by the supervisor. Each of the 18 items is scored as Pass (-1), Not Applicable or Neither Pass/Fail (0), or Fail (1).

Our goal is to find which variables in the second test best predict scores in the first test. We want to do this to find the problem areas that we can fix internally in order to better serve our customers and give them the best experience possible (more 3s and 4s on the first test).


You can square the correlation to get \$R^2 = 0.25^2 = 0.0625\$ and conclude that 6.25% of the variation in one variable is explained by the variation in the other variable. This interpretation comes from regression, which I mention as a hint for further reading.
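The arithmetic behind that percentage is just squaring the coefficient. A minimal sketch, using the .25 from the question (the variable names are only illustrative):

```python
# Pearson correlation reported in the question
r = 0.25

# Squaring r gives the share of variance "explained" in a simple linear fit
r_squared = r ** 2

# Format as a percentage for a lay audience
percent_explained = f"{r_squared:.2%}"
print(f"r = {r}, r^2 = {r_squared}, variance explained = {percent_explained}")
```

Note that this is a share of variance, not a probability of any particular outcome.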

In regression, you model the dependent variable as a function of the independent variable and unknown parameters (the regression equation) plus an error term. The regression estimates the unknown parameters from the data, choosing the values that minimize the sum of the squared residuals.
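To make the least-squares idea concrete, here is a minimal sketch in plain Python for a line with one predictor. The data are made up for illustration (one review item coded -1/0/1 and a satisfaction score from 1 to 4), and `fit_line` and `sum_squared_residuals` are hypothetical helper names, not part of any library.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared differences between observed y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up example data, NOT from the question
review_item = [-1, -1, 0, 1, 1, -1, 0, 1]
satisfaction = [4, 3, 3, 2, 1, 4, 2, 2]

a, b = fit_line(review_item, satisfaction)
best = sum_squared_residuals(review_item, satisfaction, a, b)

# Any other line should give a larger sum of squared residuals
worse = sum_squared_residuals(review_item, satisfaction, a + 0.1, b - 0.1)
print(f"intercept = {a}, slope = {b}, SSR at fit = {best}, SSR nearby = {worse}")
```

Nudging the intercept or slope away from the fitted values increases the sum of squared residuals, which is what "minimized" means here.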

How well this model explains the variation in the dependent variable is usually indicated by \$R^2\$, which can be interpreted just the way I wrote above. In linear regression with one independent variable, this \$R^2\$ is identical to the squared correlation coefficient, because \$R^2\$ is generally defined as (variance of the dependent variable − residual variance)/(variance of the dependent variable).
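The identity between \$R^2\$ and the squared correlation can be checked numerically. The following is a sketch on made-up data (not the data from the question); `pearson_r` and `r_squared_from_fit` are hypothetical helper names written out in plain Python.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_fit(x, y):
    """R^2 = (total variation - residual variation) / total variation,
    for a least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return (ss_total - ss_residual) / ss_total


# Made-up example data, NOT from the question
x = [-1, -1, 0, 1, 1, -1, 0, 1]
y = [4, 3, 3, 2, 1, 4, 2, 2]

r = pearson_r(x, y)
r2 = r_squared_from_fit(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R^2 from regression = {r2:.4f}")
```

The two numbers agree up to floating-point error, but only in the one-predictor linear case described above.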

Note that linear regression might not be the appropriate analysis of your data.

Edit: After Carl's last edit, I see that the usual regression and Pearson correlation are not appropriate. Instead, something from nonparametric regression might be a better choice.
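One simple nonparametric starting point is Spearman's rank correlation, which only uses the ordering of the scores. To be clear, this is a rank-based correlation rather than a nonparametric regression, and it is my own illustration, not necessarily the method meant above. The sketch below uses made-up data and hypothetical helper names (`average_ranks`, `spearman_rho`); ties get the average of the ranks they span, which matters for scores like 1 to 4.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Made-up example data, NOT from the question
review_item = [-1, -1, 0, 1, 1, -1, 0, 1]
satisfaction = [4, 3, 3, 2, 1, 4, 2, 2]
rho = spearman_rho(review_item, satisfaction)
print(f"Spearman rho = {rho:.3f}")
```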
