I have a dataset with a dependent and an independent variable; neither is a time series. There are 120 observations, and the correlation coefficient between the two variables is 0.43.
After this calculation, I added a column for each variable containing the average over every 12 observations, which gives two new columns of 108 observations (pairs). The correlation coefficient of these two columns is 0.77.
It seems I improved the correlation this way. Is this allowed? Did I increase the explanatory power of the independent variable by using averages?
Best Answer
Let's have a look at two vectors, the first being
2 6 2 6 2 6 2 6 2 6 2 6
and the second vector being
6 2 6 2 6 2 6 2 6 2 6 2
Calculating the Pearson correlation, you get
cor(a, b)
[1] -1
However, if you take the average of each successive pair of values, both vectors become identical:
4 4 4 4 4 4
Identical vectors (as long as they are not constant) have correlation 1; here the averaged vectors happen to be constant, so their correlation is not even defined. Either way, the perfect negative correlation of -1 has vanished. This simple example illustrates a downside of your method.
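A short R sketch of this toy example (same vectors a and b as above):

```r
# two perfectly anti-correlated vectors, as above
a <- rep(c(2, 6), times = 6)   # 2 6 2 6 ... (12 values)
b <- rep(c(6, 2), times = 6)   # 6 2 6 2 ... (12 values)

cor(a, b)   # -1

# average successive, non-overlapping pairs: 6 values each
a_avg <- colMeans(matrix(a, nrow = 2))   # 4 4 4 4 4 4
b_avg <- colMeans(matrix(b, nrow = 2))   # 4 4 4 4 4 4

cor(a_avg, b_avg)   # NA with a warning: the standard deviation is zero
```

The pair averages are constant, so the perfect negative relationship in the raw data leaves no trace at all after averaging.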
Edit: To explain it more generally, the correlation coefficient is computed in the following way:
$\frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$
Averaging some $X$s and some $Y$s changes the differences between $X$ and $\mu_X$, as well as the differences between $Y$ and $\mu_Y$, and with them the covariance and the standard deviations, so the correlation of the averaged values need not equal the correlation of the original values.
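A minimal simulation sketch of this effect (my own illustration, not your data; the sample size of 120, the block size of 12, and the simple linear model are assumptions chosen to mirror the question):

```r
set.seed(1)

# simulate one dataset, then compare the raw correlation with the
# correlation of non-overlapping block averages of k observations
block_cor <- function(n = 120, k = 12, beta = 0.5) {
  x <- rnorm(n)                  # independent variable
  y <- beta * x + rnorm(n)       # dependent variable with noise
  x_avg <- colMeans(matrix(x, nrow = k))   # n / k block averages
  y_avg <- colMeans(matrix(y, nrow = k))
  c(raw = cor(x, y), averaged = cor(x_avg, y_avg))
}

block_cor()   # one draw: raw vs averaged correlation

# repeat many times: the correlation of the averages is far more
# variable, although the true relationship between X and Y is unchanged
sims <- replicate(2000, block_cor())
apply(sims, 1, sd)
```

With only 10 averaged pairs, the sample correlation of the averages fluctuates much more than the raw correlation, so it can come out considerably higher or lower purely by chance, even though the underlying relationship has not changed.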
Similar Posts:
- Solved – Mantel test or/and correlation test
- Solved – Correlation plots with missing values
- Solved – Can regression coefficients be higher than correlation coefficients?
- Solved – Correlation between features Pearson vs Spearman