I have a dataset with a dependent and an independent variable. Neither is a time series. I have 120 observations, and the correlation coefficient is 0.43.

After this calculation, I added a column for each variable containing the average of every 12 observations, resulting in 2 new columns with 108 observations (pairs). The correlation coefficient of these columns is 0.77.

It seems I improved the correlation this way. Is this allowed? Did I increase the explanatory power of the independent variable by using averages?


#### Best Answer

Let's have a look at two vectors, the first being `a`:

` 2 6 2 6 2 6 2 6 2 6 2 6 `

and the second being `b`:

` 6 2 6 2 6 2 6 2 6 2 6 2 `

Calculating the Pearson correlation in R shows that they are perfectly negatively correlated: `cor(a, b)` prints `[1] -1`.

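However, if you average successive pairs of values, both vectors become identical:

` 4 4 4 4 4 4 `

Identical vectors would normally have a correlation of 1. Strictly speaking, these averaged vectors are also constant, so their standard deviation is zero and the Pearson correlation is undefined (R's `cor()` returns `NA` with a warning). Either way, averaging has turned two perfectly anti-correlated vectors into identical ones.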

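This simple example illustrates a downside of your method. Averaging can change the correlation drastically. A higher correlation between the averages therefore does not mean that the independent variable explains the individual observations any better.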
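The example can be checked numerically with a minimal sketch. It is written in Python rather than the R used above, and the helper names `pearson` and `pair_means` are made up for illustration. The sketch computes the sample Pearson correlation directly from its definition and averages successive non-overlapping pairs, as in the example.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation; returns None if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return None  # zero variance: correlation is undefined (R's cor() gives NA)
    return sxy / math.sqrt(sxx * syy)

def pair_means(v):
    """Average successive non-overlapping pairs: (v0+v1)/2, (v2+v3)/2, ..."""
    return [(v[i] + v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]

a = [2, 6] * 6  # 2 6 2 6 ... (12 values)
b = [6, 2] * 6  # 6 2 6 2 ... (12 values)

print(pearson(a, b))    # -1.0
a2 = pair_means(a)
b2 = pair_means(b)
print(a2)               # [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
print(a2 == b2)         # True
print(pearson(a2, b2))  # None: both averaged vectors are constant
```

The zero-variance check is there so the sketch reports "undefined" explicitly instead of raising a division-by-zero error.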

**Edit**: To explain it more generally, the correlation coefficient is defined as

$\frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$
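For a sample of $n$ pairs $(x_i, y_i)$, which is what R's `cor()` computes by default, the expectation and the standard deviations are replaced by their sample counterparts. The normalising factors then cancel, leaving

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

The denominator is zero when either variable is constant, in which case $r$ is undefined.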

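Averaging some $X$s and some $Y$s changes the deviations $X-\mu_X$ and $Y-\mu_Y$, and with them both the numerator and the denominator. In particular, averaging removes much of the observation-to-observation noise that $X$ and $Y$ do not share. The correlation of the averages therefore describes the averaged values, not the relationship between individual observations.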
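To illustrate the effect, here is a small simulation sketch in Python. It rests on an assumed model that does not come from the question: $X$ and $Y$ share a slowly varying component (a sine wave here) plus independent Gaussian noise. The helper names `pearson` and `rolling_mean` are hypothetical, and so are the seed and the parameters. Averaging 12 consecutive values cancels much of the independent noise but barely changes the shared component. The correlation of the averages should therefore come out clearly higher than the raw correlation, even though the relationship between individual observations is the same.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def rolling_mean(v, k):
    """Mean of every window of k consecutive values (len(v) - k + 1 values)."""
    return [sum(v[i:i + k]) / k for i in range(len(v) - k + 1)]

random.seed(1)  # arbitrary seed, for reproducibility only
n, k = 120, 12

# Assumed model: slowly varying shared component + independent noise
shared = [math.sin(2 * math.pi * i / 60) for i in range(n)]
x = [s + random.gauss(0, 1) for s in shared]
y = [s + random.gauss(0, 1) for s in shared]

r_raw = pearson(x, y)
r_avg = pearson(rolling_mean(x, k), rolling_mean(y, k))
print(f"raw: r = {r_raw:.2f} on {n} pairs")
print(f"{k}-point averages: r = {r_avg:.2f} on {n - k + 1} pairs")
```

If the noise terms are independent with variance $\sigma^2$, the mean of $k$ of them has variance $\sigma^2/k$, which is why the noise shrinks. Overlapping windows also make neighbouring averages strongly dependent. The averaged pairs therefore carry much less independent information than their count suggests.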

### Similar Posts:

- Solved – Mantel test or/and correlation test
- Solved – Correlation plots with missing values
- Solved – Can regression coefficients be higher than correlation coefficients?
- Solved – Correlation between features Pearson vs Spearman