# Solved – Pearson correlation after aggregation

The following table shows the values of a variable Y observed in five people, together with each person's age.

| AGE | Y  |
|-----|----|
| 10  | 50 |
| 10  | 29 |
| 20  | 30 |
| 20  | 33 |
| 30  | 15 |

If I compute the Pearson correlation between Y and age, I get -0.7792, so Y seems to be negatively correlated with age. If I first aggregate the data by age (averaging Y within each age):

| AGE | Y    |
|-----|------|
| 10  | 39.5 |
| 20  | 31.5 |
| 30  | 15   |

The correlation changes to -0.9805.
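Both figures can be reproduced with a minimal standard-library Python sketch. It computes Pearson's r on the five raw rows, then again after averaging Y within each age. The helper names `pearson` and `aggregate_by_age` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt

# (age, Y) for the five people in the first table.
raw = [(10, 50), (10, 29), (20, 30), (20, 33), (30, 15)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def aggregate_by_age(rows):
    """Average Y within each age group, giving one output row per distinct age."""
    groups = defaultdict(list)
    for age, y in rows:
        groups[age].append(y)
    return [(age, sum(vals) / len(vals)) for age, vals in sorted(groups.items())]

ages, ys = zip(*raw)
r_raw = pearson(ages, ys)

agg = aggregate_by_age(raw)
agg_ages, agg_ys = zip(*agg)
r_agg = pearson(agg_ages, agg_ys)

print(f"raw:        r = {r_raw:.4f}")  # raw:        r = -0.7792
print(f"aggregated: r = {r_agg:.4f}")  # aggregated: r = -0.9805
print(agg)                             # [(10, 39.5), (20, 31.5), (30, 15.0)]
```

In the aggregated version, every age contributes exactly one row. The single person aged 30 therefore carries as much weight as each two-person group.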

In the real example I am working on (5k data points), the change is even bigger, from -0.19 to -0.69, so aggregating the data completely changes the interpretation of the study. My questions are:

1) How do you interpret this huge difference?

2) Does measuring correlation on aggregated data make sense in this case? If not, what conclusions can we draw from a correlation analysis, given that sometimes we only have access to the aggregated (averaged) data rather than the individual data points?
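The size of the jump can be traced to the standard within/between decomposition of the sum of squares. The following is a sketch that assumes each group mean is weighted by its group size $n_g$, which the three-row table above does not do. Write $x_g$ for the age of group $g$, $\bar y_g$ for its mean Y, and $\bar x, \bar y$ for the overall means. Age is constant within a group, so the cross-product term is unchanged by averaging, while the spread of Y loses its within-group part:

$$
\sum_i (x_i-\bar x)(y_i-\bar y) = \sum_g n_g (x_g-\bar x)(\bar y_g-\bar y),
\qquad
\sum_i (y_i-\bar y)^2 = \underbrace{\sum_g n_g(\bar y_g-\bar y)^2}_{SS_B} + \underbrace{\sum_g \sum_{i\in g}(y_i-\bar y_g)^2}_{SS_W}
$$

$$
r_w = \frac{\sum_g n_g (x_g-\bar x)(\bar y_g-\bar y)}{\sqrt{\sum_g n_g (x_g-\bar x)^2 \; SS_B}}
= r\,\sqrt{\frac{SS_B + SS_W}{SS_B}}
$$

Hence $|r_w| \ge |r|$, with equality only when $SS_W = 0$. For the five people above, $SS_B = 400.2$ and $SS_W = 225$, so $r_w = -0.7792 \times \sqrt{625.2/400.2} \approx -0.9739$. The unweighted three-row table gives -0.9805 instead, which is close to $r_w$ but not identical.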

I am reading the paper "The Effects of Data Aggregation in Statistical Analysis" (http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1976.tb00549.x/pdf), but my questions are still unanswered.
