# Solved – Aggregate vs. firm-level regressions – How can the regression coefficients differ that much

I recently came across a study that finds the following two results:

• At the firm level, the independent variable $X_{it}$ has a positive effect on the dependent variable $Y_{it}$. Concretely, the study shows that the time-series regression coefficient of realized returns on earnings changes (deflated by stock prices) is positive for almost all firms $i$. That is, they run a separate time-series regression for each firm and look at the distribution of the betas. Both the mean and the median are positive, and only 10.7% of firms have a negative beta.
• At the aggregate level, they find the opposite relation, i.e. aggregate earnings changes (either value- or equal-weighted) have a negative beta on aggregate stock returns.
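To make the two procedures concrete, here is a minimal Python sketch of how I understand them. The function names `firm_level_betas` and `aggregate_beta` are my own, the panel is random placeholder data rather than the study's data, and I assume weights that are constant over time (the study's value weights presumably change each period).

```python
import numpy as np


def firm_level_betas(x, y):
    """OLS slope of y on x, estimated separately for each firm.

    x and y are arrays of shape (T, N): T time periods, N firms.
    """
    xd = x - x.mean(axis=0)
    yd = y - y.mean(axis=0)
    return (xd * yd).sum(axis=0) / (xd ** 2).sum(axis=0)


def aggregate_beta(x, y, weights=None):
    """OLS slope of the cross-sectional average of y on that of x.

    weights has shape (N,); None means equal weighting.
    Assumption: weights do not vary over time.
    """
    n = x.shape[1]
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    x_agg = x @ w  # one aggregate observation per period
    y_agg = y @ w
    xd = x_agg - x_agg.mean()
    yd = y_agg - y_agg.mean()
    return (xd * yd).sum() / (xd ** 2).sum()


# Placeholder panel, only to show the calls (not the study's data).
rng = np.random.default_rng(0)
T, N = 40, 200
x = rng.normal(size=(T, N))
y = 0.5 * x + rng.normal(size=(T, N))

betas = firm_level_betas(x, y)
summary = {
    "mean": betas.mean(),
    "median": np.median(betas),
    "share_negative": (betas < 0).mean(),
}
agg = aggregate_beta(x, y)
print(summary, agg)
```

The first function yields one beta per firm, whose distribution is then summarized; the second collapses the cross-section to a single time series before running one regression.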

I am rather surprised by this finding. I searched for it online, and at first it seemed that the ecological fallacy (see Wikipedia) would explain it. However, I no longer think so.

If I am correct, the mechanism behind the ecological fallacy is a different one. For instance, take the literacy-immigration example from Wikipedia: within each group/state, illiteracy is higher for immigrants, but since immigrants settle in states with higher literacy, there is a negative relation between the share of immigrants and illiteracy at the aggregate level. So the effect, if I understand it correctly, occurs because the groups, in this case the states, differ to begin with and the immigrants can choose their state. Let's assume instead that immigrants are randomly assigned to states. Then this effect wouldn't occur, right?

However, in my example there is no sample selection, at least none that I am aware of, since the groups are the time periods. That is, in each time period all firms in the sample are aggregated, and firms can't really choose the time period.

Of course, it could be that firms go bankrupt in bad states of the economy, so that the sample size varies across periods. But let's ignore that for a second and assume that the number of firms stays constant throughout the sample: How can the regression coefficient at the aggregate level be so different from the distribution of the regression coefficients at the firm level? Both a formal answer and an intuitive one (maybe a small example) would be great.
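For what it's worth, here is a small simulated example of one mechanism that can produce such a sign flip without any sample selection. It is only a sketch under assumptions of my own, not necessarily what drives the study's result: each firm's X is the sum of a common shock `f` and a large idiosyncratic shock `e`, and Y loads negatively on the common part (hypothetical loading `c`) but positively on the idiosyncratic part (hypothetical loading `b`). Within a firm the idiosyncratic variation dominates, so the slope is positive; in the cross-sectional average the idiosyncratic shocks largely cancel, so the slope is driven by the common shock.

```python
import numpy as np

rng = np.random.default_rng(42)

T, N = 200, 500                  # periods, firms (constant sample size)
b, c = 1.0, -2.0                 # hypothetical loadings: idiosyncratic, common
sd_f, sd_e, sd_u = 0.3, 1.0, 1.0 # hypothetical shock volatilities

f = rng.normal(scale=sd_f, size=(T, 1))  # common shock, same for all firms
e = rng.normal(scale=sd_e, size=(T, N))  # firm-specific shock to X
u = rng.normal(scale=sd_u, size=(T, N))  # firm-specific noise in Y

x = f + e                # firm-level regressor
y = c * f + b * e + u    # firm-level dependent variable


def slope(x, y):
    """OLS slope of y on x along the time axis (axis 0)."""
    xd = x - x.mean(axis=0)
    yd = y - y.mean(axis=0)
    return (xd * yd).sum(axis=0) / (xd ** 2).sum(axis=0)


firm_betas = slope(x, y)                          # one beta per firm
agg_beta = slope(x.mean(axis=1), y.mean(axis=1))  # equal-weighted aggregate

# Population slopes implied by this data-generating process:
# firm level: Cov(X, Y) / Var(X) with Var(X) = sd_f^2 + sd_e^2
firm_plim = (c * sd_f**2 + b * sd_e**2) / (sd_f**2 + sd_e**2)
# aggregate: idiosyncratic variance shrinks by a factor of N
agg_plim = (c * sd_f**2 + b * sd_e**2 / N) / (sd_f**2 + sd_e**2 / N)

print("mean firm beta:     ", firm_betas.mean(), "(population:", firm_plim, ")")
print("share negative:     ", (firm_betas < 0).mean())
print("aggregate beta:     ", agg_beta, "(population:", agg_plim, ")")
```

In this setup the firm-level slope is a variance-weighted average of `b` and `c` that leans heavily toward `b`, while the aggregate slope approaches `c` as the number of firms grows, so the two can have opposite signs even though every firm is in every period.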
