If you can measure a time series of observations at any level of precision in time, and the goal of your study is to identify a relationship between X and Y, is there any empirical justification for choosing one level of aggregation over another, or should the choice simply be made on the basis of theory and/or practical limitations?
I have three sub-questions to this main one:
Is any non-random variation in X or Y within a larger level sufficient reasoning to choose a smaller level of aggregation (where non-random is any temporal pattern of the
Is any variation in the relationship between X and Y at a smaller level of aggregation sufficient reasoning to justify the smaller unit of analysis? If some variation is acceptable how does one decide how much variation is too much?
Can people cite arguments they feel are compelling/well defined for one unit of analysis over another, either for empirical reasons or for theoretical reasons?
I am well aware of the modifiable areal unit problem in spatial analysis (Openshaw 1984). I don't claim to be an expert on the material, but all I have taken from it so far is that a smaller unit of analysis is always better, as one is less likely to commit an ecological fallacy (Robinson 1950). If anyone has a directly pertinent reference or answer concerning the aggregation of geographical units, I would appreciate that as well.
I have been interested in this topic for about 7 years now, which resulted in the PhD thesis Time series: aggregation, disaggregation and long memory, where attention was paid to the specific problem of cross-sectional disaggregation for the AR(1) scheme.
When working with different approaches to aggregation, the first question you need to clarify is what type of data you are dealing with (my guess is spatial, the most thrilling one). In practice you may consider temporal aggregation (see Silvestrini, A. and Veredas, D. (2008)), cross-sectional aggregation (I loved the article by Granger, C. W. J. (1990)), or both time and space (spatial aggregation is nicely surveyed in Giacomini, R. and Granger, C. W. J. (2004)).
Now, to your questions, with some rough intuition first. The problems I meet in practice usually involve inexact data. (Andy's assumption that "you can measure a time series of observations at any level of precision in time" seems too strong for macro-econometrics, but fine for financial and micro-econometrics or any experimental field, where you do control the precision quite well.) So I have to bear in mind that my monthly time series are less precise than my yearly data. Besides, higher-frequency time series, at least in macroeconomics, have seasonal patterns that may lead to spurious results (the seasonal parts correlate, not the series themselves), so you need to seasonally adjust your data, which is another source of lower precision for higher-frequency data. Working with cross-sectional data, I found that a high level of disaggregation brings more problems, probably with lots of zeroes to deal with. For instance, a particular household in a panel may purchase a car once every 5-10 years, but aggregate demand for new (used) cars is much smoother (even for a small town or region).
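The car-purchase example can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the discussion above: a hypothetical town of 2000 households, each buying a car independently with the same monthly probability (about once per 7.5 years), observed for 120 months.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup (illustrative numbers, not taken from any data set).
n_households = 2000
n_months = 120
purchase_prob = 1.0 / (7.5 * 12)  # roughly one purchase per 7.5 years

# Micro level: 1 if a household buys a car in a given month, else 0.
purchases = (rng.random((n_months, n_households)) < purchase_prob).astype(int)

# Aggregate level: total monthly demand in the town.
town_demand = purchases.sum(axis=1)

# Share of zero observations at each level.
zero_share_micro = float((purchases == 0).mean())
zero_share_aggregate = float((town_demand == 0).mean())

# Coefficient of variation (std / mean) as a crude smoothness measure.
cv_micro = float(purchases.std() / purchases.mean())
cv_aggregate = float(town_demand.std() / town_demand.mean())

print(f"zeros: micro {zero_share_micro:.3f}, aggregate {zero_share_aggregate:.3f}")
print(f"coefficient of variation: micro {cv_micro:.2f}, aggregate {cv_aggregate:.2f}")
```

Under these assumptions almost every household-month is a zero, while the town-level series has essentially no zeros and a much smaller relative variation.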
The weakest point: aggregation always results in a loss of information. You may have the GDP produced by the cross-section of EU countries over a whole decade (say, 2001-2010), but you will lose all the dynamic features that a detailed panel data set would reveal. Large-scale cross-sectional aggregation may turn out to be even more interesting: roughly, you take simple things (short-memory AR(1) processes), average them over a fairly large population, and get a "representative" long-memory agent that resembles none of the micro units (one more stone thrown at the representative agent concept). So aggregation ~ loss of information ~ different properties of the objects, and you would like to control the extent of this loss and/or the new properties. In my opinion, it is better to have precise micro-level data at as high a frequency as possible, but… there is the usual measurement trade-off: you can't be perfect and precise everywhere 🙂
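The AR(1) remark can be made concrete with a rough simulation. The sketch below is not the thesis's method; it assumes independent AR(1) units with unit-variance Gaussian shocks and squared coefficients drawn from a Beta(1, 1.5) distribution, a choice made here only so that a few units are highly persistent. The function names and sample sizes are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(coeffs, n_periods, rng, burn_in=500):
    """Simulate independent AR(1) series x_t = a_i * x_{t-1} + e_t, one column per unit."""
    n_units = coeffs.shape[0]
    shocks = rng.standard_normal((burn_in + n_periods, n_units))
    state = np.zeros(n_units)
    panel = np.empty((n_periods, n_units))
    for t in range(burn_in + n_periods):
        state = coeffs * state + shocks[t]
        if t >= burn_in:  # discard the burn-in so the start-up at zero is forgotten
            panel[t - burn_in] = state
    return panel

def sample_acf(series, max_lag):
    """Sample autocorrelations of a 1-D series for lags 0..max_lag."""
    centered = series - series.mean()
    denom = np.dot(centered, centered)
    n = centered.shape[0]
    return np.array([np.dot(centered[: n - k], centered[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
n_units, n_periods, max_lag = 500, 4000, 50

# Assumption: squared AR coefficients ~ Beta(1, 1.5), so all |a_i| < 1.
coeffs = np.sqrt(rng.beta(1.0, 1.5, size=n_units))

panel = simulate_ar1_panel(coeffs, n_periods, rng)
aggregate = panel.mean(axis=1)  # cross-sectional average

acf_aggregate = sample_acf(aggregate, max_lag)
# Theoretical ACF of a "typical" unit: geometric decay at the median coefficient.
acf_typical_micro = np.median(coeffs) ** np.arange(max_lag + 1)

for lag in (1, 10, 20, 50):
    print(lag, round(float(acf_aggregate[lag]), 3), float(acf_typical_micro[lag]))
```

In this setup the autocorrelations of the average should die out far more slowly than the geometric decay of a typical unit, because the few very persistent units dominate the variance of the aggregate.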
Technically, any regression analysis needs enough room (degrees of freedom) for you to be more or less confident that your results are (at least) statistically not junk, though they may still be a-theoretical and junk 🙂 So I put equal weight on questions 1 and 2 (and usually choose quarterly data for macro-analysis). As for the 3rd sub-question, in practical applications it all comes down to deciding what is more important to you: more precise data or more degrees of freedom. If you take the assumption mentioned above into account, the more detailed (or higher-frequency) data is preferable.
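To see how quickly temporal aggregation eats degrees of freedom, here is a minimal sketch. It assumes a hypothetical monthly flow variable over ten years (120 observations of made-up data) and a simple regression with an intercept and one slope; `aggregate_flow` is an illustrative helper, not a library function.

```python
import numpy as np

def aggregate_flow(series, block):
    """Sum a flow variable over non-overlapping blocks of `block` periods."""
    series = np.asarray(series, dtype=float)
    if series.size % block != 0:
        raise ValueError("series length must be a multiple of the block size")
    return series.reshape(-1, block).sum(axis=1)

rng = np.random.default_rng(1)
monthly = rng.normal(loc=100.0, scale=10.0, size=120)  # made-up monthly data

quarterly = aggregate_flow(monthly, 3)
yearly = aggregate_flow(monthly, 12)

# Residual degrees of freedom for y = b0 + b1 * x + error: n - 2.
n_params = 2
residual_dof = {
    "monthly": monthly.size - n_params,
    "quarterly": quarterly.size - n_params,
    "yearly": yearly.size - n_params,
}
print(residual_dof)  # {'monthly': 118, 'quarterly': 38, 'yearly': 8}
```

The totals are preserved at every level, but the room left for estimation shrinks from 118 to 8 residual degrees of freedom when moving from monthly to yearly data.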
The answer will probably be edited later, after some discussion, if any.
- Solved – Time series as cross-sectional data
- Solved – Difference between econometrics and time series analysis
- Solved – name for applying estimation at a lower level of aggregation, and is it necessarily problematic
- Solved – use OLS to analyse Cross-sectional data