I am trying to understand the difference between iid and non-iid data.
Let's consider a given time series, and say it's reasonable to assume that at each time point the random variable $X_t$ depends on $X_{t-1}$.
Now say someone gives me the dataset $D = \{(t_i, x_i)\}$ for $i = 1, \dots, n$. I might be tempted to apply some regression method, which usually assumes that the corresponding pairs of random variables $(T_i, X_i)$ are iid.
Would that be a bad idea if we really believe that $X_{t}$ depends on $X_{t-1}$?
Best Answer
Yes, it would be a bad idea. Doing a regression on dependent data without dealing with that dependence can yield silly results.
For one thing, if two variables both depend on time, they will be correlated with each other unless you account for time. For example, the Dow Jones Average is correlated with the population of China, simply because both have grown over time.
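To make that point concrete, here is a small simulation sketch. Everything in it is made up for illustration: the two series `a` and `b`, their slopes, and their noise levels are assumptions, not real data. Both series are generated independently of each other, but each has a linear trend in time. Their raw correlation comes out very high, while the correlation of their first differences (one simple way to remove a linear trend) should be close to zero.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def diff(xs):
    """First differences: x[1]-x[0], x[2]-x[1], ..."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# Two hypothetical series: each is "trend in t + its own independent noise".
a = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
b = [2.0 * t + rng.gauss(0, 20) for t in range(n)]

raw_corr = pearson(a, b)                # driven almost entirely by the shared trend
diff_corr = pearson(diff(a), diff(b))   # trend removed by differencing

print("correlation of raw series:        ", round(raw_corr, 3))
print("correlation of differenced series:", round(diff_corr, 3))
```

Differencing is only one way to account for time here; regressing each series on `t` and correlating the residuals would be another.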
There are lots of models for time series; it's not my area of expertise, but ARIMA and ARIMAX are two that may be worth considering.
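As a rough illustration of why a time series model helps, the sketch below simulates a series where $X_t$ depends on $X_{t-1}$ and compares two fits. It is not a full ARIMA implementation: it only fits an AR(1) model, which is the special case ARIMA(1,0,0), by plain least squares of $x_t$ on $x_{t-1}$. The coefficient `phi_true = 0.9`, the sample size, and the helper names `ols` and `lag1_autocorr` are all assumptions chosen for the example. The check is whether the residuals still look dependent: for the naive regression on time they should be strongly autocorrelated, while for the AR(1) fit they should be roughly uncorrelated.

```python
import random


def ols(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def lag1_autocorr(es):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(es)
    m = sum(es) / n
    num = sum((es[i] - m) * (es[i - 1] - m) for i in range(1, n))
    den = sum((e - m) ** 2 for e in es)
    return num / den


rng = random.Random(1)  # fixed seed so the run is repeatable
phi_true = 0.9          # assumed strength of dependence on the previous value
n = 1000

# Simulate x_t = phi_true * x_{t-1} + noise.
x = [0.0]
for _ in range(n - 1):
    x.append(phi_true * x[-1] + rng.gauss(0, 1))
t = list(range(n))

# Naive approach: regress x on t as if the pairs (t_i, x_i) were iid.
a0, b0 = ols(t, x)
naive_resid = [xi - (a0 + b0 * ti) for ti, xi in zip(t, x)]

# AR(1) approach: regress x_t on x_{t-1}.
c, phi_hat = ols(x[:-1], x[1:])
ar_resid = [cur - (c + phi_hat * prev) for prev, cur in zip(x[:-1], x[1:])]

print("estimated phi:", round(phi_hat, 3))
print("lag-1 autocorrelation, naive residuals:", round(lag1_autocorr(naive_resid), 3))
print("lag-1 autocorrelation, AR(1) residuals:", round(lag1_autocorr(ar_resid), 3))
```

For real data you would more likely reach for a dedicated time series library than hand-rolled least squares; this is just meant to show what "dealing with the dependence" buys you.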