Solved – Difference between iid data and non-iid data for a simple regression problem

I am trying to understand the difference between iid and non-iid data.

Let's consider a given time series, and say it's reasonable to assume that at each time point the random variable $X_t$ depends on $X_{t-1}$.

Now say someone gives me the dataset $D = \{(t_i, x_i)\}$ for $i = 1, \dots, n$. I might be tempted to apply some regression method, which usually assumes that the corresponding pairs of random variables $(T_i, X_i)$ are iid.

Would that be a bad idea if we really believe that $X_{t}$ depends on $X_{t-1}$?

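Yes, it would be a bad idea. Doing a regression on dependent data without dealing with that dependence can yield misleading results: the fitted line may look reasonable, but the errors are no longer independent, so the usual standard errors and significance tests cannot be trusted.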
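To make that concrete, here is a minimal sketch using simulated data only. It fits a straight line of $x$ on $t$ by ordinary least squares, once for iid noise and once for a series where $X_t$ depends on $X_{t-1}$, and then looks at the lag-1 autocorrelation of the residuals. The AR(1) form, the coefficient 0.9, the seed, and the helper names `ols_residuals` and `lag1_autocorr` are all assumptions chosen for illustration, not anything from the question.

```python
import random

def ols_residuals(t, x):
    """Residuals from the least-squares line x ~ intercept + slope * t."""
    n = len(t)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    sxy = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = x_mean - slope * t_mean
    return [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(r) / len(r)
    num = sum((r[i] - m) * (r[i - 1] - m) for i in range(1, len(r)))
    den = sum((v - m) ** 2 for v in r)
    return num / den

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
t = list(range(n))

# Case 1: iid observations, the situation ordinary regression assumes.
x_iid = [rng.gauss(0, 1) for _ in range(n)]

# Case 2 (assumed model): X_t = phi * X_{t-1} + noise, so each value
# depends on the previous one.
phi = 0.9
x_dep = [0.0]
for _ in range(n - 1):
    x_dep.append(phi * x_dep[-1] + rng.gauss(0, 1))

r_iid = lag1_autocorr(ols_residuals(t, x_iid))
r_dep = lag1_autocorr(ols_residuals(t, x_dep))

print(f"residual lag-1 autocorrelation, iid data:       {r_iid:.2f}")
print(f"residual lag-1 autocorrelation, dependent data: {r_dep:.2f}")
```

For the iid series the residual autocorrelation should sit near zero; for the dependent series it should stay large, which is the sign that the regression has left the dependence unmodelled.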

For one thing, if two variables both trend over time, they will appear correlated with each other unless you account for time. The Dow Jones Average is correlated with the number of people in China, just for an example.
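A small simulation shows the effect. The two series below are made up: they are generated independently of each other and share nothing except an upward trend in time. The slopes, noise levels, seed, and the names `series_a` and `series_b` are assumptions for illustration, and first differencing is just one simple way to "account for time".

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def diff(series):
    """First differences: series[i] - series[i - 1]."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 500

# Two hypothetical series with independent noise; the only thing they
# have in common is that both grow with t.
series_a = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
series_b = [2.0 * t + rng.gauss(0, 20) for t in range(n)]

corr_levels = pearson(series_a, series_b)               # raw levels
corr_changes = pearson(diff(series_a), diff(series_b))  # trend removed

print(f"correlation of levels:  {corr_levels:.3f}")
print(f"correlation of changes: {corr_changes:.3f}")
```

The levels should come out very strongly correlated, while the period-to-period changes should show essentially no relationship, because there never was one beyond the shared trend.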

There are lots of models for time series; it's not my area of expertise, but ARIMA and ARIMAX are two that may be worth considering.
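As a rough sketch of what modelling the dependence looks like, the simplest member of the ARIMA family is an AR(1) model, $X_t = c + \phi X_{t-1} + \varepsilon_t$, which matches the setup in the question. The code below simulates such a series and estimates $c$ and $\phi$ by least squares on the lagged values. This is not a full ARIMA fit (for that you would normally use a dedicated time-series library); the true coefficient 0.7, the seed, and the helper name `fit_ar1` are assumptions for illustration.

```python
import random

def fit_ar1(x):
    """Least-squares estimates (c, phi) for x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi

rng = random.Random(2)  # fixed seed so the run is repeatable
true_phi = 0.7          # assumed value, used only to generate the data

# Simulate X_t = true_phi * X_{t-1} + noise, starting from zero.
x = [0.0]
for _ in range(1999):
    x.append(true_phi * x[-1] + rng.gauss(0, 1))

c_hat, phi_hat = fit_ar1(x)

# One-step-ahead forecast from the last observed value.
forecast = c_hat + phi_hat * x[-1]

print(f"estimated phi: {phi_hat:.3f}")
print(f"one-step forecast: {forecast:.3f}")
```

Here the previous value $X_{t-1}$ is the regressor, so the dependence is part of the model rather than something left over in the errors.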
