Solved – Resampling correlated data using bootstrap

I have a dataset of $n \times m$ numbers, where the $m$ variables ($m$ is on the order of 5–10) exhibit various degrees of correlation with one another. The data are not necessarily a single time series: they could be a discrete or continuous time series, or several time series grouped together. The dataset is oversampled ($n > 10000$).

I want to reduce the number of rows such that the reduced dataset has the same CDF and PDF as the original data for all $m$ variables. Simple bootstrapping will not do, as that technique assumes that the variables are independent. However, I have the impression that multi-stage bootstrapping might be usable here.

Does anyone have an answer, or can anyone point me to literature where I could find one? If you happen to have coding examples, I am writing in MATLAB.

The Wikipedia reference is excellent, and it cites several books. My book, Chernick (2007), covers time series, but the most thorough text on dependent data is Lahiri's.

The two additional references are:

1) Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition, Michael R. Chernick (2007) Wiley.

www.wiley.com/WileyCDA/WileyTitle/productCd-0471756210.html

2) Resampling Methods for Dependent Data, S. N. Lahiri (2003) Springer.

www.springer.com/us/book/9780387009285
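
To make the idea of resampling dependent data concrete, here is a minimal sketch of block-based row subsampling, in the spirit of the moving block bootstrap covered in Lahiri's text. It is written in Python with NumPy rather than MATLAB, and it is an illustration under assumptions, not a recommended procedure: the function names `block_subsample` and `ks_distance`, the block length of 50, the target of 2000 rows, and the synthetic AR(1) data standing in for the real dataset are all made up for this example. Whole rows are kept intact so that the correlation between the $m$ variables carries over, and rows are drawn in consecutive blocks so that short-range serial dependence is retained within each block.

```python
import numpy as np


def block_subsample(data, n_out, block_len, rng):
    """Draw n_out rows as randomly placed blocks of consecutive rows.

    Rows are never split, so cross-variable correlation is retained.
    Consecutive rows stay together, so serial dependence is retained
    within each block.
    """
    n = data.shape[0]
    n_blocks = int(np.ceil(n_out / block_len))
    # Block start indices are drawn with replacement, as in the bootstrap.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:n_out]
    return data[idx]


def ks_distance(a, b):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


# Synthetic stand-in for the n x m dataset (assumption): AR(1) columns
# driven by noise with pairwise correlation 0.6.
rng = np.random.default_rng(0)
n, m = 20000, 5
cov = 0.6 * np.ones((m, m)) + 0.4 * np.eye(m)
noise = rng.multivariate_normal(np.zeros(m), cov, size=n)
data = np.empty((n, m))
data[0] = noise[0]
for t in range(1, n):
    data[t] = 0.5 * data[t - 1] + noise[t]

reduced = block_subsample(data, n_out=2000, block_len=50, rng=rng)

# Compare marginal distributions and the correlation matrix.
ks = [ks_distance(data[:, j], reduced[:, j]) for j in range(m)]
corr_gap = np.max(np.abs(np.corrcoef(data.T) - np.corrcoef(reduced.T)))
print("KS distance per variable:", np.round(ks, 3))
print("largest correlation gap:", round(float(corr_gap), 3))
```

The block length is a tuning choice: it should be long enough to span the serial dependence in the data, yet short enough that many blocks fit into the reduced sample. The KS distances and the correlation gap give a quick check of how closely a particular reduced sample tracks the original.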
