I have a dataset of $n \times m$ numbers, where the $m$ variables ($m$ is on the order of 5–10) exhibit various degrees of correlation with one another. The data are not necessarily a single time series: they could be a discrete or continuous time series, or several time series grouped together. The dataset is oversampled ($n > 10000$).
I want to reduce the number of rows such that the reduced dataset has the same CDF and PDF as the original data for all $m$ variables. Simple bootstrapping will not do, as that technique assumes that the variables are independent. However, I have the impression that multi-stage bootstrapping could be used here.
Does anyone have an answer to my question, or can anyone point me to literature where I could find one? If you happen to have coding examples, I am working in Matlab.
Best Answer
The Wikipedia reference is excellent and cites several books. My book, Chernick (2007), covers time series, but the most thorough text on dependent data is Lahiri's. I will provide these two additional references.
The two references are
1) Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition, Michael R. Chernick (2007), Wiley.
www.wiley.com/WileyCDA/WileyTitle/productCd-0471756210.html
2) Resampling Methods for Dependent Data, S. N. Lahiri (2003), Springer.
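For readers who want something concrete to start from, here is a minimal sketch of one resampling scheme for dependent data, the moving block bootstrap, applied to row reduction. It is not taken from either book and is only an illustration under stated assumptions: the function names `moving_block_subsample` and `max_ks_distance` and the parameter `block_len` are hypothetical, the block length has to be chosen by the user, and the code is written in Python with NumPy rather than Matlab (the indexing logic carries over directly). Whole rows are kept together so the dependence between the $m$ variables is untouched, and rows are drawn in contiguous blocks so that short-range serial dependence is retained within each block.

```python
import numpy as np


def moving_block_subsample(data, n_out, block_len, seed=None):
    """Draw n_out rows from data as contiguous blocks of whole rows.

    Blocks are drawn with replacement, so they may overlap.
    block_len is an assumption the user must tune to the data.
    """
    data = np.asarray(data)
    n = data.shape[0]
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and the number of rows")
    rng = np.random.default_rng(seed)
    n_blocks = -(-n_out // block_len)  # ceiling division
    # Random start index of each block; the upper bound is exclusive.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    # Expand each start into block_len consecutive row indices, then trim.
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:n_out]
    return data[idx], idx


def max_ks_distance(full, sub):
    """Largest two-sample Kolmogorov-Smirnov distance over all columns."""
    worst = 0.0
    for j in range(full.shape[1]):
        grid = np.sort(np.concatenate([full[:, j], sub[:, j]]))
        # Empirical CDFs of both samples evaluated on the pooled grid.
        cdf_full = np.searchsorted(np.sort(full[:, j]), grid, side="right") / full.shape[0]
        cdf_sub = np.searchsorted(np.sort(sub[:, j]), grid, side="right") / sub.shape[0]
        worst = max(worst, float(np.max(np.abs(cdf_full - cdf_sub))))
    return worst
```

A possible use is to call `moving_block_subsample(data, n_out, block_len)` for a few candidate block lengths and keep the reduced set whose `max_ks_distance` to the full data is acceptably small. This checks only the marginal CDFs of each variable, so any further criteria (for example, comparing correlation matrices) would need to be added separately.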