I am looking at time series data in foreign exchange and bond markets (to test for reversion after extreme moves). Unfortunately, "tick" data, i.e. high-frequency data, is prone to many problems that can significantly distort the analysis. I'd like to know which R library can help with the following fairly frequent data-cleaning problems:
1) one spike:
This is typically created when a market maker prints a wrong quote for a single tick; the price was never really tradable because it lasted only a split second. I'd like to eliminate the spike, but only if it consists of one (or maybe two) prints.
2) bid ask gapping:
In this case the market is fairly illiquid and the data algorithm jumps between bids and asks (here 2 bps wide), causing this weird cloud.
Where should I start cleaning this, while throwing out as little real data as possible? I realise that the maxim "look at the data" applies here, but when you're looking at 1000 series, each with 100 days of data, that quickly becomes impractical, so I need some automated help. I'll also look at Python methods if they're available or better.
Best Answer
There's a package for that. Check out RTAQ.
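Since you mention Python as well, here is a minimal pandas sketch of the single-spike case. It is not RTAQ's method or API, just one common approach: compare each tick with a centered rolling median, scale the deviation by a rolling MAD, and drop a flagged tick only if the flagged run is at most one or two prints long. The function name and the `window`, `n_mads`, `max_run` and `min_dev` parameters are my own illustrative choices, and the defaults are guesses you would have to tune per instrument.

```python
import pandas as pd


def drop_isolated_spikes(prices: pd.Series, window: int = 11,
                         n_mads: float = 5.0, max_run: int = 2,
                         min_dev: float = 0.0) -> pd.Series:
    """Drop ticks far from a centered rolling median, but only when the
    run of consecutive flagged ticks is at most `max_run` prints long."""
    med = prices.rolling(window, center=True, min_periods=1).median()
    abs_dev = (prices - med).abs()
    # Rolling median absolute deviation; 1.4826 rescales it to be
    # comparable to a standard deviation under normality.
    mad = abs_dev.rolling(window, center=True, min_periods=1).median()
    # min_dev is an optional floor (in price units) for flat stretches.
    threshold = (n_mads * 1.4826 * mad).clip(lower=min_dev)
    outlier = abs_dev > threshold

    # Label runs of consecutive equal flags and measure their length.
    run_id = (outlier != outlier.shift()).cumsum()
    run_len = outlier.groupby(run_id).transform("size")

    # Longer flagged runs are kept: they may be real moves.
    drop = outlier & (run_len <= max_run)
    return prices[~drop]


# Synthetic example (made-up numbers): a quiet series with one bad print
# and one genuine, persistent level shift.
idx = pd.date_range("2024-01-02 09:00:00", periods=50,
                    freq=pd.Timedelta(seconds=1))
values = [1.3000 + 0.0001 * (i % 5) for i in range(50)]
values[20] = 1.3500                      # single erroneous print
for i in range(35, 50):
    values[i] += 0.0050                  # real move that persists
ticks = pd.Series(values, index=idx)

cleaned = drop_isolated_spikes(ticks)
```

On this toy series the lone print at position 20 is removed while the level shift from position 35 onwards survives, because a median window barely reacts to one or two bad ticks but follows a move that persists.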
Small plug: there's a quantitative finance stack exchange you may be interested in.
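For the bid/ask gapping case, if your feed has separate bid and ask quotes, the simplest fix is to work with the mid, (bid + ask) / 2. If all you have is a single series of prints bouncing between the two sides, one rough workaround is to treat a short rolling minimum as the bid and a short rolling maximum as the ask and take their midpoint. The sketch below assumes exactly that; `bounce_adjusted_mid` and `window` are hypothetical names, and it should be run after spike removal because min and max are sensitive to outliers.

```python
import pandas as pd


def bounce_adjusted_mid(prints: pd.Series, window: int = 10) -> pd.Series:
    """Estimate a mid price from prints that alternate between bid and ask.

    Assumption: within `window` ticks both sides of the market have
    printed, so the rolling min approximates the bid and the rolling
    max approximates the ask.
    """
    est_bid = prints.rolling(window, min_periods=1).min()
    est_ask = prints.rolling(window, min_periods=1).max()
    return (est_bid + est_ask) / 2.0


# Synthetic example (made-up numbers): prints alternating between a
# 99.99 bid and a 100.01 ask, i.e. roughly 2 bps wide.
prints = pd.Series([99.99, 100.01] * 10)
mid = bounce_adjusted_mid(prints, window=4)
```

The trade-off is lag: a longer window makes it more likely that both sides are captured, but the estimate then reacts more slowly to genuine moves.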