High frequency data series cleaning in R

I am looking at time series data in foreign exchange and bond markets (to test for reversion after extreme moves). Unfortunately, "tick" data, namely high frequency data, is prone to many problems, and they can significantly distort the analysis. I'd like to know which R library can help with the following fairly frequent data cleaning problems:

1) one spike:


This is typically created when one market maker prints a wrong quote in a single tick; the price was never tradable because it lasted only a split second. I'd like to eliminate the spike, but only if it consists of one (or maybe two) prints.

2) bid ask gapping:


In this case the market is fairly illiquid and the data algorithm is jumping between bids and asks (here 2bps wide), causing this weird cloud.
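One possible way to collapse this kind of bid/ask cloud, assuming the series holds only a single price per tick with no bid/ask flag, is to estimate a mid from the highest and lowest prints in a short trailing window. This is only a rough sketch: the function name `rolling_mid` and the window length are illustrative choices, not part of any library, and the estimate is sensitive to outliers, so it assumes isolated bad prints have already been dealt with.

```python
def rolling_mid(prices, window=5):
    """Estimate a mid price as the midpoint of the trailing window's range.

    Assumption: the series bounces between a bid and an ask level, so the
    window's max and min approximate the two sides of the quote.
    """
    mids = []
    for i in range(len(prices)):
        # Trailing window ending at tick i (shorter at the start of the series).
        recent = prices[max(0, i - window + 1): i + 1]
        mids.append((max(recent) + min(recent)) / 2)
    return mids


# Hypothetical yields (in percent) bouncing between two levels 2bps apart.
ticks = [2.50, 2.52, 2.50, 2.50, 2.52, 2.52, 2.50, 2.52]
mids = rolling_mid(ticks, window=5)
```

Once the window has seen both sides, the estimate stays at the midpoint of the two levels instead of jumping between them. A window that is too long will lag genuine moves, so the length would need tuning per instrument.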

Where should I start to clean this stuff, while throwing out as little real data as possible? I realise that the maxim of "look at the data" applies here, but when you're looking at 1000 series, each with 100 days of data, that quickly becomes impractical, so I need some automated help. I'll also look at Python methods if they're available or better.
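For the single-spike case (problem 1), one automated rule that tries to discard as little real data as possible is to compare each tick with a centered rolling median and drop only runs of one or two deviating prints. The sketch below is a hedged illustration in plain Python; the function name `remove_isolated_spikes`, the window of 7 ticks, and the threshold are all assumptions that would need tuning for each instrument.

```python
from statistics import median


def remove_isolated_spikes(prices, window=7, threshold=0.005, max_run=2):
    """Drop runs of at most `max_run` ticks that sit far from the local median.

    Longer runs are kept, on the assumption that a price that persists
    for several ticks was tradable and is therefore real data.
    """
    n = len(prices)
    half = window // 2

    # Flag ticks that deviate from the centered rolling median.
    outlier = []
    for i in range(n):
        local = prices[max(0, i - half): i + half + 1]
        outlier.append(abs(prices[i] - median(local)) > threshold)

    # Collect only the short runs of consecutive flagged ticks.
    drop = set()
    i = 0
    while i < n:
        if not outlier[i]:
            i += 1
            continue
        j = i
        while j < n and outlier[j]:
            j += 1
        if j - i <= max_run:
            drop.update(range(i, j))
        i = j

    cleaned = [p for k, p in enumerate(prices) if k not in drop]
    return cleaned, sorted(drop)


# Hypothetical FX quotes with one bad print at index 4.
quotes = [1.3000, 1.3001, 1.3000, 1.3002, 1.3500,
          1.3001, 1.3000, 1.3001, 1.3002, 1.3001]
cleaned, removed = remove_isolated_spikes(quotes)
```

Because the median window is wider than twice `max_run`, a genuine level shift moves the median with it and is not flagged, while a one- or two-tick excursion is.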

There's a package for that. Check out RTAQ.

Small plug: there's a Quantitative Finance Stack Exchange site you may be interested in.
