Data spikes come in many shapes and sizes, and many modeling strategies have been proposed for cleaning them. But sometimes a simple magnitude cutoff will suffice.

The following function is designed for cleaning spikes from financial data. It assumes each spike is a single, isolated data point: if a spike spans two or more consecutive points, it won’t work. But if the spikes are reasonably rare and don’t merge into each other, it could be just the thing.

How to use the spike scrubber function

  • The parameter x is a data frame, matrix, time series object (e.g. a zoo object) or similar.
  • The maxSpikeRatio parameter sets how large a spike must be relative to the adjacent data points.
  • The incline on both sides of the spike must exceed maxSpikeRatio for the point to be cleaned.
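To make the two-sided rule concrete, here is a minimal Python sketch. It is not the function described above, just an illustration under stated assumptions: "incline" is taken to mean the absolute difference between a point and its neighbour, a flagged point is replaced by the mean of its two neighbours, and the names scrub_spikes and max_spike_ratio are stand-ins.

```python
from typing import List, Sequence


def scrub_spikes(values: Sequence[float], max_spike_ratio: float = 100.0) -> List[float]:
    """Return a copy of `values` with isolated single-point spikes replaced.

    Assumption: the "incline" is the absolute difference between a point and
    its neighbour. A point counts as a spike when both neighbours lie on the
    same side of it and both inclines exceed max_spike_ratio.
    """
    cleaned = list(values)
    # The first and last points have only one neighbour, so they are never touched.
    for i in range(1, len(values) - 1):
        rise = values[i] - values[i - 1]  # incline going into the point
        fall = values[i] - values[i + 1]  # incline coming out of the point
        same_side = rise * fall > 0       # up-then-down or down-then-up
        if same_side and abs(rise) > max_spike_ratio and abs(fall) > max_spike_ratio:
            # Assumption: replace the spike with the mean of its two neighbours.
            cleaned[i] = (values[i - 1] + values[i + 1]) / 2
    return cleaned


prices = [100.0, 101.0, 350.0, 102.0, 103.0, -200.0, 104.0]
print(scrub_spikes(prices))
```

With these assumptions, a two-point spike such as [100, 350, 360, 101] passes through unchanged, because each of the two high points has one neighbour at a similar level. That is the non-adjacency limitation in action.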

 

Let’s clean some spikes!

Here we’ll test out the spike scrubber on some sample data…

 

And here’s the result…

Figure: Simple spike cleaner. A simple spike scrubber to find and replace outlier data points.

Both spikes had an incline greater than 100, the default for maxSpikeRatio, so both were cleaned. Try lowering this value if you find some spikes aren’t getting cleaned.
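One way to pick a threshold is to count how many points would be flagged at a few candidate values. The sketch below is a hypothetical Python helper (count_flagged is not part of the scrubber), again assuming "incline" means the absolute difference to each neighbour.

```python
def count_flagged(values, max_spike_ratio):
    """Count single-point spikes whose incline on both sides exceeds the threshold.

    Assumption: "incline" is the absolute difference to each neighbour.
    """
    flagged = 0
    # Walk over every interior point together with its two neighbours.
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        rise, fall = cur - prev, cur - nxt
        # Both neighbours must sit on the same side, and the smaller of the
        # two inclines must still clear the threshold.
        if rise * fall > 0 and min(abs(rise), abs(fall)) > max_spike_ratio:
            flagged += 1
    return flagged


series = [100.0, 101.0, 350.0, 102.0, 103.0, 160.0, 104.0]
for threshold in (300, 100, 50):
    print(threshold, count_flagged(series, threshold))
```

In this made-up series the smaller blip at 160 is only caught once the threshold drops below its incline of about 56, while the large spike at 350 is caught at any threshold below about 248.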

 

Not to be used on data that’s naturally spiky!