Efficient movable, reliable scale estimation for python array

I am looking for a fast and efficient way to calculate a reliable, rolling scale estimate for a dataset. I work with 1d arrays of typically 3-400k elements. Until recently, I worked with simulated data (without any catastrophic outliers), and the move_std function from the excellent Bottleneck package helped me a lot. However, since I switched to noisy data, std is not behaving well enough to be useful.

In the past, I used a very simple two-tier mid-size code, step-by-step, to deal with the problem of poorly managed distributions:

def bwmv(data_array): cent = np.median(data_array) MAD = np.median(np.abs(data_array-cent)) u = (data_array-cent) / 9. / MAD uu = u*u I = np.asarray((uu <= 1.), dtype=int) return np.sqrt(len(data_array) * np.sum((data_array-cent)**2 * (1.-uu)**4 * I)\ /(np.sum((1.-uu) * (1.-5*uu) * I)**2)) 

however, the arrays I'm working with now are large enough to be too slow. Does anyone know of a package that provides such an assessment, or has any recommendation for a quick and effective approach to this?

+4
source share
1 answer

I used a simple low-pass filter in similar situations.

Conceptually, you can get a displacement estimate for the mean with fac = 0.99; filtered[k] = fac*filtered[k-1] + (1-fac)*data[k] fac = 0.99; filtered[k] = fac*filtered[k-1] + (1-fac)*data[k] , which is extremely efficient for implementation (in C). A slightly more bizarre IIR filter than this one, low-pass butterworth, is easy to configure in scipy:

 b, a = scipy.signal.butter(2, 0.1) filtered = scipy.signal.lfilter(b, a, data) 

To get a rating for the “scale,” you can subtract this “average rating” from the data. This effectively turns the low-pass filter into a high-pass filter. Take abs () and pass it through another low pass filter.

The result may look like this:

script output

Full script:

 from pylab import * from scipy.signal import lfilter, butter data = randn(1000) data[300:] += 1.0 data[600:] *= 3.0 b, a = butter(2, 0.03) mean_estimate = lfilter(b, a, data) scale_estimate = lfilter(b, a, abs(data-mean_estimate)) plot(data, '.') plot(mean_estimate) plot(mean_estimate + scale_estimate, color='k') plot(mean_estimate - scale_estimate, color='k') show() 

Obviously, the butter () options should be tuned to your problem. If you set order 1 instead of 2, you will get exactly that simple filter that I described first.

Disclaimer: This is an engineer taking on this issue. This approach probably does not sound in a statistical or mathematical way. Also, I'm not sure if it really solves your problem (please explain better if this is not the case), but don’t worry, I had some fun with it, -)

+3
source

All Articles