Density-based array filtering

I have a sample graph like the one below, which I plotted from a set of (x, y) values in an array X.

http://bubblebird.com/images/t.png

As you can see, the graph has a dense cluster of peaks from about 4000 to 5100.

My exact question is: can I programmatically find the range where the graph is busiest?
That is, given the array X, how can I find the range within which the graph is dense?
For this array it would be 4000 - 5100.

Assume for simplicity that the array has only one dense region.
I would be grateful for pseudocode or code.

+8
python arrays algorithm filter php
3 answers

You can use the variance of the signal in a moving window. Here is an example (see the graph, where the test signal is red, the windowed variance is green, and the filtered signal is blue):

A simple example:

Generate the test signal:

    import numpy as np
    X = np.arange(200) - 100.
    Y = (np.exp(-(X/10)**2) + np.exp(-((np.abs(X)-50.)/2)**2)/3.) * np.cos(X * 10.)

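Compute the variance in a moving window:

    window_length = 30  # number of points in the window
    half = window_length // 2  # integer half-width, since slice indices must be integers
    # clamp the start at 0 so windows near the left edge are not empty
    variance = np.array([np.var(Y[max(0, i - half): i + half]) for i in range(len(Y))])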

Get the indices where the variance is high (here the criterion is that the variance exceeds half of the maximum variance; you can adapt it to your case):

    idx = np.where(variance > 0.5 * np.max(variance))
    X_min = np.min(X[idx])  # -14.0
    X_max = np.max(X[idx])  # 15.0

Or filter the signal (set the points where the variance is low to zero):

    Y_modified = np.where(variance > 0.5 * np.max(variance), Y, 0)
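The steps of this answer can also be wrapped into a single reusable function. What follows is a minimal sketch rather than the answerer's code: it uses only the standard library (`statistics.pvariance` computes the population variance, which is also the default for `np.var`), and the function name `dense_range` and its `fraction` parameter are made up for illustration.

```python
import math
from statistics import pvariance


def dense_range(xs, ys, window_length=30, fraction=0.5):
    """Return (x_min, x_max) covering the points whose windowed variance
    exceeds `fraction` of the largest windowed variance, or None."""
    half = window_length // 2
    # Variance of ys in a window around each index; the window is
    # truncated at the array edges so it is never empty.
    variances = [
        pvariance(ys[max(0, i - half): i + half])
        for i in range(len(ys))
    ]
    threshold = fraction * max(variances)
    dense_xs = [x for x, v in zip(xs, variances) if v > threshold]
    if not dense_xs:  # e.g. a constant signal, where every variance is 0
        return None
    return min(dense_xs), max(dense_xs)


# The same test signal as in the answer, built without NumPy.
xs = [float(i - 100) for i in range(200)]
ys = [
    (math.exp(-(x / 10) ** 2) + math.exp(-((abs(x) - 50.0) / 2) ** 2) / 3.0)
    * math.cos(x * 10.0)
    for x in xs
]
x_min, x_max = dense_range(xs, ys)
print(x_min, x_max)
```

Both `window_length` and `fraction` need tuning for real data: a wider window gives a smoother variance curve but blurs the edges of the dense region.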
+5

You can calculate the absolute difference between adjacent values, optionally smooth the result a little with a sliding window, and then find the areas where the smoothed absolute differences are above 50% of the maximum value.

In Python (you have python in the tags), it looks like this:

    a = (10, 11, 9, 10, 18, 5, 20, 6, 15, 10, 9, 11)
    diffs = [abs(i[0]-i[1]) for i in zip(a, a[1:])]
    # [1, 2, 1, 8, 13, 15, 14, 9, 5, 1, 2]
    maximum = max(diffs)
    # 15
    result = [i > maximum/2 for i in diffs]
    # [False, False, False, True, True, True, True, True, False, False, False]
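The snippet in this answer skips the sliding-window smoothing that the text mentions. A possible way to add it, and to turn the boolean mask into an index range, is sketched below. The helper names `moving_average` and `busy_index_range`, the window width of 3, and the choice of a centered average truncated at the edges are all assumptions made for illustration.

```python
def moving_average(values, width):
    """Centered moving average; the window is truncated at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def busy_index_range(a, width=3, fraction=0.5):
    """Return (first, last) indices into `a` spanning the busy stretch,
    or None if no smoothed difference exceeds the threshold."""
    diffs = [abs(p - q) for p, q in zip(a, a[1:])]
    smoothed = moving_average(diffs, width)
    threshold = fraction * max(smoothed)
    hits = [i for i, d in enumerate(smoothed) if d > threshold]
    if not hits:
        return None
    # diffs[i] compares a[i] with a[i + 1], so extend the end by one.
    return hits[0], hits[-1] + 1


a = (10, 11, 9, 10, 18, 5, 20, 6, 15, 10, 9, 11)
first, last = busy_index_range(a)
print(first, last, a[first:last + 1])
```

Smoothing matters most on noisy data, where a single quiet step inside the busy region would otherwise split it into two separate runs.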
+4

You can use a clustering algorithm (e.g. k-means): divide the data into clusters and pick the cluster that carries the most weight, i.e. the one containing the most points.
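The answer gives no details, so the following is only one possible reading of it: a small one-dimensional k-means over the x-positions of the samples, taking the cluster with the most members as the dense region. It assumes that "dense" means many samples per unit of x; the function name `kmeans_1d`, the evenly spread starting centers, the choice of `k=3`, and the sample data are all hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain k-means on scalars; returns a list of k clusters (lists)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly across the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


# Hypothetical sample positions: sparse everywhere, dense around 4000-5100.
xs = list(range(0, 10000, 500)) + list(range(4000, 5101, 50))
clusters = kmeans_1d(xs, k=3)
densest = max(clusters, key=len)
dense_min, dense_max = min(densest), max(densest)
print(dense_min, dense_max, len(densest))
```

Because k-means splits the whole axis among the clusters, the reported bounds are coarse and can extend past the dense stretch on both sides. If the samples are evenly spaced in x and only the amplitude is busy, this approach does not apply directly, and a windowed measure of the signal's activity is a better fit.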

0