How to get index of specific percentile in numpy / scipy?

I looked at this answer which explains how to calculate the value of a specific percentile, and this answer that explains how to calculate the percentiles corresponding to each element.

  • Using the first solution, I can calculate the value and scan the original array to find the index.

  • Using the second solution, I can scan the entire output array for the percentile I'm looking for.

However, both require additional scanning if I want to know the index (in the original array) that corresponds to a specific percentile (or the index containing the nearest element).

Is there a more direct or built-in way to get the index corresponding to the percentile?

Note: my array is not sorted, and I need the index into the original, unsorted array.

python numpy scipy
4 answers

This is a bit tricky, but you can get what you need with np.argpartition. Take a simple array and shuffle it:

 >>> import numpy as np
 >>> a = np.arange(10)
 >>> np.random.shuffle(a)
 >>> a
 array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])

If you want to find, for example, the index of the 0.25 quantile, it corresponds to the element at position idx of the sorted array:

 >>> idx = 0.25 * (len(a) - 1)
 >>> idx
 2.25

You need to decide how to round this to an int; say you go with the nearest integer:

 >>> idx = int(idx + 0.5)
 >>> idx
 2

If you call np.argpartition now, this is what you get:

 >>> np.argpartition(a, idx)
 array([7, 5, 4, 3, 2, 1, 6, 0, 8, 9], dtype=int64)
 >>> np.argpartition(a, idx)[idx]
 4
 >>> a[np.argpartition(a, idx)[idx]]
 2

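It is easy to verify that these last two expressions give, respectively, the index and the value of the 0.25 quantile. Only the entry at position idx is guaranteed to be in its sorted place; the rest of the argpartition output is in no particular order.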
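The steps above can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the name `quantile_index` is hypothetical, it assumes a 1-D array and a quantile `q` between 0 and 1, and it uses the same round-half-up rule (`int(x + 0.5)`) as above.

```python
import numpy as np

def quantile_index(a, q):
    """Return the index into the unsorted array `a` of the element at quantile q.

    Hypothetical helper: q is assumed to lie in [0, 1]; the fractional
    position q * (n - 1) is rounded half-up to the nearest integer.
    """
    a = np.asarray(a)
    k = int(q * (len(a) - 1) + 0.5)  # sorted position of the quantile
    # argpartition only guarantees that position k holds the k-th smallest
    # element, which is all we need; no full sort is performed
    return int(np.argpartition(a, k)[k])


a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
i = quantile_index(a, 0.25)
print(i, a[i])
```

Because `np.argpartition` does only a partial sort, this avoids both a full sort and a second scan over the array.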


If you are using NumPy, you can also use the built-in percentile function. Starting with NumPy 1.9.0, np.percentile has an interpolation option that lets you select the lower, higher, or nearest value (NumPy 1.22 renamed this parameter to method and deprecated interpolation). The following works with unsorted arrays and finds the index of the entry nearest to the percentile:

 import numpy as np
 p = 70  # my desired percentile, here 70%
 x = np.random.uniform(10, size=(1000)) - 5.0  # dummy vector
 # index of array entry nearest to percentile value
 i_near = abs(x - np.percentile(x, p, interpolation='nearest')).argmin()

Most people want the nearest entry, as shown above. For completeness, you can also ask for the entry just below or just above the percentile value:

 # index of array entry greater than percentile value:
 i_high = abs(x - np.percentile(x, p, interpolation='higher')).argmin()
 # index of array entry smaller than percentile value:
 i_low = abs(x - np.percentile(x, p, interpolation='lower')).argmin()
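On current NumPy releases the keyword is `method` rather than `interpolation`. Below is a minimal sketch that tries the new keyword first and falls back to the old one on older versions. The helper name `percentile_index` is hypothetical, and the fallback assumes that an unknown keyword raises `TypeError`, as it does for ordinary Python functions.

```python
import numpy as np

def percentile_index(x, p, method="nearest"):
    """Index of the entry of x selected by np.percentile with a discrete method.

    Hypothetical helper; `method` is expected to be 'lower', 'higher' or
    'nearest', for which np.percentile returns an actual element of x.
    """
    x = np.asarray(x)
    try:
        v = np.percentile(x, p, method=method)  # NumPy >= 1.22
    except TypeError:
        v = np.percentile(x, p, interpolation=method)  # NumPy 1.9 - 1.21
    # same idea as the answer: the entry at distance 0 from v is the one we want
    return int(np.abs(x - v).argmin())


x = np.array([5., 6., 4., 9., 2., 1., 3., 0., 7., 8.])
print(percentile_index(x, 25, "lower"), percentile_index(x, 25, "higher"))
```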

For old NumPy versions (< 1.9.0), the interpolation option is not available, so the equivalent is:

 # Calculate 70th percentile:
 pcen = np.percentile(x, p)
 i_high = np.asarray([i - pcen if i - pcen >= 0 else x.max() - pcen for i in x]).argmin()
 i_low = np.asarray([i - pcen if i - pcen <= 0 else x.min() - pcen for i in x]).argmax()
 i_near = abs(x - pcen).argmin()

In short:

i_high points to the array entry with the smallest value that is equal to or greater than the requested percentile.

i_low points to the array entry with the largest value that is equal to or less than the requested percentile.

i_near points to the array entry closest to the percentile value, which may be larger or smaller.
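The list comprehensions in the old-version code can be replaced by vectorized masking: entries on the wrong side of the percentile are set to `+inf` or `-inf` instead of an offset from `x.max()` or `x.min()`. This also removes a tie case where that offset equals a real difference (for example `x = [1, 2]` at the 50th percentile). A minimal sketch; `percentile_neighbors` is a hypothetical name, and the input is assumed to be a 1-D array without NaNs.

```python
import numpy as np

def percentile_neighbors(x, p):
    """Return (i_low, i_high, i_near) for percentile p of the 1-D array x.

    Hypothetical helper; uses np.percentile's default linear interpolation,
    so the percentile value always lies between x.min() and x.max().
    """
    x = np.asarray(x, dtype=float)
    pcen = np.percentile(x, p)
    diff = x - pcen
    # smallest non-negative difference -> entry at or just above pcen
    i_high = int(np.where(diff >= 0, diff, np.inf).argmin())
    # largest non-positive difference -> entry at or just below pcen
    i_low = int(np.where(diff <= 0, diff, -np.inf).argmax())
    # smallest absolute difference -> nearest entry on either side
    i_near = int(np.abs(diff).argmin())
    return i_low, i_high, i_near


print(percentile_neighbors([5, 6, 4, 9, 2, 1, 3, 0, 7, 8], 25))
```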

My results:

 >>> pcen
 2.3436832738049946
 >>> x[i_high]
 2.3523077864975441
 >>> x[i_low]
 2.339987054079617
 >>> x[i_near]
 2.339987054079617
 >>> i_high, i_low, i_near
 (876, 368, 368)

That is, index 876 holds the closest value above pcen, but index 368 is even closer, although its value is slightly below the percentile value.


If you are working with a pandas DataFrame, you can select the values at or above a given quantile using quantile():

 df_metric_95th_percentile = df.metric[df['metric'] >= df['metric'].quantile(q=0.95)]
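In pandas, the index labels of the selected rows are available through `.index`, and `Series.quantile` accepts an `interpolation` argument (including `'nearest'`) much like `np.percentile`. A minimal sketch with made-up data; the Series `s` and its labels are hypothetical. Note that `idxmin` returns the index label, not the integer position.

```python
import pandas as pd

# hypothetical example data with non-default index labels
s = pd.Series([5, 6, 4, 9, 2, 1, 3, 0, 7, 8], index=list("abcdefghij"))

# rows at or above the 95th percentile (same filter as the answer above)
top = s[s >= s.quantile(q=0.95)]

# label of the entry at the 25th percentile; 'nearest' makes the quantile an
# actual element of s, so the entry at distance 0 is the one we want
nearest_label = (s - s.quantile(0.25, interpolation="nearest")).abs().idxmin()

print(list(top.index), nearest_label)
```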

Assuming the array is sorted (unless I misunderstand the question), you can compute the percentile index by taking the length of the array minus 1, multiplying it by the quantile, and rounding to the nearest integer:

 round((len(array) - 1) * (percentile / 100.))

should give you the index nearest to the percentile.
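For an unsorted array, the same formula can be applied to the positions returned by `np.argsort`, which maps each sorted position back to an index in the original array. A minimal sketch; `percentile_index_via_argsort` is a hypothetical name. Python 3's `round` rounds halves to even, so `round(4.5)` is 4 while `int(4.5 + 0.5)` is 5, and the two rounding rules can select different elements.

```python
import numpy as np

def percentile_index_via_argsort(a, percentile):
    """Index into the unsorted array `a` of the entry at the given percentile (0-100).

    Hypothetical helper: applies round((n - 1) * percentile / 100) to the
    sorted order, then maps that sorted position back to an original index.
    """
    a = np.asarray(a)
    order = np.argsort(a, kind="stable")  # original indices in ascending order of value
    k = round((len(a) - 1) * (percentile / 100.0))  # banker's rounding on .5
    return int(order[k])


a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
print(percentile_index_via_argsort(a, 25), percentile_index_via_argsort(a, 50))
```

For a single lookup, `np.argpartition` (as in the first answer) avoids the full O(n log n) sort.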

