NumPy percentile function differs from the MATLAB percentile function

When I try to calculate the 75th percentile in MATLAB, I get a different value than in NumPy.

MATLAB:

    >> x = [ 11.308 ; 7.2896; 7.548 ; 11.325 ; 5.7822; 9.6343;
             7.7117; 7.3341; 10.398 ; 6.9675; 10.607 ; 13.125 ;
             7.819 ; 8.649 ; 8.3106; 12.129 ; 12.406 ; 10.935 ;
             12.544 ; 8.177 ]
    >> prctile(x, 75)

    ans =

       11.3165

Python + NumPy:

    >>> import numpy as np
    >>> x = np.array([ 11.308 , 7.2896, 7.548 , 11.325 , 5.7822, 9.6343,
    ...                7.7117, 7.3341, 10.398 , 6.9675, 10.607 , 13.125 ,
    ...                7.819 , 8.649 , 8.3106, 12.129 , 12.406 , 10.935 ,
    ...                12.544 , 8.177 ])
    >>> np.percentile(x, 75)
    11.312249999999999

I also checked the answer with R, and it gives the same result as NumPy.

R:

    > x <- c(11.308 , 7.2896, 7.548 , 11.325 , 5.7822, 9.6343,
    +        7.7117, 7.3341, 10.398 , 6.9675, 10.607 , 13.125 ,
    +        7.819 , 8.649 , 8.3106, 12.129 , 12.406 , 10.935 ,
    +        12.544 , 8.177)
    > quantile(x, 0.75)
         75%
    11.31225

What's going on here? And is there a way to make Python and R behavior mirror MATLAB?

2 answers

MATLAB apparently uses midpoint interpolation by default, whereas NumPy and R use linear interpolation by default:

    In [182]: np.percentile(x, 75, interpolation='linear')
    Out[182]: 11.312249999999999

    In [183]: np.percentile(x, 75, interpolation='midpoint')
    Out[183]: 11.3165

To understand the difference between linear and midpoint, consider this simple example:

    In [187]: np.percentile([0, 100], 75, interpolation='linear')
    Out[187]: 75.0

    In [188]: np.percentile([0, 100], 75, interpolation='midpoint')
    Out[188]: 50.0
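
The two rules in the example above can be sketched by hand. This is only an illustration, not NumPy's actual implementation: the helper name `percentile_sketch` and its `rule` argument are made up here, and it assumes a 1-D input with 0 <= p <= 100. Both rules start from the fractional index (n - 1) * p / 100 into the sorted data; linear interpolates between the two neighbours, while midpoint simply averages them.

```python
import numpy as np

def percentile_sketch(values, p, rule="linear"):
    """Hypothetical helper illustrating the 'linear' and 'midpoint' rules."""
    y = np.sort(np.asarray(values, dtype=float))
    pos = (len(y) - 1) * p / 100.0   # fractional index into the sorted data
    lo = int(np.floor(pos))          # neighbour below
    hi = int(np.ceil(pos))           # neighbour above (same as lo if pos is whole)
    if rule == "linear":
        # Move from y[lo] towards y[hi] by the fractional part of pos.
        return y[lo] + (pos - lo) * (y[hi] - y[lo])
    if rule == "midpoint":
        # Ignore the fractional part and average the two neighbours.
        return 0.5 * (y[lo] + y[hi])
    raise ValueError("rule must be 'linear' or 'midpoint'")

print(percentile_sketch([0, 100], 75, "linear"))    # 75.0
print(percentile_sketch([0, 100], 75, "midpoint"))  # 50.0
```

For the 20-element array in the question the fractional index is 19 * 0.75 = 14.25, so linear lands a quarter of the way between the 15th and 16th sorted values (11.308 and 11.325), while midpoint lands exactly halfway.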

To compile the latest version of NumPy (using Ubuntu):

    mkdir $HOME/src
    cd $HOME/src
    git clone https://github.com/numpy/numpy.git
    cd numpy
    git remote add upstream https://github.com/numpy/numpy.git
    # Read ~/src/numpy/INSTALL.txt
    sudo apt-get install libatlas-base-dev libatlas3gf-base
    python setup.py build --fcompiler=gnu95
    python setup.py install

The advantage of using git instead of pip is that it is easy to upgrade (or downgrade) to other versions of NumPy (and you will also get the source code):

    git fetch upstream
    git checkout master  # or checkout any other version of NumPy
    cd ~/src/numpy
    /bin/rm -rf build
    cdsitepackages  # assuming you are using virtualenv; otherwise cd to your local python sitepackages directory
    /bin/rm -rf numpy numpy-*-py2.7.egg-info
    cd ~/src/numpy
    python setup.py build --fcompiler=gnu95
    python setup.py install

Since the accepted answer is still incomplete even after the comment by @cpaulik, I am posting what I hope is a more complete answer (although, for brevity, not a perfect one; see below).

Using np.percentile(x, p, interpolation='midpoint') will only give the same answer as MATLAB for very specific values, namely when p/100 is a multiple of 1/n, where n is the number of elements in the array. In the original question this happened to be the case, since n = 20 and p = 75, but in general the two functions differ.

A short emulation of the MATLAB prctile function can be defined as follows:

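    import numpy as np

    def quantile(x, q):
        # The i-th sorted value (i = 1..n) is treated as the (i - 0.5)/n quantile.
        # The 1.0 keeps the division floating-point under Python 2 as well.
        n = len(x)
        y = np.sort(x)
        return np.interp(q, np.linspace(1.0 / (2 * n), (2.0 * n - 1) / (2 * n), n), y)

    def prctile(x, p):
        return quantile(x, np.array(p) / 100.0)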

This function, like MATLAB's, gives a piecewise linear output spanning from min(x) to max(x). NumPy's percentile function with interpolation='midpoint' instead returns a piecewise constant function that runs from the midpoint of the two smallest elements to the midpoint of the two largest. Plotting the two functions for the array in the original question gives the image in this link (sorry, I cannot embed it). The dashed red line marks the 75th percentile, where the two functions actually coincide.
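
Without the plot, the same comparison can be made numerically. The following is a minimal sketch, assuming Python 3 and NumPy 1.22 or later (where the keyword is spelled `method`; older releases call it `interpolation`). The helper name `matlab_like_prctile` is made up for this example and simply restates the emulation in self-contained form.

```python
import numpy as np

x = np.array([11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343,
              7.7117, 7.3341, 10.398, 6.9675, 10.607, 13.125,
              7.819, 8.649, 8.3106, 12.129, 12.406, 10.935,
              12.544, 8.177])

def matlab_like_prctile(values, p):
    """Hypothetical helper: the i-th sorted value sits at the (i - 0.5)/n quantile."""
    n = len(values)
    grid = np.linspace(1 / (2 * n), (2 * n - 1) / (2 * n), n)
    return np.interp(np.asarray(p) / 100, grid, np.sort(values))

# 50 and 75 are multiples of 100/n for n = 20; 77 is not.
for p in (50, 75, 77):
    emulated = matlab_like_prctile(x, p)
    midpoint = np.percentile(x, p, method="midpoint")  # assumes NumPy >= 1.22
    print(p, emulated, midpoint, np.isclose(emulated, midpoint))
```

With n = 20, the two columns should agree for p = 50 and p = 75 and disagree for p = 77, which is the behaviour described above.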

PS: The reason this function is not truly equivalent to MATLAB's is that it only accepts a one-dimensional x and raises an error for higher-dimensional input. MATLAB, on the other hand, accepts a higher-dimensional x and operates along the first (non-singleton) dimension, but implementing that correctly would take a little more work. However, both this function and the MATLAB function should work correctly with higher-dimensional inputs for p/q (thanks to np.interp, which takes care of it).
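
One possible way around the one-dimensional limitation, offered as a sketch rather than a drop-in replacement: as far as I can tell, the `method="hazen"` option added in NumPy 1.22 places the sorted values at the same (i - 0.5)/n positions as the emulation, and `np.percentile` already accepts an `axis` argument. The example below assumes Python 3 and NumPy 1.22 or later, uses `axis=0` to stand in for MATLAB's "first dimension" (it does not skip singleton dimensions), and checks the result column by column against a 1-D helper whose name, `prctile_1d`, is invented here.

```python
import numpy as np

def prctile_1d(values, p):
    """Hypothetical 1-D helper: sorted values sit at the (i - 0.5)/n quantiles."""
    n = len(values)
    grid = np.linspace(1 / (2 * n), (2 * n - 1) / (2 * n), n)
    return np.interp(np.asarray(p) / 100, grid, np.sort(values))

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))   # 20 observations in each of 3 columns
p = [10, 37.5, 75]

# Reference: apply the 1-D emulation to each column separately.
by_column = np.column_stack([prctile_1d(data[:, j], p)
                             for j in range(data.shape[1])])

# Same computation in one call; rows index the percentiles, columns the data columns.
hazen = np.percentile(data, p, axis=0, method="hazen")

print(np.allclose(by_column, hazen))
```

On the R side, `quantile(x, 0.75, type = 5)` should correspond to the same definition, though I have only reasoned that through from the documentation rather than checked it against MATLAB output.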
