Pandas describe 0.18.0 vs pandas describe 0.17.0

In one environment, I have pandas version 0.17.0 with numpy version 1.10.1. In another environment, I have pandas version 0.18.1 with numpy version 1.10.4.

I am running this piece of code:

    from pandas import Series
    import numpy as np

    Series([1, 2, 3, 4, 5, np.NaN]).describe()

With pandas version 0.17.0, I get this output:

    count    5.000000
    mean     3.000000
    std      1.581139
    min      1.000000
    25%      2.000000
    50%      3.000000
    75%      4.000000
    max      5.000000
    dtype: float64

With pandas version 0.18.1, I get this output:

    count    5.000000
    mean     3.000000
    std      1.581139
    min      1.000000
    25%           NaN
    50%           NaN
    75%           NaN
    max      5.000000
    dtype: float64

What gives? Why are the percentiles NaN in the newer version?

1 answer

The problem is that Series.describe() relies on Series.quantile(), and there is a reported bug (#13098) in pandas 0.18.1 where Series.quantile() returns nan instead of the percentiles when the series contains nan.
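To see that the percentile rows of describe() simply follow whatever quantile() returns, the two can be compared directly. This is only a minimal sketch: the variable names are made up for illustration, and it assumes a NumPy recent enough to support the equal_nan argument of np.allclose, so that two NaN results also count as a match.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, np.nan])

# Percentile rows as reported by describe()
from_describe = s.describe()[["25%", "50%", "75%"]].values

# The same percentiles requested directly from quantile()
from_quantile = s.quantile([0.25, 0.5, 0.75]).values

# equal_nan=True treats NaN == NaN as a match, so the comparison also
# holds on a pandas version where both results come back as NaN.
same = np.allclose(from_describe, from_quantile, equal_nan=True)
print(same)
```

If the two agree, any NaN in the describe() output can be traced back to quantile() rather than to describe() itself.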

Demonstration of bug #13098:

    >>> import pandas as pd
    >>> import numpy
    >>> s = pd.Series([1, 2, 3, 4, numpy.nan])
    >>> s.quantile(0.5)
    nan

Looking at pull request #12752, it appears that the notnull call, which used to remove nan values before the percentiles were calculated, has been removed.
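While stuck on an affected version, one possible workaround is to drop the missing values yourself before asking for percentiles, which is roughly what the removed filtering step did. The snippet below is a sketch of that idea using Series.dropna(), not the fix that went into pandas; note that count, mean, std, min and max already ignored nan in both outputs above, so they are unaffected.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, np.nan])

# Remove missing values explicitly so quantile() never sees nan,
# regardless of how the installed pandas version treats it.
clean = s.dropna()

median = clean.quantile(0.5)
quartiles = clean.quantile([0.25, 0.5, 0.75])
summary = clean.describe()

print(summary)
```

Because the series passed to describe() no longer contains nan, the result does not depend on how quantile() handles missing values.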


Update

This issue has now been closed by this commit, after which Series.quantile() handles nan correctly again (2016/05/12).
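If you are unsure whether a given environment already contains the fix, you can probe the behaviour rather than compare version numbers. The helper below is hypothetical (its name is not part of pandas) and only checks the single case shown in the demonstration above.

```python
import numpy as np
import pandas as pd


def quantile_skips_nan():
    """Return True if Series.quantile() ignores nan in this environment."""
    # Hypothetical helper: probes the behaviour on a tiny series.
    probe = pd.Series([1.0, 2.0, 3.0, np.nan])
    return not np.isnan(probe.quantile(0.5))


if quantile_skips_nan():
    print("quantile() ignores nan; describe() percentiles should be fine")
else:
    print("quantile() returns nan; drop missing values before describe()")
```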
