Pandas: why pandas.Series.std() is very different from numpy.std()

Question

I have the following two code fragments.

    import numpy
    numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])
    # returns 0

and

    import pandas as pd
    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)
    # returns 10.119288512538814

This is a huge difference.

May I ask why?

python numpy pandas

Tony Jul 2 '15 at 6:02

1 answer

Cleb · Answer 1 · 2015-07-02T09:34:39+0000

This issue is indeed already being discussed (link); the cause is the algorithm pandas uses to compute the standard deviation, which is not as numerically stable as the one numpy uses.
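To illustrate what "numerically stable" means here, the following is a minimal pure-Python sketch, not pandas' or numpy's actual code. It assumes, purely for illustration, that the unstable variant is a one-pass sum-of-squares formula, and compares it with a two-pass formula that subtracts the mean first. The function names one_pass_std and two_pass_std are hypothetical.

```python
import math


def one_pass_std(values):
    """Population std via mean(x^2) - mean(x)^2 (illustrative, unstable)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        x = float(v)
        n += 1
        total += x
        # For large x, x * x exceeds 2**53, so it is rounded in float64.
        total_sq += x * x
    variance = (total_sq - total * total / n) / n
    # Rounding can push the difference slightly below zero, so clamp it.
    return math.sqrt(max(variance, 0.0))


def two_pass_std(values):
    """Population std computed from deviations around the mean (stable)."""
    xs = [float(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    # Deviations are small numbers, so squaring them loses no precision.
    variance = sum((x - mean) ** 2 for x in xs) / n
    return math.sqrt(variance)


data = [766897346] * 10
print("one-pass:", one_pass_std(data))
print("two-pass:", two_pass_std(data))
```

For the data from the question, the two-pass version returns exactly 0.0, because every deviation from the mean is zero. The one-pass version subtracts two huge, already rounded numbers, so the rounding error survives and the result is typically nonzero on IEEE 754 doubles.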

An easy workaround is to call .values on the series to get the underlying numpy array, and then call std on that array, which uses numpy's std:

 pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()

which gives the expected value of 0.
