Pandas: why is pandas.Series.std() so different from numpy.std()?

I have the following two code fragments:

    import numpy
    numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])
    # 0.0

and

    import pandas as pd
    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)
    # 10.119288512538814

This is a huge difference.

May I ask why?

1 answer

This issue is already being discussed (link). The cause is the algorithm pandas uses to compute the standard deviation: it is less numerically stable than the one numpy uses.
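To see why the algorithm matters, here is a minimal sketch in plain Python contrasting the textbook one-pass formula (sum of squares minus the squared sum divided by n) with the two-pass formula (compute the mean first, then sum the squared deviations). The helpers `naive_std` and `two_pass_std` are hypothetical names for illustration; they show the general numerical issue and are not pandas' or numpy's actual code. The `shifted_data` list is an extra, made-up test case whose true population variance is 22.5.

```python
import math

# Hypothetical helpers for illustration only; not pandas' or numpy's code.

def naive_std(values, ddof=0):
    """One-pass formula: sqrt((sum(x^2) - sum(x)^2 / n) / (n - ddof))."""
    xs = [float(v) for v in values]  # force float64 arithmetic
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    # ss and s*s/n are huge and nearly equal, so their difference
    # is dominated by rounding error (catastrophic cancellation).
    var = (ss - s * s / n) / (n - ddof)
    # Rounding can even make var negative; take abs() as a guard before sqrt.
    return math.sqrt(abs(var))


def two_pass_std(values, ddof=0):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    xs = [float(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - ddof)
    return math.sqrt(var)


question_data = [766897346] * 10
shifted_data = [10**9 + d for d in (4, 7, 13, 16)]  # true population variance: 22.5

print(naive_std(question_data), two_pass_std(question_data))
print(naive_std(shifted_data), two_pass_std(shifted_data))
```

With the question's data, both sum(x^2) and sum(x)^2/n are about 5.9e18, where adjacent float64 values are 1024 apart, so the one-pass difference can only be a multiple of 1024 and the variance (divided by n = 10) a multiple of 102.4. The smallest nonzero result is therefore sqrt(102.4), about 10.1193, which appears consistent with the 10.119288512538814 reported in the question. The two-pass version subtracts the mean first, so every deviation is exactly zero here.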

An easy workaround is to call .values on the series and then call std on the resulting NumPy array, so that numpy's std is used instead:

    pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).values.std()

which gives the expected value of 0.
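One detail worth checking when swapping one function for the other: `pd.Series.std` defaults to `ddof=1` (sample standard deviation), while `numpy.std` and `ndarray.std` default to `ddof=0` (population standard deviation). The question passes `ddof=0` explicitly, so `.values.std()` is the like-for-like replacement. A minimal sketch, assuming a pandas version that has `Series.to_numpy()` (the newer spelling of `.values`):

```python
import numpy as np
import pandas as pd

s = pd.Series([766897346] * 10)
arr = s.to_numpy()  # same data as s.values, as a NumPy array

population = arr.std()          # ndarray.std defaults to ddof=0, like s.std(ddof=0)
sample = arr.std(ddof=1)        # matches the default of pd.Series.std()
also_population = np.std(arr)   # numpy.std also defaults to ddof=0

print(population, sample, also_population)
```

If you need the sample standard deviation from NumPy, pass `ddof=1` explicitly; otherwise the two libraries answer slightly different questions even when both are numerically exact.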
