Standard deviation for DF, pandas

For example, I have a pandas DataFrame that looks like this:

    a  b  c
    1  2  3
    4  5  6
    7  8  9

I want to calculate the standard deviation of all values in this DataFrame. The df.std() function returns one value per column instead.

Of course, I can write the following code:

    sd = []
    sd.append(list(df['a']))
    sd.append(list(df['b']))
    sd.append(list(df['c']))
    numpy.std(sd)

Is it possible to make this code simpler and use some pandas function for this DF?

2 answers

df.values returns a NumPy array containing the values in df. You can then apply np.std to this array:

    In [52]: np.std(sd)
    Out[52]: 2.5819888974716112

    In [53]: np.std(df.values)
    Out[53]: 2.5819888974716112
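For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the 3x3 frame from the question, and it uses `DataFrame.to_numpy()`, which newer pandas versions offer as an equivalent to `df.values`; that substitution is an addition here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question: values 1..9 in columns a, b, c.
df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list("abc"))

# One standard deviation over every value in the frame (NumPy default, ddof=0).
flat_std = np.std(df.to_numpy())

# For comparison: df.std() gives one value per column (pandas default, ddof=1).
per_column = df.std()

print(flat_std)
print(per_column)
```

The per-column result is a Series with one entry per column, which is why it does not answer the original question on its own.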

Alternatively, if you like the idea of "making a vector of all your values" and then taking its standard deviation:

 df.stack().std() 

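Note, however, that the pandas and NumPy std functions use different default denominators (degrees of freedom): pandas divides by N - 1 (ddof=1, the sample standard deviation), while NumPy divides by N (ddof=0, the population standard deviation). So: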

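    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list('abc'))
    print(np.std(df.values))                    # NumPy default, ddof=0
    print(df.stack().std())                     # pandas default, ddof=1
    print(df.stack().std() * np.sqrt(8. / 9.))  # rescaled to match NumPy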

gives:

    2.58198889747
    2.73861278753
    2.58198889747

The middle value is different! Not a typo!
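Rather than rescaling by hand, the denominator can be set explicitly through the `ddof` argument that both libraries accept. The sketch below assumes the same 3x3 example frame; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list("abc"))

# Population standard deviation (divide by N): ask pandas for ddof=0.
population_pd = df.stack().std(ddof=0)
population_np = np.std(df.to_numpy())          # NumPy's default is ddof=0

# Sample standard deviation (divide by N - 1): ask NumPy for ddof=1.
sample_pd = df.stack().std()                   # pandas' default is ddof=1
sample_np = np.std(df.to_numpy(), ddof=1)

print(population_pd, population_np)
print(sample_pd, sample_np)
```

With matching `ddof` values the two libraries agree, so which one to use comes down to whether the nine values are treated as the whole population or as a sample.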
