Standard deviation for DF, pandas

Question

For example, I have a pandas DataFrame that looks like this:

    a  b  c
    1  2  3
    4  5  6
    7  8  9

I want to calculate the standard deviation over all values in this DataFrame. The df.std() function returns one value per column.

Of course, I can write the following code:

    sd = []
    sd.append(list(df['a']))
    sd.append(list(df['b']))
    sd.append(list(df['c']))
    numpy.std(sd)

Is it possible to simplify this code and use a pandas function on the DataFrame instead?

+5

python pandas dataframe

Guforu Apr 22 '15 at 13:26


2 answers

Alternatively, if you like the idea of "making a vector of all your values" and then taking its standard deviation:

 df.stack().std()

Be careful, though: by default, pandas' std uses a different denominator (degrees of freedom) than NumPy's std (ddof=1 in pandas, ddof=0 in NumPy), so:

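    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list('abc'))
    print(np.std(df.values))                      # NumPy default: divide by N
    print(df.stack().std())                       # pandas default: divide by N - 1
    print(df.stack().std() * np.sqrt(8. / 9.))    # rescale pandas result to match NumPy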

gives:

    2.58198889747
    2.73861278753
    2.58198889747

The results differ because of the denominator. Not a typo!
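To make the denominator difference explicit, here is a minimal sketch that passes `ddof` to both libraries instead of rescaling by hand. It reuses the same 3x3 example frame; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list('abc'))

# Population standard deviation (divide by N): NumPy's default, ddof=0.
pop_np = np.std(df.values)
pop_pd = df.stack().std(ddof=0)

# Sample standard deviation (divide by N - 1): pandas' default, ddof=1.
sample_np = np.std(df.values, ddof=1)
sample_pd = df.stack().std()

print(pop_np, pop_pd)
print(sample_np, sample_pd)
```

Whichever convention you need, setting `ddof` explicitly on both sides avoids relying on either library's default.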

+2

8one6 Apr 22 '15 at 2:31


unutbu · Accepted Answer · 2015-04-22T13:29:45+0000

df.values returns a NumPy array containing the values in df. You can then apply np.std to this array:

    In [52]: np.std(sd)
    Out[52]: 2.5819888974716112

    In [53]: np.std(df.values)
    Out[53]: 2.5819888974716112
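On newer pandas versions, `DataFrame.to_numpy()` is the documented alternative to `.values`. A sketch of the same calculation, assuming pandas 0.24 or later and the same example frame as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=list('abc'))

# Convert the whole frame to one NumPy array and take its standard
# deviation over all elements (NumPy default, ddof=0).
overall_std = np.std(df.to_numpy())
print(overall_std)
```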
