How to calculate the difference in n columns in pandas, not in rows

Question


I'm working with data and need to look at the differences between columns (as well as between rows) on a fairly large DataFrame. For rows the obvious tool is the diff() method, but I can't find an equivalent for columns.

My current solution for getting a DataFrame of column-wise differences is:

df.transpose().diff().transpose()

Is there a more efficient alternative? Or is this such an unusual use of pandas that it was simply never requested or considered useful? :)

Thanks,


python numpy pandas

John smizz Mar 23 '15 at 19:07


3 answers

If you only need the difference between two specific columns, subtract them directly:

df['new_col'] = df['a'] - df['b']
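A sketch of the same idea for one pair and for several pairs of columns at once. The frame and the column names 'a', 'b', 'c' are hypothetical placeholders. Subtracting two sub-frames directly (df[left] - df[right]) would align on column labels and yield NaN, so this sketch subtracts the underlying arrays instead.

```python
import pandas as pd

# Hypothetical frame; the column names 'a', 'b', 'c' are placeholders
df = pd.DataFrame({'a': [10, 20, 30], 'b': [1, 2, 3], 'c': [5, 5, 5]})

# Difference between one specific pair of columns, stored as a new column
df['a_minus_b'] = df['a'] - df['b']

# Several pairs at once: subtract the raw arrays to avoid label alignment
left, right = ['a', 'b'], ['b', 'c']
pairwise = pd.DataFrame(
    df[left].to_numpy() - df[right].to_numpy(),
    columns=[f'{l}-{r}' for l, r in zip(left, right)],
    index=df.index,
)
print(pairwise)
```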

As unutbu points out, working on the underlying np.ndarray is the fastest option. A timing comparison:

import numpy as np
import pandas as pd

# Create a large dataframe (10**6 rows, 100 columns); randn needs integer sizes.
df = pd.DataFrame(np.random.randn(10**6, 100))

%%timeit
np.diff(df.values, axis=1)

1 loops, best of 3: 450 ms per loop

%%timeit
df - df.shift(axis=1)

1 loops, best of 3: 727 ms per loop


%%timeit
df.T.diff().T

1 loops, best of 3: 1.52 s per loop
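The three timed approaches do not return identically shaped results. A small sketch on a hypothetical 5x4 random frame (standing in for the large benchmark frame) shows that np.diff drops the first column, while df - df.shift(axis=1) and df.T.diff().T keep it as all-NaN; the remaining values agree.

```python
import numpy as np
import pandas as pd

# Small random frame standing in for the large benchmark frame
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)))

a = np.diff(df.values, axis=1)   # ndarray, shape (5, 3): first column dropped
b = df - df.shift(axis=1)        # DataFrame, shape (5, 4): first column all NaN
c = df.T.diff().T                # same shape and values as b

print(a.shape, b.shape, c.shape)
```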


Alexander 23 . '15 19:46

Use the axis parameter of diff:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

df.diff(axis=1)            # subtracting column-wise (result is float because of NaN)
#      A    B    C    D
# 0  NaN  1.0  1.0  1.0
# 1  NaN  1.0  1.0  1.0
# 2  NaN  1.0  1.0  1.0

df.diff()                  # subtracting row-wise
#      A    B    C    D
# 0  NaN  NaN  NaN  NaN
# 1  4.0  4.0  4.0  4.0
# 2  4.0  4.0  4.0  4.0
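Since the question asks about differences across n columns, the periods argument of diff may be what is needed. A sketch on the same toy frame, with n = 2 chosen arbitrarily: a positive period subtracts the column n positions to the left, a negative one the column to the right.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# Difference between each column and the column n positions to its left
n = 2
lagged = df.diff(periods=n, axis=1)      # C - A and D - B; A and B become NaN

# A negative period compares with the column to the right instead
leading = df.diff(periods=-1, axis=1)    # A - B, B - C, C - D; D becomes NaN

print(lagged)
print(leading)
```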


Adrian martin Jun 24 '15 at 13:55


unutbu · Accepted Answer · 2015-03-23T19:39:44+0000

Pandas DataFrames are great for processing tabular data whose columns have different data types.

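If subtracting both across columns and across rows makes sense, then all the values are the same kind of quantity, which may be a sign that you should be using a NumPy array instead of a Pandas DataFrame.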

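You can use arr = df.values to extract a NumPy array of the DataFrame's underlying data. If all the columns share the same dtype, the NumPy array will have that dtype. (Mixed numeric columns are upcast, e.g. int and float give float64; when the dtypes are incompatible, such as numbers and strings, df.values has object dtype.)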
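A sketch of that dtype behaviour on three small hypothetical frames. It uses df.to_numpy(), the newer spelling of df.values available since pandas 0.24. Selecting only the numeric columns with select_dtypes before extracting is one way to keep np.diff working on a numeric array.

```python
import numpy as np
import pandas as pd

# Hypothetical frames illustrating how the extracted dtype is chosen
same = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})             # all int64
numeric = pd.DataFrame({'x': [1, 2], 'y': [0.5, 1.5]})      # int64 + float64
mixed = pd.DataFrame({'x': [1, 2], 'label': ['p', 'q']})    # int64 + strings

print(same.to_numpy().dtype)      # shared dtype is kept
print(numeric.to_numpy().dtype)   # numeric columns are upcast to float
print(mixed.to_numpy().dtype)     # incompatible columns fall back to object

# Keep only numeric columns before extracting an array for np.diff
numbers_only = mixed.select_dtypes('number').to_numpy()
```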

Then compute the differences with np.diff(arr, axis=...):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3,4), columns=list('ABCD'))
#    A  B   C   D
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11

np.diff(df.values, axis=0)    # difference of the rows
# array([[4, 4, 4, 4],
#        [4, 4, 4, 4]])

np.diff(df.values, axis=1)    # difference of the columns
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])
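np.diff returns a bare array with one fewer column, so the labels are lost. A sketch of re-wrapping the result in a DataFrame, assuming you want each difference labelled by the right-hand column of its pair (a naming choice, not something np.diff dictates):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# np.diff along axis=1 returns one column fewer than the input
col_diff = np.diff(df.to_numpy(), axis=1)

# Re-attach labels: column k of the result is df.columns[k + 1] - df.columns[k]
labeled = pd.DataFrame(col_diff, index=df.index, columns=df.columns[1:])
print(labeled)
```

Unlike df.diff(axis=1), this keeps the integer dtype because no NaN column is introduced.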
