Equivalent to R tapply() in Python Pandas

Question

I have a data set on the feeding of 3 animals. It contains each animal's tag identifier (1, 2, 3), the feed type (A, B) and the amount (kg) of feed served at each meal:

Animal   FeedType   Amount(kg)
Animal1     A         10
Animal2     B         7
Animal3     A         4
Animal2     A         2
Animal1     B         5
Animal2     B         6
Animal3     A         2

In base R, I can easily derive the matrix below, which has unique('Animal') as its rows, unique('FeedType') as its columns and the cumulative Amount (kg) in the corresponding cells, using tapply() as follows:

out <- with(mydf, tapply(Amount, list(Animal, FeedType), sum))

         A  B
Animal1 10  5
Animal2  2 13
Animal3  6 NA

Is there equivalent functionality in Python Pandas? What is the most elegant and fastest way to achieve this in Pandas?

PS: I want to be able to specify which column, in this case Amount, the aggregation is performed on.

Thanks in advance.

EDIT:

I timed both suggested approaches in Pandas on a data set with 216,347 rows and 15 columns:

import timeit

# Time the groupby approach
start_time1 = timeit.default_timer()
mydf.groupby(['Animal','FeedType'])['Amount'].sum()
elapsed_groupby = timeit.default_timer() - start_time1

# Time the pivot_table approach
# (index=/columns= were named rows=/cols= before pandas 0.14)
start_time2 = timeit.default_timer()
mydf.pivot_table(index='Animal', columns='FeedType', values='Amount', aggfunc='sum')
elapsed_pivot = timeit.default_timer() - start_time2

print('elapsed_groupby: ' + str(elapsed_groupby))
print('elapsed_pivot: ' + str(elapsed_pivot))

Output:

elapsed_groupby: 10.172213
elapsed_pivot: 8.465783

In this test, pivot_table() was somewhat faster than groupby().
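
A single wall-clock measurement like the one above can be noisy. The following is only a minimal sketch of a repeated measurement with timeit.repeat; the synthetic data, its size (n) and the helper names with_groupby and with_pivot are assumptions made for illustration, not the data set from the question.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumed size and value ranges).
rng = np.random.default_rng(0)
n = 100_000
mydf = pd.DataFrame({
    "Animal": rng.choice(["Animal1", "Animal2", "Animal3"], size=n),
    "FeedType": rng.choice(["A", "B"], size=n),
    "Amount": rng.integers(1, 20, size=n),
})


def with_groupby():
    # Aggregate the chosen column, then move FeedType into the columns.
    return mydf.groupby(["Animal", "FeedType"])["Amount"].sum().unstack()


def with_pivot():
    return mydf.pivot_table(index="Animal", columns="FeedType",
                            values="Amount", aggfunc="sum")


# Run each approach 10 times per trial, 3 trials, and keep the best trial.
t_groupby = min(timeit.repeat(with_groupby, number=10, repeat=3))
t_pivot = min(timeit.repeat(with_pivot, number=10, repeat=3))

print("groupby + unstack:", t_groupby)
print("pivot_table:      ", t_pivot)
```

Taking the minimum over several trials reduces the influence of unrelated system activity; the relative ranking may still differ between pandas versions and data shapes.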

python pandas r tapply

Rhubarb 03 . '14 14:21

2 Answers

Read the data in:

In [7]: df = pd.read_clipboard(sep=r"\s+", index_col=False)

In [8]: df
Out[8]:
    Animal FeedType  Amount(kg)
0  Animal1        A          10
1  Animal2        B           7
2  Animal3        A           4
3  Animal2        A           2
4  Animal1        B           5
5  Animal2        B           6
6  Animal3        A           2

Group by both columns and sum:

In [9]: df.groupby(['Animal','FeedType']).sum()
Out[9]:
                  Amount(kg)
Animal  FeedType
Animal1 A                 10
        B                  5
Animal2 A                  2
        B                 13
Animal3 A                  6

Then, unstack the dataframe:

In [10]: df.groupby(['Animal','FeedType']).sum().unstack()
Out[10]:
          Amount(kg)
FeedType           A   B
Animal
Animal1           10   5
Animal2            2  13
Animal3            6 NaN
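
To address the PS in the question (choosing which column is aggregated), the column can be selected before calling sum(). The following is a self-contained sketch that rebuilds the sample data by hand instead of reading it from the clipboard; the variable name out is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
               "Animal1", "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount(kg)": [10, 7, 4, 2, 5, 6, 2],
})

# Select the column to aggregate, sum per (Animal, FeedType) pair,
# then pivot the FeedType level of the index into columns.
out = df.groupby(["Animal", "FeedType"])["Amount(kg)"].sum().unstack()
print(out)
```

Because a single column is selected, the result has plain A and B columns rather than the extra Amount(kg) level shown above. The combination that never occurs (Animal3, B) is NaN, matching the NA in the R output.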

Zelazny7 03 . '14 14:35

joris · Accepted Answer · 2014-01-03T14:52:08+0000

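@Zelazny7's approach with groupby and unstack is certainly fine, but for completeness, you can also do this directly with pivot_table (see the doc) [pandas 0.13 and earlier]: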

In [13]: df.pivot_table(rows='Animal', cols='FeedType', values='Amount(kg)', aggfunc='sum')
Out[13]:
FeedType   A   B
Animal
Animal1   10   5
Animal2    2  13
Animal3    6 NaN

In newer versions of Pandas (0.14 and later), the pivot_table arguments are named index and columns:

In [13]: df.pivot_table(index='Animal', columns='FeedType', values='Amount(kg)', aggfunc='sum')
Out[13]:
FeedType   A   B
Animal
Animal1   10   5
Animal2    2  13
Animal3    6 NaN
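
As a self-contained sketch of the same call, the snippet below rebuilds the sample data by hand and also shows the fill_value parameter of pivot_table, which replaces missing combinations with a chosen value. Using 0 as the fill is an assumption about what is wanted here; R's tapply() leaves such cells as NA.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
               "Animal1", "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount(kg)": [10, 7, 4, 2, 5, 6, 2],
})

# Same result as tapply(): missing combinations stay NaN.
table = df.pivot_table(index="Animal", columns="FeedType",
                       values="Amount(kg)", aggfunc="sum")

# Optional: report 0 instead of NaN for combinations that never occur.
filled = df.pivot_table(index="Animal", columns="FeedType",
                        values="Amount(kg)", aggfunc="sum", fill_value=0)

print(table)
print(filled)
```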
