I have a data set on the nutrition of 3 animals, consisting of animal tag identifiers (1, 2, 3), feed type (A, B) and the amount (kg) of feed served at each meal:
Animal FeedType Amount(kg)
Animal1 A 10
Animal2 B 7
Animal3 A 4
Animal2 A 2
Animal1 B 5
Animal2 B 6
Animal3 A 2
In base R, I can easily derive the matrix below, which has unique('Animal') as its rows, unique('FeedType') as its columns, and the cumulative Amount (kg) in the corresponding cells, using tapply():
out <- with(mydf, tapply(Amount, list(Animal, FeedType), sum))
A B
Animal1 10 5
Animal2 2 13
Animal3 6 NA
Is there equivalent functionality for the Python Pandas framework? What is the most elegant and fastest way to achieve this in Pandas?
P.S. I want to be able to specify which column to aggregate, in this case Amount.
Thanks in advance.
EDIT:
I timed both approaches on a Pandas DataFrame with 216,347 rows and 15 columns:
import timeit

# Approach 1: groupby on both keys, then sum the Amount column
start_time1 = timeit.default_timer()
mydf.groupby(['Animal', 'FeedType'])['Amount'].sum()
elapsed_groupby = timeit.default_timer() - start_time1

# Approach 2: pivot_table (index=/columns= replace the older rows=/cols= keywords)
start_time2 = timeit.default_timer()
mydf.pivot_table(index='Animal', columns='FeedType', values='Amount', aggfunc='sum')
elapsed_pivot = timeit.default_timer() - start_time2

print('elapsed_groupby: ' + str(elapsed_groupby))
print('elapsed_pivot: ' + str(elapsed_pivot))
Output:
elapsed_groupby: 10.172213
elapsed_pivot: 8.465783
So pivot_table() was the faster of the two in this run.
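For reference, here is a minimal sketch that reproduces the tapply() matrix on the small sample data from the top of the question. It assumes the amount column is simply named Amount (as in the timing code) and uses the index= and columns= keywords of current pandas releases; older releases called these rows= and cols=. The names by_group and by_pivot are only illustrative.

```python
import pandas as pd

# Sample data from the question; the column name "Amount" is an assumption.
mydf = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
               "Animal1", "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount": [10, 7, 4, 2, 5, 6, 2],
})

# Option 1: sum Amount per (Animal, FeedType), then move FeedType to columns.
by_group = mydf.groupby(["Animal", "FeedType"])["Amount"].sum().unstack()

# Option 2: pivot_table, naming the column to aggregate via values=.
by_pivot = mydf.pivot_table(index="Animal", columns="FeedType",
                            values="Amount", aggfunc="sum")

print(by_pivot)
```

Both results have one row per animal and one column per feed type. The combination that never occurs (Animal3, B) comes out as NaN, which plays the role of the NA in the R output.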