Pandas: groupby and date conversion

I am still new to pandas and stumbled upon very strange behavior when I use a groupby transform on multiple columns, one of which has dtype datetime64[ns].

My (toy) example:

    import pandas as pd

    df = pd.DataFrame({'date': [pd.datetime(2014,3,17), pd.datetime(2014,3,24), pd.datetime(2014,3,17)],
                       'hdg_id': [4041,4041,4041],
                       'stock': [1.0,1.0,1.0]})

    In[117]: df
    Out[117]:
            date  hdg_id  stock
    0 2014-03-17    4041      1
    1 2014-03-24    4041      1
    2 2014-03-17    4041      1

Now I group by date and hdg_id (grouping by hdg_id is trivial here since it has only one unique value, but I need to group by multiple columns to reproduce the result; my actual application is, of course, more complicated):

    In[118]: df.groupby(['date', 'hdg_id']).transform(sum)
    Out[118]:
               stock
    0   0.000000e+00
    1  4.940656e-324
    2   0.000000e+00

This is not the result I expected. If I convert the date column to strings, I get what I expect:

    In[129]: df['date']=df['date'].astype(str)
    In[131]: df.groupby(['date', 'hdg_id']).transform(sum)
    Out[131]:
       stock
    0      2
    1      1
    2      2
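A variant of the same workaround, sketched here as an assumption rather than something from the original session: groupby also accepts a Series as a grouping key, so the string version of the dates can be passed directly without overwriting the date column. The frame is built with pd.to_datetime (not used in the original post), and the name date_key is purely illustrative.

```python
import pandas as pd

# Same toy data as above; pd.to_datetime is used here to build the dates.
df = pd.DataFrame({
    'date': pd.to_datetime(['2014-03-17', '2014-03-24', '2014-03-17']),
    'hdg_id': [4041, 4041, 4041],
    'stock': [1.0, 1.0, 1.0],
})

# Hypothetical helper key: string form of the dates, kept separate from df.
date_key = df['date'].astype(str)

# Group by the string key plus the hdg_id column; df['date'] is left untouched.
stock_sum = df.groupby([date_key, 'hdg_id'])['stock'].transform('sum')
print(stock_sum)
```

This keeps the date column as a datetime dtype for any later date arithmetic, while the grouping itself runs on strings.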

Can someone share some insight into what's going on?

Thanks a lot!

1 answer

Is there a reason to use .transform(sum)?

You can do something like this: df.groupby(['date', 'hdg_id']).sum()
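To make the difference between the two calls concrete, here is a minimal sketch. It assumes a reasonably recent pandas version and builds the date column with pd.to_datetime, which is not in the original post. The variable names per_group and per_row are illustrative only.

```python
import pandas as pd

# Toy data from the question, with the dates built via pd.to_datetime.
df = pd.DataFrame({
    'date': pd.to_datetime(['2014-03-17', '2014-03-24', '2014-03-17']),
    'hdg_id': [4041, 4041, 4041],
    'stock': [1.0, 1.0, 1.0],
})

grouped = df.groupby(['date', 'hdg_id'])['stock']

# Aggregation: one value per (date, hdg_id) group, indexed by the group keys.
per_group = grouped.sum()

# Transform: one value per original row, aligned to df's index.
per_row = grouped.transform('sum')

print(per_group)
print(per_row)
```

So .sum() is enough if one row per group is what you need; .transform is only required when the group total has to be written back next to each original row.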
