Pandas 0.18.1 groupby and resample with multi-level aggregation error

I just updated pandas from 0.17.1 to 0.18.1, and while converting my existing code to the new resampling syntax (described in the links below) I think I have found a problem. According to the documentation, df3_resample and df4_resample in my example below should return the same DataFrame; however, df4_resample raises an exception. I am sharing it in case it helps others.

Exception: Column(s) A already selected 

http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-breaking-resample

http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#groupby-syntax-with-window-and-resample-operations

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'),
                      index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
    df['item'] = 'item_a'  # add column for groupby

    # THIS WORKS
    df1_resample = df.groupby('item').resample('2s').agg({'A': np.mean, 'B': np.max}).reset_index()
    print(df1_resample)

    # THIS WORKS
    df2_resample = df.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}}).reset_index()
    print(df2_resample)

    # THIS WORKS
    df3_resample = df.groupby('item').apply(
        lambda x: x.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})).reset_index()
    print(df3_resample)

    # THIS DOESN'T WORK
    df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
    print(df4_resample)

Output:

         item             level_1         A         B
    0  item_a 2010-01-01 09:00:00  0.611660  0.739640
    1  item_a 2010-01-01 09:00:02  0.615876  0.880113
    2  item_a 2010-01-01 09:00:04  0.218292  0.441504
    3  item_a 2010-01-01 09:00:06  0.753698  0.637787
    4  item_a 2010-01-01 09:00:08  0.471272  0.474738

                     index         A
                              A_mean     A_max
    0  2010-01-01 09:00:00  0.611660  0.813038
    1  2010-01-01 09:00:02  0.615876  0.994657
    2  2010-01-01 09:00:04  0.218292  0.233478
    3  2010-01-01 09:00:06  0.753698  0.848107
    4  2010-01-01 09:00:08  0.471272  0.610592

         item             level_1         A
                                     A_mean     A_max
    0  item_a 2010-01-01 09:00:00  0.611660  0.813038
    1  item_a 2010-01-01 09:00:02  0.615876  0.994657
    2  item_a 2010-01-01 09:00:04  0.218292  0.233478
    3  item_a 2010-01-01 09:00:06  0.753698  0.848107
    4  item_a 2010-01-01 09:00:08  0.471272  0.610592

      File "<some_file.py>", line 29, in <module>
        df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
      File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
        result, how = self._aggregate(arg, *args, **kwargs)
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 505, in _aggregate
        result = list(_agg(arg, _agg_1dim).values())
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
        result[fname] = func(fname, agg_how)
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 479, in _agg_1dim
        return colg.aggregate(how, _level=(_level or 0) + 1)
      File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
        result, how = self._aggregate(arg, *args, **kwargs)
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 528, in _aggregate
        result = _agg(arg, lambda fname,
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
        result[fname] = func(fname, agg_how)
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 529, in <lambda>
        agg_how: _agg_1dim(self._selection, agg_how))
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 475, in _agg_1dim
        colg = self._gotitem(name, ndim=1, subset=subset)
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 680, in _gotitem
        groupby=self._groupby[key],
      File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 326, in __getitem__
        raise Exception('Column(s) %s already selected' % self._selection)
    Exception: Column(s) A already selected
1 answer

I'm not sure why resample does not work for this, but there is a handy workaround that does not require a lambda. Try:

    (df.groupby(['item', pd.Grouper(freq='2s')])
       .agg({'A': ['mean', 'max']})
       .rename(columns={'mean': 'A_mean', 'max': 'A_max'}, level=1)
       .reset_index())
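Here is a self-contained version of the workaround above, run on small deterministic data so the result is predictable. The data (values 0 to 9 instead of np.random.rand) is my assumption, and the sketch assumes a pandas version in which DataFrame.rename accepts a level argument; I have not checked it on 0.18.1 itself.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values 0..9 instead of the question's random data,
# one reading per second and a single group label, as in the question.
df = pd.DataFrame(
    {"A": np.arange(10, dtype=float), "item": "item_a"},
    index=pd.date_range("2010-01-01 09:00:00", periods=10, freq="s"),
)

# Group by the item column and by 2-second bins of the DatetimeIndex in one
# groupby() call, aggregate column A two ways, then rename the inner column level.
result = (
    df.groupby(["item", pd.Grouper(freq="2s")])
      .agg({"A": ["mean", "max"]})
      .rename(columns={"mean": "A_mean", "max": "A_max"}, level=1)
      .reset_index()
)
print(result)
```

Each 2-second bin holds two consecutive values, so the ('A', 'A_mean') column is 0.5, 2.5, 4.5, 6.5, 8.5 and ('A', 'A_max') is 1.0, 3.0, 5.0, 7.0, 9.0. The columns keep the two-level layout produced by the list aggregation, which is why rename() needs level=1.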


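Instead of .resample('2s'), you can pass pd.Grouper(freq='2s') to groupby(); it does the same job here. Note that freq must be passed as a keyword, because the first positional argument of pd.Grouper is key. Here is the documentation: http://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Grouper.html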
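To illustrate that the two routes agree, this sketch computes the same 2-second means once with groupby().resample() and once with a pd.Grouper inside groupby(). It assumes a recent pandas release and the same made-up deterministic data (values 0 to 9) rather than the question's random frame.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic toy data standing in for the question's random frame.
df = pd.DataFrame(
    {"A": np.arange(10, dtype=float), "item": "item_a"},
    index=pd.date_range("2010-01-01 09:00:00", periods=10, freq="s"),
)

# Route 1: resample column A within each item group.
via_resample = df.groupby("item")["A"].resample("2s").mean()

# Route 2: put a time-based Grouper (on the index) directly into groupby().
via_grouper = df.groupby(["item", pd.Grouper(freq="2s")])["A"].mean()

# Both routes give the bin means 0.5, 2.5, 4.5, 6.5, 8.5.
print(via_resample.values)
print(via_grouper.values)
```

Both results are Series indexed by (item, bin start); the Grouper form simply lets you stay inside the regular groupby().agg() machinery, which is what avoids the "already selected" error path.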

On another note, you should avoid renaming columns with a nested dictionary (it is deprecated) and use .rename() instead.
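On pandas 0.25 or later (not the 0.18.1 from the question), named aggregation is another way to get renamed output columns without the nested dictionary. This swaps in a technique the answer does not mention; it is a minimal sketch on the same hypothetical deterministic data.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic toy data standing in for the question's random frame.
df = pd.DataFrame(
    {"A": np.arange(10, dtype=float), "item": "item_a"},
    index=pd.date_range("2010-01-01 09:00:00", periods=10, freq="s"),
)

# Named aggregation: each keyword is an output column name mapped to an
# (input column, aggregation) pair, so no nested dict or rename() is needed.
result = (
    df.groupby(["item", pd.Grouper(freq="2s")])
      .agg(A_mean=("A", "mean"), A_max=("A", "max"))
      .reset_index()
)
print(result)
```

Unlike the list aggregation, this produces flat column names (A_mean, A_max) directly.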
