How can I iterate over, and apply a function at, one level of a MultiIndex DataFrame?

Thanks to the answer to my initial question, I now have a multi-indexed DataFrame the way I want it. Now that the data is in this structure, I'm trying to work with it and wondering whether there is a better way. My two problems are related, but may have separate "ideal" solutions:

DataFrame Example (Truncated)

    Experiment           IWWGCW        IWWGDW
    Lead Time                24     48     24     48
    2010-11-27 12:00:00   0.997  0.991  0.998  0.990
    2010-11-28 12:00:00   0.998  0.987  0.997  0.990
    2010-11-29 12:00:00   0.997  0.992  0.997  0.992
    2010-11-30 12:00:00   0.997  0.987  0.997  0.987
    2010-12-01 12:00:00   0.996  0.986  0.996  0.986

Iteration

I would like to be able to iterate over this DataFrame so that each iteration strips off only the top index level, i.e. iteritems-style behavior that returns [('IWWGCW', df['IWWGCW']), ('IWWGDW', df['IWWGDW'])] and yields two DataFrames with Lead Time columns. My brute-force solution is a wrapper that basically executes [(key, df[key]) for key in df.columns.levels[0]] . Is there a better way to do this?
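For reference, here is a minimal, runnable sketch of the brute-force approach, rebuilding a small version of the example frame from the table above (only the first two rows):

```python
import pandas as pd

# Rebuild a truncated version of the example frame (values from the table above).
columns = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
index = pd.to_datetime(["2010-11-27 12:00", "2010-11-28 12:00"])
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990],
     [0.998, 0.987, 0.997, 0.990]],
    index=index, columns=columns,
)

# Brute-force iteration: one (key, sub-DataFrame) pair per top-level label.
# Each sub-frame keeps only the Lead Time column level.
pairs = [(key, df[key]) for key in df.columns.levels[0]]
for key, sub in pairs:
    print(key, list(sub.columns))
```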

Apply

I would also like to do things like "subtract the IWWGDW entries from everything else" to compute pairwise differences. I tried df.apply(lambda f: f - df['IWWGDW']) , but I get KeyError: ('IWWGDW', 'occurred at index 2010-11-26 12:00:00') regardless of whether I use axis=1 or axis=0 . I tried rebuilding a new DataFrame using the iteration workaround above, but I always get nervous when I iterate over things by hand. Is there a more idiomatic ("pandasic") way to do this?


I would suggest using groupby for iteration:

    In [25]: for exp, group in df.groupby(level=0, axis=1):
       ....:     print exp, group
       ....:
    IWWGCW Experiment    IWWGCW
    Lead Time                24     48
    2010-11-27 12:00:00   0.997  0.991
    2010-11-28 12:00:00   0.998  0.987
    2010-11-29 12:00:00   0.997  0.992
    2010-11-30 12:00:00   0.997  0.987
    2010-12-01 12:00:00   0.996  0.986
    IWWGDW Experiment    IWWGDW
    Lead Time                24     48
    2010-11-27 12:00:00   0.998  0.990
    2010-11-28 12:00:00   0.997  0.990
    2010-11-29 12:00:00   0.997  0.992
    2010-11-30 12:00:00   0.997  0.987
    2010-12-01 12:00:00   0.996  0.986

However, I realize this does not drop the top level the way you're looking for. Ideally, you could write something like:

df.groupby(level=0, axis=1).sub(df['IWWGCW'])

and have it broadcast the subtraction, but since df['IWWGCW'] drops the top level, the column names do not line up. This works, though:

    In [29]: df.groupby(level=0, axis=1).sub(df['IWWGCW'].values)
    Out[29]:
    Experiment          IWWGCW     IWWGDW
    Lead Time               24 48      24     48
    2010-11-27 12:00:00      0  0   0.001 -0.001
    2010-11-28 12:00:00      0  0  -0.001  0.003
    2010-11-29 12:00:00      0  0   0.000  0.000
    2010-11-30 12:00:00      0  0   0.000  0.000
    2010-12-01 12:00:00      0  0   0.000  0.000

I'll think about it a little more.
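A note from editing: in newer pandas versions, axis=1 groupby is deprecated, so the .values trick above may stop working as written. One alternative (my addition, not part of the original answer) is to subtract per top-level key and reassemble with pd.concat, which keeps labels aligned because each sub-frame carries only the Lead Time level:

```python
import pandas as pd

# Truncated version of the question's example frame.
columns = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990],
     [0.998, 0.987, 0.997, 0.990]],
    index=pd.to_datetime(["2010-11-27 12:00", "2010-11-28 12:00"]),
    columns=columns,
)

# Subtract the IWWGCW block from every experiment, then stitch the
# results back together under a new top-level "Experiment" key.
diffs = pd.concat(
    {exp: df[exp] - df["IWWGCW"] for exp in df.columns.levels[0]},
    axis=1, names=["Experiment"],
)
```

The IWWGCW block of `diffs` is all zeros, and the IWWGDW block holds the pairwise differences, matching the Out[29] result above.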


I know this is old, but building on @WesMcKinney's answer, the cleanest trick I found to drop the top level inside the loop is simply to select it right away:

    for exp, group in df.groupby(level=0, axis=1):
        print(group[exp])

    Lead Time               24     48
    2010-11-27 12:00:00  0.997  0.991
    2010-11-28 12:00:00  0.998  0.987
    2010-11-29 12:00:00  0.997  0.992
    2010-11-30 12:00:00  0.997  0.987
    2010-12-01 12:00:00  0.996  0.986

This correctly returns the lower-level DataFrame, with the top Experiment level dropped.
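Another way to get the same single-level frame without going through the loop variable (an editor's addition, not part of the original answer) is DataFrame.xs, which selects one label from a chosen level of the column index and drops that level:

```python
import pandas as pd

# Truncated version of the question's example frame.
columns = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990]],
    index=pd.to_datetime(["2010-11-27 12:00"]),
    columns=columns,
)

# xs drops the selected "Experiment" level, leaving a plain
# Lead Time-indexed set of columns.
sub = df.xs("IWWGCW", axis=1, level="Experiment")
print(list(sub.columns))
```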

