Thanks to the answer to my initial question, I now have a multi-indexed DataFrame the way I want it. Now that the data is in that structure, I am trying to get it back out, and I wonder whether there is a better way to do so. My two problems are related, but may have separate “ideal” solutions:
DataFrame Example (Truncated)
    Experiment           IWWGCW         IWWGDW
    Lead Time                24     48      24     48
    2010-11-27 12:00:00   0.997  0.991   0.998  0.990
    2010-11-28 12:00:00   0.998  0.987   0.997  0.990
    2010-11-29 12:00:00   0.997  0.992   0.997  0.992
    2010-11-30 12:00:00   0.997  0.987   0.997  0.987
    2010-12-01 12:00:00   0.996  0.986   0.996  0.986
Iteration
I would like to be able to loop over this DataFrame so that each iteration strips off only one index level, i.e. an iteritems-like behavior that returns [('IWWGCW', df['IWWGCW']), ('IWWGDW', df['IWWGDW'])] and yields two DataFrames with Lead Time columns. My brute-force solution is a wrapper routine that basically does [(key, df[key]) for key in df.columns.levels[0]]. Is there a better way to do this?
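For concreteness, here is a minimal, self-contained sketch of that brute-force workaround. The small frame is rebuilt from the first two rows of the example above. Reading the keys with get_level_values(...).unique() instead of columns.levels[0] is my own assumption: it avoids picking up level values that are no longer used after slicing.

```python
import pandas as pd

# Rebuild a small slice of the example frame (first two rows only).
index = pd.to_datetime(["2010-11-27 12:00", "2010-11-28 12:00"])
columns = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990], [0.998, 0.987, 0.997, 0.990]],
    index=index,
    columns=columns,
)

# Outer-level keys that actually occur in the columns.
experiments = df.columns.get_level_values("Experiment").unique()

# One (key, sub-frame) pair per experiment; each sub-frame keeps
# only the Lead Time level as its columns.
pairs = [(key, df[key]) for key in experiments]

for key, sub in pairs:
    print(key, list(sub.columns))
```

Selecting df[key] on the outer level drops that level, so each sub-frame has plain Lead Time columns.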
Apply
I would also like to do things like “subtract the IWWGDW entries from all the others” to compute pairwise differences. I tried df.apply(lambda f: f - df['IWWGDW']), but I get KeyError: ('IWWGDW', 'occurred at index 2010-11-26 12:00:00') regardless of whether I use axis=1 or axis=0. I also tried rebuilding a new DataFrame with the iteration workaround described above, but I am always wary of brute-forcing things by iterating. Is there a more idiomatic pandas way to do this?
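To make the goal concrete, this is roughly the result I am after, built with the iteration workaround. It is only a sketch under the same assumptions as the truncated example (two experiments, lead times 24 and 48), not a claim that this is the best approach; it uses pd.concat with a dict so that the keys become the outer column level again.

```python
import pandas as pd

# Rebuild a small slice of the example frame (first two rows only).
index = pd.to_datetime(["2010-11-27 12:00", "2010-11-28 12:00"])
columns = pd.MultiIndex.from_product(
    [["IWWGCW", "IWWGDW"], [24, 48]], names=["Experiment", "Lead Time"]
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990], [0.998, 0.987, 0.997, 0.990]],
    index=index,
    columns=columns,
)

experiments = df.columns.get_level_values("Experiment").unique()
reference = df["IWWGDW"]  # columns are the Lead Time values only

# Subtract the reference experiment from every experiment; both operands
# share the same Lead Time columns, so they align column by column.
diff = pd.concat({key: df[key] - reference for key in experiments}, axis=1)
diff.columns.names = ["Experiment", "Lead Time"]

print(diff)
```

The IWWGDW block of the result is all zeros by construction, and the other blocks hold the differences per lead time.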