Pandas Panel fancy indexing: how to return (the index of) all DataFrames in a Panel based on a Boolean over several columns in each df

I have a Pandas panel with many DataFrames with the same row / column labels. I want to create a new panel with DataFrames that satisfy certain criteria based on multiple columns.

This is easy with the rows of a single DataFrame. Let's say I have a df, zHe_compare. I can get the suitable rows with:

zHe_compare[((zHe_compare['zHe_calc'] > 100) & (zHe_compare['zHe_med'] > 100)) | ((zHe_obs_lo_2s <= zHe_compare['zHe_calc']) & (zHe_compare['zHe_calc'] <= zHe_obs_hi_2s))]

But how do I do the same for a panel (pseudo-code, simplified logic)?

 good_results_panel = results_panel[ all_dataframes[ sum ('zHe_calc' < 'zHe_obs') > min_num ] ] 

I know how to write the inner logical part, but how do I specify it for each DataFrame in the panel? Since I need several columns from each df, I have not succeeded with the panel.minor_xs slicing methods.

thanks!

1 answer

As mentioned in the documentation, Panel is currently a little underdeveloped, so the sweet syntax you have come to rely on when working with DataFrame is not there yet.

Meanwhile, I would suggest using the Panel.select method:

    def is_good_result(item_label):
        # whatever condition over the selected item
        df = results_panel[item_label]
        return df['col1'].sum() > 5

    good_results = results_panel.select(is_good_result)

The is_good_result function returns a boolean value. Note that its argument is not a DataFrame instance, because Panel.select applies the criterion to each item's label, not to the DataFrame stored under that label.

Of course, you can pack the whole criterion into a lambda in a single statement if you prefer brevity:

    good_results = results_panel.select(lambda item_label: results_panel[item_label]['col1'].sum() > 5)
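
For anyone reading this on a recent pandas: Panel was deprecated in 0.20 and removed in 0.25, so the calls above only run on older versions. Below is a minimal sketch of the same label-based selection, assuming the panel is replaced by a plain dict of DataFrames. The name results_by_label, the sample values, and the min_num threshold are all made up for illustration. The criterion follows the pseudo-code from the question (count the rows where zHe_calc is below zHe_obs).

```python
import pandas as pd

# Hypothetical stand-in for the panel: item label -> DataFrame,
# every frame sharing the same column labels.
results_by_label = {
    "run_a": pd.DataFrame({"zHe_calc": [10, 20, 30], "zHe_obs": [15, 25, 35]}),
    "run_b": pd.DataFrame({"zHe_calc": [10, 40, 50], "zHe_obs": [15, 25, 35]}),
    "run_c": pd.DataFrame({"zHe_calc": [60, 70, 80], "zHe_obs": [15, 25, 35]}),
}

min_num = 1  # illustrative threshold, not taken from the question


def is_good_result(item_label):
    # As with Panel.select, the criterion receives the label, not the frame.
    df = results_by_label[item_label]
    # Number of rows where the calculated value is below the observed one.
    return (df["zHe_calc"] < df["zHe_obs"]).sum() > min_num


# Keep only the items whose label passes the criterion.
good_results = {
    label: df for label, df in results_by_label.items() if is_good_result(label)
}
print(list(good_results))
```

Keeping the criterion keyed on the label means is_good_result has the same shape as in the Panel.select version, so it can combine as many columns of each frame as needed.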