How to filter rows of a pandas DataFrame by checking whether a sub-level index value is in a list?

I have a sample pandas DataFrame df with a MultiIndex:

 >>> df
                    STK_Name   ROIC   mg_r
 STK_ID RPT_Date
 002410 20111231    ???        0.401  0.956
 300204 20111231    ???        0.375  0.881
 300295 20111231    ????       2.370  0.867
 300288 20111231    ????       1.195  0.861
 600106 20111231    ????       1.214  0.857
 300113 20111231    ????       0.837  0.852

and stk_list defined as stk_list = ['600106','300204','300113']

I want to get the rows of df whose sub-level index STK_ID is in stk_list. The output should be as follows:

                    STK_Name   ROIC   mg_r
 STK_ID RPT_Date
 300204 20111231    ???        0.375  0.881
 600106 20111231    ????       1.214  0.857
 300113 20111231    ????       0.837  0.852

Basically, I can achieve the goal for this sample data:

 df = df.reset_index()
 df[df.STK_ID.isin(stk_list)]

But my application DataFrame already has "STK_ID" and "RPT_Date" columns, so reset_index() raises an error. In any case, I want to filter on the index directly rather than on columns.
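The failure described above can be reproduced with a small sketch. The frame below is a made-up stand-in for the application data (its values and the duplicated STK_ID and RPT_Date columns are assumptions); in recent pandas versions the call raises ValueError because an index level would collide with an existing column name.

```python
import pandas as pd

# Hypothetical stand-in for the application frame: the index levels
# STK_ID and RPT_Date also exist as ordinary columns.
index = pd.MultiIndex.from_tuples(
    [("300204", "20111231"), ("600106", "20111231")],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "STK_ID": ["300204", "600106"],
        "RPT_Date": ["20111231", "20111231"],
        "ROIC": [0.375, 1.214],
    },
    index=index,
)

try:
    df.reset_index()
    reset_failed = False
    error_message = ""
except ValueError as exc:
    # pandas refuses to insert a column whose name is already taken.
    reset_failed = True
    error_message = str(exc)

print(reset_failed, error_message)
```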

Following this question: How to filter by sub-level index in Pandas

I tried df[df.index.map(lambda x: x[0].isin(stk_list))], but pandas 0.8.1 raises AttributeError: 'unicode' object has no attribute 'isin'.

My question is: how do I filter the rows of a pandas DataFrame by checking whether a sub-level index value is in a list, without using the reset_index() and set_index() methods?

+7
5 answers

You can try:

 df[df.index.map(lambda x: x[0] in stk_list)] 

Example:

 In : stk_list
 Out: ['600106', '300204', '300113']

 In : df
 Out:
                    STK_Name   ROIC   mg_r
 STK_ID RPT_Date
 002410 20111231    ???        0.401  0.956
 300204 20111231    ???        0.375  0.881
 300295 20111231    ????       2.370  0.867
 300288 20111231    ????       1.195  0.861
 600106 20111231    ????       1.214  0.857
 300113 20111231    ????       0.837  0.852

 In : df[df.index.map(lambda x: x[0] in stk_list)]
 Out:
                    STK_Name   ROIC   mg_r
 STK_ID RPT_Date
 300204 20111231    ???        0.375  0.881
 600106 20111231    ????       1.214  0.857
 300113 20111231    ????       0.837  0.852
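For anyone who wants to run this, here is a self-contained sketch of the same idea. The frame construction is an assumption (string labels, STK_Name omitted), and the np.asarray wrapper is only a precaution so the result of map is a plain boolean array regardless of pandas version. It also shows why the original attempt failed: each element handed to the lambda is a tuple of labels, so x[0] is an ordinary string that supports `in` but has no `.isin`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (STK_Name left out).
index = pd.MultiIndex.from_tuples(
    [
        ("002410", "20111231"),
        ("300204", "20111231"),
        ("300295", "20111231"),
        ("300288", "20111231"),
        ("600106", "20111231"),
        ("300113", "20111231"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "ROIC": [0.401, 0.375, 2.370, 1.195, 1.214, 0.837],
        "mg_r": [0.956, 0.881, 0.867, 0.861, 0.857, 0.852],
    },
    index=index,
)
stk_list = ["600106", "300204", "300113"]

# Each index entry is a tuple such as ("002410", "20111231").
first_entry = df.index[0]

# A set makes the membership test cheap for long lists.
wanted = set(stk_list)
mask = np.asarray(df.index.map(lambda x: x[0] in wanted), dtype=bool)
filtered = df[mask]
print(filtered)
```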
+10

What about using the level parameter of DataFrame.reindex?

 In [14]: df
 Out[14]:
             0         1
 a 0  0.007288 -0.840392
   1  0.652740  0.597250
 b 0 -1.197735  0.822150
   1 -0.242030 -0.655058

 In [15]: stk_list = ['a']

 In [16]: df.reindex(stk_list, level=0)
 Out[16]:
             0         1
 a 0  0.007288 -0.840392
   1  0.652740  0.597250
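A runnable version of the session above might look like the following sketch. The column names and values are invented for illustration; only the reindex call with level=0 comes from the answer.

```python
import pandas as pd

# Two-level index: outer labels "a"/"b", inner labels 0/1.
index = pd.MultiIndex.from_product([["a", "b"], [0, 1]])
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [5.0, 6.0, 7.0, 8.0]},
    index=index,
)
stk_list = ["a"]

# Keep only the rows whose level-0 label appears in stk_list.
selected = df.reindex(stk_list, level=0)
print(selected)
```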
+11

I'm very late to the party, but to my eye the most readable and intuitive way to do this is to use index.levels[n].isin.

It works as follows:

 >>> stk_list = [600106, 300204, 300113]
 >>> df[df.index.levels[0].isin(stk_list)]
                    STK_Name   ROIC   mg_r
 STK_ID RPT_Date
 300204 20111231    ???        0.375  0.881
 300295 20111231    ????       2.370  0.867
 300113 20111231    ????       0.837  0.852

What I like about this approach is that the command can be read as an English sentence.

P.S. In the OP, stk_list is a list of strings. A little list-comprehension-fu takes care of that:

 df[df.index.levels[0].isin([int(i) for i in stk_list])] 
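One caveat is worth checking before relying on this approach: index.levels[0] holds the unique labels of the level in sorted order, not one label per row, so the resulting mask lines up with the rows only by coincidence. That would explain why the output above contains 300295 rather than 600106. The sketch below compares the two masks; the frame construction (string labels, STK_Name omitted) is an assumption.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("002410", "20111231"),
        ("300204", "20111231"),
        ("300295", "20111231"),
        ("300288", "20111231"),
        ("600106", "20111231"),
        ("300113", "20111231"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {"ROIC": [0.401, 0.375, 2.370, 1.195, 1.214, 0.837]},
    index=index,
)
stk_list = ["600106", "300204", "300113"]

# Mask over the sorted unique labels of level 0 (not aligned with rows).
levels_mask = df.index.levels[0].isin(stk_list)

# Mask with one entry per row, in row order.
row_mask = df.index.get_level_values(0).isin(stk_list)

from_levels = list(df[levels_mask].index.get_level_values(0))
from_rows = list(df[row_mask].index.get_level_values(0))
print(from_levels)
print(from_rows)
```

Here the levels-based mask happens to have the same length as the frame, so it is accepted silently and selects the wrong rows.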
+7

For me, it only worked when I dropped the [0] from x, as follows:

 a[a.index.map(lambda x: x in b)] 
+1

Use get_level_values:

 df[df.index.get_level_values(level=0).isin(stk_list)] 
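As a sketch of how this can be written with the level name instead of its position, the example below also uses MultiIndex.isin with its level argument, which should produce the same row-aligned mask. The frame construction is an assumption (string labels, STK_Name omitted).

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("002410", "20111231"),
        ("300204", "20111231"),
        ("300295", "20111231"),
        ("300288", "20111231"),
        ("600106", "20111231"),
        ("300113", "20111231"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {"mg_r": [0.956, 0.881, 0.867, 0.861, 0.857, 0.852]},
    index=index,
)
stk_list = ["600106", "300204", "300113"]

# One label per row for the chosen level, then a vectorised membership test.
by_values = df.index.get_level_values("STK_ID").isin(stk_list)

# The same test expressed directly on the MultiIndex.
by_level = df.index.isin(stk_list, level="STK_ID")

result = df[by_values]
print(result)
```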
0
