Using slicers on a multi-index

I have a dataframe of the form:

 Contract  Date
 201501    2014-04-29    1416.0
           2014-04-30    1431.1
           2014-05-01    1430.6
           2014-05-02    1443.9
           2014-05-05    1451.6
           2014-05-06    1461.4
           2014-05-07    1456.0
           2014-05-08    1441.1
           2014-05-09    1437.8
           2014-05-12    1445.2
           2014-05-13    1458.2
           2014-05-14    1487.6
           2014-05-15    1477.6
           2014-05-16    1467.9
           2014-05-19    1484.9
           2014-05-20    1470.5
           2014-05-21    1476.9
           2014-05-22    1490.0
           2014-05-23    1473.3
           2014-05-27    1462.5
           2014-05-28    1456.3
           2014-05-29    1460.5
 201507    2014-05-30    1463.5
           2014-06-02    1447.5
           2014-06-03    1444.4
           2014-06-04    1444.7
           2014-06-05    1455.9
           2014-06-06    1464.0

Here Contract and Date are index levels of type int and datetime64 respectively.

I want to select a date range. It works by doing:

 df.reset_index('Contract', drop=True).loc['2014-09'] 

But I dislike this because it drops the Contract level from the index, and it is clumsy (I need to do this many times).

I thought I could do it like this:

 df.loc[:,'2014-09'] 

to return all data for September 2014, but this does not work. I can only select a single day:

 df.loc[:,'2014-09-02'] 

Why is my multi-index slicer not working?

3 answers

Pandas needs you to state explicitly whether you are selecting columns or levels of a hierarchical index. In this case, df.loc[:, '2014-09'] fails because pandas selects all the rows and then looks for a column labeled '2014-09' (which does not exist).

Instead, you need to pass both the row selection (one entry per multi-index level) and the column labels or slice.

To select all the data for May 2014 from your example, you can write:

 >>> df.loc[(slice(None), '2014-05'), :]
 Contract  Date
 201501    2014-05-01    1430.6
           2014-05-02    1443.9
           2014-05-05    1451.6
           2014-05-06    1461.4
           2014-05-07    1456.0
           2014-05-08    1441.1
           2014-05-09    1437.8
           2014-05-12    1445.2
           2014-05-13    1458.2
           2014-05-14    1487.6
           2014-05-15    1477.6
           2014-05-16    1467.9
           2014-05-19    1484.9
           2014-05-20    1470.5
           2014-05-21    1476.9
           2014-05-22    1490.0
           2014-05-23    1473.3
           2014-05-27    1462.5
           2014-05-28    1456.3
           2014-05-29    1460.5
 201507    2014-05-30    1463.5

Here [(slice(None), '2014-05'), :] translates to the slice [:, '2014-05'] for rows and [:] for columns.

The pd.IndexSlice object was introduced to make this slicing syntax more convenient:

 >>> idx = pd.IndexSlice
 >>> df.loc[idx[:, '2014-05'], :]  # same slice of the DataFrame
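
To try both spellings side by side, here is a minimal sketch that rebuilds a handful of rows from the question. The column name Price is an assumption (the original column is unnamed), and the sort_index() call is a cautious extra step rather than something the answer prescribes.

```python
import pandas as pd

# A few rows copied from the question's data.
rows = [
    (201501, "2014-04-29", 1416.0),
    (201501, "2014-04-30", 1431.1),
    (201501, "2014-05-01", 1430.6),
    (201501, "2014-05-02", 1443.9),
    (201507, "2014-05-30", 1463.5),
    (201507, "2014-06-02", 1447.5),
]
# "Price" is a hypothetical column name used only for this illustration.
df = pd.DataFrame(rows, columns=["Contract", "Date", "Price"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index(["Contract", "Date"]).sort_index()

# Explicit tuple: every Contract, partial date string "2014-05", all columns.
may_tuple = df.loc[(slice(None), "2014-05"), :]

# The same selection written with pd.IndexSlice.
idx = pd.IndexSlice
may_idx = df.loc[idx[:, "2014-05"], :]

print(may_idx)
```

Both results keep the Contract level in the index, which is what the reset_index approach in the question loses.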

You can use pd.IndexSlice to select based on ranges for each level of your MultiIndex, for example like this (see the docs):

 idx = pd.IndexSlice
 df.loc[idx[:, '2014-05'], :]

To obtain:

 Contract  Date
 201501    2014-05-01    1430.6
           2014-05-02    1443.9
           2014-05-05    1451.6
           2014-05-06    1461.4
           2014-05-07    1456.0
           2014-05-08    1441.1
           2014-05-09    1437.8
           2014-05-12    1445.2
           2014-05-13    1458.2
           2014-05-14    1487.6
           2014-05-15    1477.6
           2014-05-16    1467.9
           2014-05-19    1484.9
           2014-05-20    1470.5
           2014-05-21    1476.9
           2014-05-22    1490.0
           2014-05-23    1473.3
           2014-05-27    1462.5
           2014-05-28    1456.3
           2014-05-29    1460.5
 201507    2014-05-30    1463.5
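
Since the question asks about date ranges, the following sketch shows how explicit start and stop labels might be passed per level. It reuses a few rows from the question with an assumed column name Price, and it sorts the index first because range slices on a MultiIndex generally expect a sorted index.

```python
import pandas as pd

# A few rows copied from the question's data.
rows = [
    (201501, "2014-04-30", 1431.1),
    (201501, "2014-05-01", 1430.6),
    (201501, "2014-05-02", 1443.9),
    (201501, "2014-05-05", 1451.6),
    (201501, "2014-05-06", 1461.4),
    (201507, "2014-05-30", 1463.5),
    (201507, "2014-06-02", 1447.5),
]
# "Price" is a hypothetical column name used only for this illustration.
df = pd.DataFrame(rows, columns=["Contract", "Date", "Price"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index(["Contract", "Date"]).sort_index()

idx = pd.IndexSlice

# Date range across all contracts; label slices include both endpoints.
window = df.loc[idx[:, "2014-05-02":"2014-05-30"], :]

# The same date range restricted to a single contract.
window_201501 = df.loc[idx[201501, "2014-05-02":"2014-05-30"], :]

print(window)
```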

If Date is a regular column rather than an index level (for example after df.reset_index()), you can use the .dt accessor to extract all the values for the month of September as follows:

 df.loc[(pd.to_datetime(df['Date']).dt.month == 9)] 

Timings:

 timeit df.loc[(pd.to_datetime(df['Date']).dt.month == 5)]
 1000 loops, best of 3: 796 µs per loop
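
The snippet above reads df['Date'], which presumes Date is a column. For the layout in the question, where Date is an index level, a comparable boolean mask could be built from Index.get_level_values. This is only a sketch: the column name Price is an assumption, and the year check is added because a month test alone would match that month in every year.

```python
import pandas as pd

# A few rows copied from the question's data.
rows = [
    (201501, "2014-04-30", 1431.1),
    (201501, "2014-05-01", 1430.6),
    (201501, "2014-05-02", 1443.9),
    (201507, "2014-05-30", 1463.5),
    (201507, "2014-06-02", 1447.5),
]
# "Price" is a hypothetical column name used only for this illustration.
df = pd.DataFrame(rows, columns=["Contract", "Date", "Price"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index(["Contract", "Date"])

# Pull the Date level out as a DatetimeIndex; the MultiIndex itself is untouched.
dates = df.index.get_level_values("Date")

# Boolean mask for May 2014; both index levels survive the selection.
may = df[(dates.year == 2014) & (dates.month == 5)]

print(may)
```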