MultiIndex Slicing requires the index to be fully lexsorted

Question

I have a DataFrame with a MultiIndex (year, foo), and I would like to select the X largest observations of foo where year == someYear.

My approach was

 df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
 df.loc[pd.IndexSlice[2002, :10], :]

but I get

 KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

I tried different sorting options (for example, ascending=[0, 0]), but all of them led to some error.

If I only needed the xth row, I could use df.groupby(level=[0]).nth(x) after sorting, but since I want a set of rows, this is not very efficient.

What is the best way to select these rows? Some data to play with:

                rank_int  rank
 year foo
 2015 1.381845         2   320
      1.234795         2   259
      1.148488       199     2
      0.866704         2   363
      0.738022         2   319

python pandas

Foobar Oct 05 '16 at 14:16

4 answers

Danila savenkov · Answer 1 · 2017-07-31T08:18:27+0000

First, you should sort as follows:

 df.sort_index(level=['year','foo'], ascending=[1, 0], inplace=True)

This should fix the KeyError. However, df.loc[pd.IndexSlice[2002, :10], :] will not give you the result you expect. loc is label-based, unlike iloc, so :10 is treated as a slice over the labels of foo (everything up to and including the label 10), not as the first ten rows. Positional selection with iloc is not available on the second level of a MultiIndex, so I would suggest using groupby. If you already have this MultiIndex, you should do:

 df = df.reset_index()
 df = df.sort_values(by=['year','foo'], ascending=[True, False])
 df.groupby('year').head(10)

If you need the n entries with the smallest foo values, you can use tail(n). If you need, say, the first, third and fifth entries, you can use nth([0,2,4]), as you mentioned in the question. I think this is the most efficient way to do this.
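As a minimal sketch of the groupby approach above, the following uses a small made-up frame (the values are hypothetical, not the asker's data) and takes the two largest and the single smallest foo per year:

```python
import pandas as pd

# Hypothetical sample data with the same (year, foo) index layout as the question.
df = pd.DataFrame(
    {
        "year": [2014, 2014, 2014, 2015, 2015, 2015],
        "foo": [0.5, 1.5, 1.0, 0.7, 1.3, 1.1],
        "rank": [10, 20, 30, 40, 50, 60],
    }
).set_index(["year", "foo"])

# Flatten the index, then sort by year ascending and foo descending.
flat = df.reset_index().sort_values(by=["year", "foo"], ascending=[True, False])

# head(n) keeps the first n rows of each group: the n largest foo per year.
top2 = flat.groupby("year").head(2)

# tail(n) keeps the last n rows of each group: the n smallest foo per year.
bottom1 = flat.groupby("year").tail(1)

print(top2)
print(bottom1)
```

Because the rows are already sorted within each year, head and tail only need to pick positions inside each group.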

ASGM · Answer 2 · 2016-10-05T14:41:57+0000

ascending should be a boolean, not a list. Try sorting as follows:

df.sort_index(ascending=True, inplace=True)
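A short sketch of what a plain ascending sort buys you, assuming a small hypothetical frame with integer foo labels: once every level is sorted ascending, a label-based slice through pd.IndexSlice goes through.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data indexed by (year, foo).
df = pd.DataFrame(
    {"rank": [3, 1, 2, 5, 4]},
    index=pd.MultiIndex.from_tuples(
        [(2015, 30), (2014, 10), (2015, 10), (2014, 20), (2015, 20)],
        names=["year", "foo"],
    ),
)

# Sort all index levels in ascending order.
df_sorted = df.sort_index()
print(df_sorted.index.is_monotonic_increasing)

# Label-based slice: year 2015, foo labels from 10 through 20 (both inclusive).
subset = df_sorted.loc[pd.IndexSlice[2015, 10:20], :]
print(subset)
```

Note that 10:20 here selects by label, so this picks a range of foo values rather than a number of rows.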

Foobar · Answer 3 · 2016-10-05T14:44:25+0000

To get the first x observations on the second level, as needed, you can combine loc with iloc:

 df.sort_index(level=[0, 1], ascending=[1, 0], inplace=True)
 df.loc[2015].iloc[:10]

This works as expected. It does not explain the strange lexsorting requirement on the index, however.
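The loc-then-iloc combination can be sketched on a small hypothetical frame (made-up values, two rows instead of ten):

```python
import pandas as pd

# Hypothetical data indexed by (year, foo), in no particular order.
df = pd.DataFrame(
    {"rank": [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_tuples(
        [(2014, 0.5), (2014, 1.5), (2015, 0.7), (2015, 1.3), (2015, 1.1), (2014, 1.0)],
        names=["year", "foo"],
    ),
)

# Sort year ascending and foo descending.
df_sorted = df.sort_index(level=["year", "foo"], ascending=[True, False])

# loc selects one year by label (dropping that level);
# iloc then takes the first two rows by position, i.e. the two largest foo.
top2_2015 = df_sorted.loc[2015].iloc[:2]
print(top2_2015)
```

The result is indexed by foo only, since selecting a single year with loc drops the year level.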

tsando · Answer 4 · 2017-07-28T16:24:33+0000

For me, this worked with sort_index(axis=1) :

 df = df.sort_index(axis=1)

Once you do this, you can use slice or pandas.IndexSlice, for example:

 idx = pd.IndexSlice
 df.loc[:, idx[:, 'A']]
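For the column case, here is a minimal sketch assuming a hypothetical frame whose columns form a two-level MultiIndex (the level values x, y, A, B are made up):

```python
import pandas as pd

# Hypothetical frame with unsorted two-level columns.
cols = pd.MultiIndex.from_tuples([("y", "A"), ("x", "B"), ("x", "A")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Sort the column index so that slicing on it is allowed.
df = df.sort_index(axis=1)

# Select every column whose second-level label is 'A'.
idx = pd.IndexSlice
a_cols = df.loc[:, idx[:, "A"]]
print(a_cols)
```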
