In pandas, how to sort one level of a MultiIndex by column values while keeping the other level grouped

I'm currently taking a data analysis course at university, but I'm a bit stuck on a sorting problem with a MultiIndex.

The real data is about a million movie ratings, which I'm trying to analyze by US zip code. To work out how to do what I want, I've been using a much smaller dataset of 250 randomly generated ratings for 10 movies, and instead of zip codes I'm using age groups.

This is what I have right now: a multi-indexed DataFrame in pandas with two levels, "group" and "title":

                       rating
    group  title
    Adults Alien     4.000000
           Argo      2.166667
           Ben-Hur   3.666667
           Gandhi    3.200000
           ...       ...
    Coeds  Alien     3.000000
           Argo      3.750000
           Ben-Hur   3.000000
           Gandhi    2.833333
           ...       ...
    Kids   Alien     2.500000
           Argo      2.750000
           Ben-Hur   3.000000
           Gandhi    3.200000
           ...       ...

What I want to do is sort the titles by their rating within each group (and show only the 5 or so most popular titles in each group).

So something like this (here I'm showing only two titles per group):

                       rating
    group  title
    Adults Alien     4.000000
           Ben-Hur   3.666667
    Coeds  Argo      3.750000
           Alien     3.000000
    Kids   Gandhi    3.200000
           Ben-Hur   3.000000

Does anyone know how to do this? I tried sort_order, sort_index, etc., and swapping the levels, but they mix up the groups too, so the result looks like this:

                       rating
    group  title
    Adults Alien     4.000000
    Coeds  Argo      3.750000
    Adults Ben-Hur   3.666667
    Kids   Gandhi    3.666667
    Coeds  Alien     3.000000
    Kids   Ben-Hur   3.000000

I'm looking for something like this: Sorting multiple indexes in Pandas, but instead of sorting by another level, I want to sort by values. It's as if that person wanted to sort by their sales column.

Thanks!

python sorting pandas multi-index
1 answer

You are looking for sort:

    In [10]: import pandas as pd

    In [11]: s = pd.Series([3, 1, 2], [[1, 1, 2], [1, 3, 1]])

    In [12]: s.sort()  # pandas 0.17+: s.sort_values(inplace=True)

    In [13]: s
    Out[13]:
    1  3    1
    2  1    2
    1  1    3
    dtype: int64

Note: this works in place (i.e. it modifies s). To get a sorted copy back instead, use order:

    In [14]: s.order()  # pandas 0.17+: s.sort_values()
    Out[14]:
    1  3    1
    2  1    2
    1  1    3
    dtype: int64
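Series.sort and Series.order were deprecated in pandas 0.17 and later removed, so on a current install the two calls above are both spelled sort_values. A minimal sketch of the same thing, assuming pandas 0.17 or later (the name sorted_copy is just for illustration):

```python
import pandas as pd

# Same toy Series as above: three values on a two-level index.
s = pd.Series([3, 1, 2], index=[[1, 1, 2], [1, 3, 1]])

# Sorted copy; s itself is left untouched (replaces the old s.order()).
sorted_copy = s.sort_values()

# In-place sort; s is modified and nothing is returned (replaces the old s.sort()).
s.sort_values(inplace=True)
```

Either way the values are sorted across the whole Series, which is why the first index level gets interleaved.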

Update: I realized what you were actually asking, and I think this should be an option in sortlevels, but for now I think you need to reset_index, groupby and apply:

    # Note: DataFrame.sort was replaced by sort_values in pandas 0.17.
    In [21]: s.reset_index(name='s').groupby('level_0').apply(lambda s: s.sort('s')).set_index(['level_0', 'level_1'])['s']
    Out[21]:
    level_0  level_1
    1        3          1
             1          3
    2        1          2
    Name: 0, dtype: int64

Note: afterwards you can set the level names to [None, None].
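For the movie data in the question, the same idea (move the index into columns, sort within each group, restore the index) might look roughly like the sketch below on a current pandas. Some assumptions here: the frame is a hypothetical stand-in built from the four ratings per group shown in the question, the cutoff of 2 follows the question's example, and I use groupby(...).head() for the "top N per group" part rather than apply.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout (ratings copied from the post).
df = pd.DataFrame(
    {
        "group": ["Adults"] * 4 + ["Coeds"] * 4 + ["Kids"] * 4,
        "title": ["Alien", "Argo", "Ben-Hur", "Gandhi"] * 3,
        "rating": [4.0, 2.166667, 3.666667, 3.2,
                   3.0, 3.75, 3.0, 2.833333,
                   2.5, 2.75, 3.0, 3.2],
    }
).set_index(["group", "title"])

top2 = (
    df.reset_index()                        # turn both index levels into columns
      .sort_values(["group", "rating"],     # keep groups together,
                   ascending=[True, False]) # highest rating first within each group
      .groupby("group")
      .head(2)                              # first 2 rows of each group, order preserved
      .set_index(["group", "title"])        # restore the two-level index
)
print(top2)
```

Sorting by the group column first is what keeps the groups from being mixed; change head(2) to head(5) for the five most popular titles per group.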
