Working with set_index in a Pandas DataFrame

Using the imported CSV file, I indexed the DataFrame like this ...

rdata.set_index(['race_date', 'track_code', 'race_number', 'horse_name']) 

This is what the DataFrame section looks like ...

                                                    work_date work_track
  race_date  track_code race_number horse_name
  2007-08-24 BM         8           Count Me Twice 2007-05-31        PLN
                                    Count Me Twice 2007-06-09        PLN
                                    Count Me Twice 2007-06-16        PLN
                                    Count Me Twice 2007-06-23        PLN
                                    Count Me Twice 2007-08-05        PLN
                                    Judge Choice   2007-06-07         BM
                                    Judge Choice   2007-06-14         BM
                                    Judge Choice   2007-07-08         BM
                                    Judge Choice   2007-08-18         BM

Why is the "horse_name" column not grouped like date, track and race? Perhaps this is by design. If so, how can I slice this larger DataFrame by race to get a new DataFrame with "horse_name" as its index?

python pandas
1 answer

It's not a mistake. That is how it should work.

When a DataFrame is displayed, every row has to be shown. So if the index has one level, that level is printed in full. If it has two levels, the first level is grouped (repeated labels are left blank) and the second is printed in full. If it has three levels, the first two are grouped and the third is printed in full, and so on.

That is why the horse name is not grouped. How would you see all the rows in the DataFrame if the horse name were grouped as well? :)
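To check that the grouping is only a display matter, here is a small sketch. The rows are made-up sample data modelled on the output in the question, not the real CSV. It assumes the `display.multi_sparse` option, which controls whether repeated outer labels are blanked when printing.

```python
import pandas as pd

# Hypothetical sample rows modelled on the question's output (not the real CSV).
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4,
    "track_code": ["BM"] * 4,
    "race_number": [8] * 4,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge Choice", "Judge Choice"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07", "2007-06-14"],
    "work_track": ["PLN", "PLN", "BM", "BM"],
})

# set_index returns a new DataFrame, so keep the result in a variable.
indexed = rdata.set_index(
    ["race_date", "track_code", "race_number", "horse_name"]
)

# Every row still carries its full four-part index label.
index_tuples = indexed.index.tolist()
print(index_tuples[1])

# Print with the grouping switched off: every label appears on every row.
with pd.option_context("display.multi_sparse", False):
    print(indexed)
```

The second print should repeat the date, track and race number on each line, which shows that nothing was merged in the index itself.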

Try to do:

  rdata.set_index(['race_date', 'track_code', 'race_number']) 

or

  rdata.set_index(['race_date', 'track_code']) 

You will see that the last level of the index always expands completely so that you can see all the elements in the DataFrame.
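For the second part of the question (one DataFrame per race, indexed by "horse_name"), one possible approach is a cross-section with `DataFrame.xs`. This is only a sketch: the rows below are invented sample data, and the second race and the horse "Example Horse" are hypothetical, added so that the selection has something to filter out.

```python
import pandas as pd

# Hypothetical sample data: two races, modelled on the question's columns.
rdata = pd.DataFrame({
    "race_date": ["2007-08-24"] * 4 + ["2007-08-25"] * 2,
    "track_code": ["BM"] * 4 + ["PLN"] * 2,
    "race_number": [8] * 4 + [3] * 2,
    "horse_name": ["Count Me Twice", "Count Me Twice",
                   "Judge Choice", "Judge Choice",
                   "Example Horse", "Example Horse"],
    "work_date": ["2007-05-31", "2007-06-09", "2007-06-07",
                  "2007-06-14", "2007-07-01", "2007-07-08"],
    "work_track": ["PLN", "PLN", "BM", "BM", "PLN", "PLN"],
})

indexed = rdata.set_index(
    ["race_date", "track_code", "race_number", "horse_name"]
)

# Pick one race by its first three index levels. xs drops the levels
# named in `level`, so only horse_name remains as the index.
race = indexed.xs(
    ("2007-08-24", "BM", 8),
    level=["race_date", "track_code", "race_number"],
)

print(race.index.name)
print(race)
```

The result keeps only the rows of the chosen race, with "horse_name" as a plain single-level index. The horse name still repeats, once per workout row.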
