How to get rid of tiered index after using pandas pivot table?

I had the following data frame (the real data frame is much larger than this):

sale_user_id sale_product_id count 1 1 1 1 8 1 1 52 1 1 312 5 1 315 1 

Then I modified it to move the values ​​to sales_product_id as column headers using the following code:

 reshaped_df=id_product_count.pivot(index='sale_user_id',columns='sale_product_id',values='count') 

and the resulting data frame:

 sale_product_id -1057 1 2 3 4 5 6 8 9 10 ... 98 980 981 982 983 984 985 986 987 99 sale_user_id 1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 

as you can see, we have a multi-level index, I need to have sale_user_is in the first column without multi-level indexing:

I use the following approach:

 reshaped_df.reset_index() 

the result will be such that I still have the sale_product_id column, but I no longer need it:

 sale_product_id sale_user_id -1057 1 2 3 4 5 6 8 9 ... 98 980 981 982 983 984 985 986 987 99 0 1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 

I can multiply this data frame to get rid of sale_product_id, but I don't think it would be efficient. I am looking for an effective way to get rid of multi-level indexing when changing the shape of the original frame

+15
source share
2 answers

You only need to remove the index name , use rename_axis (new in pandas 0.18.0 ):

 print (reshaped_df) sale_product_id 1 8 52 312 315 sale_user_id 1 1 1 1 5 1 print (reshaped_df.index.name) sale_user_id print (reshaped_df.rename_axis(None)) sale_product_id 1 8 52 312 315 1 1 1 1 5 1 

Another solution working in 0.18.0 below 0.18.0 :

 reshaped_df.index.name = None print (reshaped_df) sale_product_id 1 8 52 312 315 1 1 1 1 5 1 

If necessary, also delete columns name :

 print (reshaped_df.columns.name) sale_product_id print (reshaped_df.rename_axis(None).rename_axis(None, axis=1)) 1 8 52 312 315 1 1 1 1 5 1 

Another solution:

 reshaped_df.columns.name = None reshaped_df.index.name = None print (reshaped_df) 1 8 52 312 315 1 1 1 1 5 1 

EDIT by comment:

You need reset_index with drop=True parameter:

 reshaped_df = reshaped_df.reset_index(drop=True) print (reshaped_df) sale_product_id 1 8 52 312 315 0 1 1 1 5 1 #if need reset index nad remove column name reshaped_df = reshaped_df.reset_index(drop=True).rename_axis(None, axis=1) print (reshaped_df) 1 8 52 312 315 0 1 1 1 5 1 

If necessary, remove only the column name:

 reshaped_df = reshaped_df.rename_axis(None, axis=1) print (reshaped_df) 1 8 52 312 315 sale_user_id 1 1 1 1 5 1 

Edit1:

So if you need to create a new column from index and delete columns names :

 reshaped_df = reshaped_df.rename_axis(None, axis=1).reset_index() print (reshaped_df) sale_user_id 1 8 52 312 315 0 1 1 1 1 5 1 
+13
source

The way it works for me

 df_cross=pd.DataFrame(pd.crosstab(df[c1], df[c2]).to_dict()).reset_index() 
0
source

All Articles