How to unstack (or pivot?) in pandas

I have a dataframe that looks like this:

    import pandas as pd

    datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
    s = list(datelisttemp)*3
    s.sort()
    df = pd.DataFrame({'BORDER': ['GERMANY', 'FRANCE', 'ITALY', 'GERMANY', 'FRANCE', 'ITALY', 'GERMANY', 'FRANCE', 'ITALY'],
                       'HOUR1': [2, 2, 2, 4, 4, 4, 6, 6, 6],
                       'HOUR2': [3, 3, 3, 5, 5, 5, 7, 7, 7],
                       'HOUR3': [8, 8, 8, 12, 12, 12, 99, 99, 99]},
                      index=s)

This gives me:

    Out[458]: df
                 BORDER  HOUR1  HOUR2  HOUR3
    2014-01-01  GERMANY      2      3      8
    2014-01-01   FRANCE      2      3      8
    2014-01-01    ITALY      2      3      8
    2014-01-02  GERMANY      4      5     12
    2014-01-02   FRANCE      4      5     12
    2014-01-02    ITALY      4      5     12
    2014-01-03  GERMANY      6      7     99
    2014-01-03   FRANCE      6      7     99
    2014-01-03    ITALY      6      7     99

I want the resulting DataFrame to look something like this:

                HOUR  GERMANY  FRANCE  ITALY
    2014-01-01     1        2       2      2
    2014-01-01     2        3       3      3
    2014-01-01     3        8       8      8
    2014-01-02     1        4       4      4
    2014-01-02     2        5       5      5
    2014-01-02     3       12      12     12
    2014-01-03     1        6       6      6
    2014-01-03     2        7       7      7
    2014-01-03     3       99      99     99

I did the following, but I'm not quite there:

    df['date_col'] = df.index
    df2 = pd.melt(df, id_vars=['date_col', 'BORDER'])
    # Can I keep the same index after melt or do I have to set an index like below?
    df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
    df2 = df2.sort_index()

This gives:

    Out[465]: df2
                          BORDER  value
    date_col   variable
    2014-01-01 HOUR1     GERMANY      2
               HOUR1      FRANCE      2
               HOUR1       ITALY      2
               HOUR2     GERMANY      3
               HOUR2      FRANCE      3
               HOUR2       ITALY      3
               HOUR3     GERMANY      8
               HOUR3      FRANCE      8
               HOUR3       ITALY      8
    2014-01-02 HOUR1     GERMANY      4
               HOUR1      FRANCE      4
               HOUR1       ITALY      4
               HOUR2     GERMANY      5
               HOUR2      FRANCE      5
               HOUR2       ITALY      5
               HOUR3     GERMANY     12
               HOUR3      FRANCE     12
               HOUR3       ITALY     12
    2014-01-03 HOUR1     GERMANY      6
               HOUR1      FRANCE      6
               HOUR1       ITALY      6
               HOUR2     GERMANY      7
               HOUR2      FRANCE      7
               HOUR2       ITALY      7
               HOUR3     GERMANY     99
               HOUR3      FRANCE     99
               HOUR3       ITALY     99

I thought I could unstack df2 to get something similar to my desired DataFrame, but I get all kinds of errors. I also tried to pivot this DataFrame, but could not get what I want.

+8
python stack pandas pivot
3 answers

We want values (for example, 'GERMANY') to become column names, and column names (for example, 'HOUR1') to become values: a swap of sorts.

The stack method turns column names into index values, and the unstack method turns index values into column names.
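As a minimal sketch of that behaviour, the snippet below applies stack and then unstack to a tiny frame. The frame `small` and its values are hypothetical, chosen only to mirror the names used in the question.

```python
import pandas as pd

# Hypothetical two-row frame, indexed by BORDER with HOUR columns.
small = pd.DataFrame(
    {"HOUR1": [2, 4], "HOUR2": [3, 5]},
    index=pd.Index(["GERMANY", "FRANCE"], name="BORDER"),
)
small.columns.name = "HOUR"

# stack: the column labels (HOUR1, HOUR2) become the innermost index level,
# leaving a Series indexed by (BORDER, HOUR).
stacked = small.stack()

# unstack("BORDER"): the BORDER index level becomes the column labels.
swapped = stacked.unstack("BORDER")
print(swapped)
```

After the two calls, the rows are labelled by HOUR and the columns by BORDER, which is the swap described above.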

So by shifting the values to the index, we can use stack and unstack to do the swap.

    import pandas as pd

    datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
    s = list(datelisttemp)*3
    s.sort()
    df = pd.DataFrame({'BORDER': ['GERMANY', 'FRANCE', 'ITALY', 'GERMANY', 'FRANCE', 'ITALY', 'GERMANY', 'FRANCE', 'ITALY'],
                       'HOUR1': [2, 2, 2, 4, 4, 4, 6, 6, 6],
                       'HOUR2': [3, 3, 3, 5, 5, 5, 7, 7, 7],
                       'HOUR3': [8, 8, 8, 12, 12, 12, 99, 99, 99]},
                      index=s)

    df = df.set_index(['BORDER'], append=True)
    df.columns.name = 'HOUR'
    df = df.unstack('BORDER')
    df = df.stack('HOUR')
    df = df.reset_index('HOUR')
    df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')
    print(df)

gives

    BORDER      HOUR  FRANCE  GERMANY  ITALY
    2014-01-01     1       2        2      2
    2014-01-01     2       3        3      3
    2014-01-01     3       8        8      8
    2014-01-02     1       4        4      4
    2014-01-02     2       5        5      5
    2014-01-02     3      12       12     12
    2014-01-03     1       6        6      6
    2014-01-03     2       7        7      7
    2014-01-03     3      99       99     99
+15

Using df2:

    >>> df2.pivot_table(values='value', index=['DATE', 'variable'], columns="BORDER")
    BORDER               FRANCE  GERMANY  ITALY
    DATE       variable
    2014-01-01 HOUR1          2        2      2
               HOUR2          3        3      3
               HOUR3          8        8      8
    2014-01-02 HOUR1          4        4      4
               HOUR2          5        5      5
               HOUR3         12       12     12
    2014-01-03 HOUR1          6        6      6
               HOUR2          7        7      7
               HOUR3         99       99     99

    [9 rows x 3 columns]

There is a little more cleanup to do if you want to convert the index level "variable" into an "HOUR" column and strip the text "HOUR" from its values, but I think this is the basic format you want.
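One possible way to do that cleanup is sketched below. It assumes the date column is named `date_col`, as in the question's melt step, rather than `DATE`; the names `long_df` and `wide` are introduced here only for illustration.

```python
import pandas as pd

# Rebuild the question's frame.
dates = sorted(list(pd.date_range("1/1/2014", periods=3, freq="D")) * 3)
df = pd.DataFrame(
    {
        "BORDER": ["GERMANY", "FRANCE", "ITALY"] * 3,
        "HOUR1": [2, 2, 2, 4, 4, 4, 6, 6, 6],
        "HOUR2": [3, 3, 3, 5, 5, 5, 7, 7, 7],
        "HOUR3": [8, 8, 8, 12, 12, 12, 99, 99, 99],
    },
    index=dates,
)

# Long format: one row per (date, border, hour) combination.
df["date_col"] = df.index
long_df = pd.melt(df, id_vars=["date_col", "BORDER"])

# Pivot so that each border becomes a column.
wide = long_df.pivot_table(
    values="value", index=["date_col", "variable"], columns="BORDER"
)

# Cleanup: move "variable" out of the index, rename it to HOUR,
# and strip the literal "HOUR" prefix so only the number remains.
wide = wide.reset_index("variable").rename(columns={"variable": "HOUR"})
wide["HOUR"] = wide["HOUR"].str.replace("HOUR", "", regex=False).astype(int)

# Optional: reorder the columns to match the layout asked for.
wide = wide[["HOUR", "GERMANY", "FRANCE", "ITALY"]]
print(wide)
```

Note that `pivot_table` aggregates with the mean by default, so depending on the pandas version the values may come back as floats. Since each (date, hour, border) combination occurs once here, the numbers themselves are unchanged.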

+3

Try using pivot. You can do this in one line, for example:

 df.pivot(index='start_time', columns='venue_name', values='ocupation') 
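Applied to the data from the question, that idea might look like the sketch below. It assumes a pandas version in which `pivot` accepts a list for `index` (1.1 or later, as far as I know), and `long_df` and `wide` are names introduced here only for illustration.

```python
import pandas as pd

# Rebuild the question's frame.
dates = sorted(list(pd.date_range("1/1/2014", periods=3, freq="D")) * 3)
df = pd.DataFrame(
    {
        "BORDER": ["GERMANY", "FRANCE", "ITALY"] * 3,
        "HOUR1": [2, 2, 2, 4, 4, 4, 6, 6, 6],
        "HOUR2": [3, 3, 3, 5, 5, 5, 7, 7, 7],
        "HOUR3": [8, 8, 8, 12, 12, 12, 99, 99, 99],
    },
    index=dates,
)

# Long format first: pivot needs the date, hour and border as plain columns.
df["date_col"] = df.index
long_df = pd.melt(df, id_vars=["date_col", "BORDER"])

# The date alone does not identify a row, so both the date and the hour
# label go into the index; each border then becomes a column.
wide = long_df.pivot(index=["date_col", "variable"], columns="BORDER", values="value")
print(wide)
```

Unlike `pivot_table`, `pivot` does not aggregate, so it raises an error if an (index, column) pair occurs more than once. That is why the hour label has to be part of the index here.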
0