Flatten DataFrame with Multi-Index Columns

I would like to convert a pandas DataFrame produced by a pivot table into a flat, row-oriented (long) format, as shown below.

This is where I am so far:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'goods': ['a', 'a', 'b', 'b', 'b'],
        'stock': [5, 10, 30, 40, 10],
        'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
        'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06',
                                '2014-02-09', '2014-03-09'])
    })
    # we don't care about year in this example
    df['month'] = df['date'].map(lambda x: x.month)
    piv = df.pivot_table(["stock"], "month", ["goods", "category"], aggfunc="sum")
    # make sure every month in the range has a row, then fill the gaps
    piv = piv.reindex(np.arange(piv.index[0], piv.index[-1] + 1))
    piv = piv.ffill(axis=0)
    piv = piv.fillna(0)
    print(piv)

which produces

             stock
    goods        a       b
    category    c1  c2  c1  c2
    month
    1            5   0  30   0
    2            5  10  30  40
    3            5  10  10  40

And this is what I want to get:

    goods category month stock
        a       c1     1     5
        a       c1     2     0
        a       c1     3     0
        a       c2     1     0
        a       c2     2    10
        a       c2     3     0
        b       c1     1    30
        b       c1     2     0
        b       c1     3    10
        b       c2     1     0
        b       c2     2    40
        b       c2     3     0

Previously, I used

    piv = piv.stack()
    piv = piv.reset_index()
    print(piv)

to get rid of the multi-index, but because I now pivot on two columns (["goods", "category"]), this results in:

          month category stock
    goods                    a   b
    0         1       c1     5  30
    1         1       c2     0   0
    2         2       c1     5  30
    3         2       c2    10  40
    4         3       c1     5  10
    5         3       c2    10  40

Does anyone know how I can get rid of the multi-index in the columns and get a DataFrame in the format shown in the example?

+11
python pandas pivot-table
3 answers
    >>> piv.unstack().reset_index().drop('level_0', axis=1)
       goods category  month   0
    0      a       c1      1   5
    1      a       c1      2   5
    2      a       c1      3   5
    3      a       c2      1   0
    4      a       c2      2  10
    5      a       c2      3  10
    6      b       c1      1  30
    7      b       c1      2  30
    8      b       c1      3  10
    9      b       c2      1   0
    10     b       c2      2  40
    11     b       c2      3  40

Then you only need to rename the last column from 0 to stock.
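The whole chain, including the rename, can be sketched as follows. This is a minimal, self-contained version that rebuilds `piv` from the question's data; the variable name `flat` is only illustrative, and `.dt.month` and keyword arguments are used in place of the question's `map` and positional arguments.

```python
import numpy as np
import pandas as pd

# Rebuild the pivot table from the question
df = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06',
                            '2014-02-09', '2014-03-09']),
})
df['month'] = df['date'].dt.month
piv = df.pivot_table(values=['stock'], index='month',
                     columns=['goods', 'category'], aggfunc='sum')
piv = piv.reindex(np.arange(piv.index[0], piv.index[-1] + 1))
piv = piv.ffill(axis=0).fillna(0)

# unstack() moves the month index below the column levels, giving a Series;
# reset_index() turns every level into a column. The unnamed top level
# (the constant 'stock' label) becomes 'level_0' and the values land in 0.
flat = (
    piv.unstack()
       .reset_index()
       .drop(columns='level_0')
       .rename(columns={0: 'stock'})
)
print(flat)
```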

+6

It seems to me that melt (aka unpivot) is very close to what you want to do:

    In [11]: pd.melt(piv)
    Out[11]:
          NaN goods category  value
    0   stock     a       c1      5
    1   stock     a       c1      5
    2   stock     a       c1      5
    3   stock     a       c2      0
    4   stock     a       c2     10
    5   stock     a       c2     10
    6   stock     b       c1     30
    7   stock     b       c1     30
    8   stock     b       c1     10
    9   stock     b       c2      0
    10  stock     b       c2     40
    11  stock     b       c2     40

A rogue column (stock) appears here because the top-level column header is constant in piv. If we drop that level first, melt works out of the box:

    In [12]: piv.columns = piv.columns.droplevel(0)

    In [13]: pd.melt(piv)
    Out[13]:
       goods category  value
    0      a       c1      5
    1      a       c1      5
    2      a       c1      5
    3      a       c2      0
    4      a       c2     10
    5      a       c2     10
    6      b       c1     30
    7      b       c1     30
    8      b       c1     10
    9      b       c2      0
    10     b       c2     40
    11     b       c2     40

Edit: the above actually drops the index (month); you need to make it a column with reset_index:

    In [21]: pd.melt(piv.reset_index(), id_vars=['month'], value_name='stock')
    Out[21]:
        month goods category  stock
    0       1     a       c1      5
    1       2     a       c1      5
    2       3     a       c1      5
    3       1     a       c2      0
    4       2     a       c2     10
    5       3     a       c2     10
    6       1     b       c1     30
    7       2     b       c1     30
    8       3     b       c1     10
    9       1     b       c2      0
    10      2     b       c2     40
    11      3     b       c2     40
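
A self-contained variant of the same idea is sketched below. It is not the exact call from the session above: instead of `reset_index` plus `id_vars`, it keeps the month index through the melt with `ignore_index=False` (which assumes pandas 1.1 or later) and turns it into a column afterwards. The name `long_df` and the final column reordering are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the pivot table from the question
df = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06',
                            '2014-02-09', '2014-03-09']),
})
df['month'] = df['date'].dt.month
piv = df.pivot_table(values=['stock'], index='month',
                     columns=['goods', 'category'], aggfunc='sum')
piv = piv.reindex(np.arange(piv.index[0], piv.index[-1] + 1))
piv = piv.ffill(axis=0).fillna(0)

# Drop the constant top-level 'stock' label so only goods/category remain
narrow = piv.droplevel(0, axis=1)

# Melt every column; ignore_index=False keeps the month index on each row.
# The variable columns take their names from the column levels.
long_df = (
    narrow.melt(ignore_index=False, value_name='stock')
          .rename_axis('month')   # make sure the index is named before resetting
          .reset_index()
)
# Reorder the columns to match the layout asked for in the question
long_df = long_df[['goods', 'category', 'month', 'stock']]
print(long_df)
```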
+4

I know that this question has already been answered, but for my dataset with multi-index columns the solutions above did not work. So here is another way of flattening multi-index columns using pandas.

Here is the problem I had:

[image not preserved]

As you can see, the data frame has a three-level multi-index on the rows and two-level multi-index columns.

Desired data format:

[image not preserved]

When I tried the options suggested above, pd.melt did not allow more than one column in the var_name argument. So every time I tried to melt, I lost some attribute from my table.

The solution I found was to apply stack twice to my data frame.

Before the code, it is worth noting that the desired name for the value column in the unstacked table was "Populacao residente em domicilios particulares ocupados" (see code below). All of my values therefore had to end up in a newly created column with that name.

Here is the code snippet:

    import pandas as pd

    # reading my table
    # (the original call also passed sep=',', encoding='latin3' and squeeze=True;
    # read_excel in current pandas does not accept these, so they are omitted)
    df = pd.read_excel(r'my_table.xls', header=[2, 3], index_col=[0, 1, 2],
                       na_values=['-', ' ', '*']).fillna(0)

    df.index.names = ['COD_MUNIC_7', 'NOME_MUN', 'TIPO']
    df.columns.names = ['sexo', 'faixa_etaria']
    df.head()

    # stacking both column levels, then naming the resulting value column
    df = pd.DataFrame(pd.Series(df.stack(level=0).stack(),
                                name='Populacao residente em domicilios particulares ocupados')).reset_index()
    df.head()

Another solution I found was to stack only one column level first and then apply melt.

Here is the alternative code:

    df = df.stack('faixa_etaria').reset_index().melt(
        id_vars=['COD_MUNIC_7', 'NOME_MUN', 'TIPO', 'faixa_etaria'],
        value_vars=['Homens', 'Mulheres'],
        value_name='Populacao residente em domicilios particulares ocupados',
        var_name='sexo')
    df.head()
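
Since both snippets depend on an Excel file that is not available here, the two approaches can be tried on a small stand-in frame. In the sketch below the municipality code, name, age bands and all numbers are made up for illustration; only the index and column level names and the 'Homens'/'Mulheres' labels come from the code above. Recent pandas versions may emit a FutureWarning about the stack implementation, which does not affect the result here.

```python
import pandas as pd

# Hypothetical stand-in for the Excel table (all values are invented)
index = pd.MultiIndex.from_tuples(
    [(1, 'Town A', 'Urbana'), (1, 'Town A', 'Rural')],
    names=['COD_MUNIC_7', 'NOME_MUN', 'TIPO'],
)
columns = pd.MultiIndex.from_product(
    [['Homens', 'Mulheres'], ['0-4', '5-9']],
    names=['sexo', 'faixa_etaria'],
)
toy = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=index, columns=columns)

value_name = 'Populacao residente em domicilios particulares ocupados'
keys = ['COD_MUNIC_7', 'NOME_MUN', 'TIPO', 'sexo', 'faixa_etaria']

# Approach 1: stack both column levels, name the values, flatten the index
double_stacked = toy.stack(level=0).stack().rename(value_name).reset_index()

# Approach 2: stack one level, then melt the remaining 'sexo' columns
stack_melted = toy.stack('faixa_etaria').reset_index().melt(
    id_vars=['COD_MUNIC_7', 'NOME_MUN', 'TIPO', 'faixa_etaria'],
    value_vars=['Homens', 'Mulheres'],
    var_name='sexo',
    value_name=value_name,
)

# Bring both results into the same column and row order for comparison
a = double_stacked.sort_values(keys).reset_index(drop=True)
b = stack_melted[keys + [value_name]].sort_values(keys).reset_index(drop=True)
print(a)
print(b)
```

After sorting, both frames should hold the same eight rows, one per combination of row index, sexo and faixa_etaria.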

Yours sincerely,

Philip Riscalla Lil

0
