Unmelt Pandas DataFrame

I have a pandas dataframe with two id variables:

    df = pd.DataFrame({'id': [1,1,1,2,2,3], 'num': [10,10,12,13,14,15], 'q': ['a', 'b', 'd', 'a', 'b', 'z'], 'v': [2,4,6,8,10,12]})

       id  num  q   v
    0   1   10  a   2
    1   1   10  b   4
    2   1   12  d   6
    3   2   13  a   8
    4   2   14  b  10
    5   3   15  z  12

I can pivot the table with:

    df.pivot(index='id', columns='q', values='v')

This gives something close:

    q     a    b    d    z
    id
    1     2    4    6  NaN
    2     8   10  NaN  NaN
    3   NaN  NaN  NaN   12

However, I really want (original unmelted form):

    id  num    a    b    d    z
     1   10    2    4  NaN  NaN
     1   12  NaN  NaN    6  NaN
     2   13    8  NaN  NaN  NaN
     2   14  NaN   10  NaN  NaN
     3   15  NaN  NaN  NaN   12

In other words:

  • 'id' and 'num' are my indexes (I have usually only seen either "id" or "num" used as the index, but I need both because I'm trying to restore the original unmelted form)
  • 'q' holds my columns
  • 'v' holds the values in the table

Update

I found a close solution on Wes McKinney's blog:

    df.pivot_table(index=['id','num'], columns='q')

              v
    q         a    b    d    z
    id num
    1  10     2    4  NaN  NaN
       12   NaN  NaN    6  NaN
    2  13     8  NaN  NaN  NaN
       14   NaN   10  NaN  NaN
    3  15   NaN  NaN  NaN   12

However, the format is not quite the same as above.

+18
python pandas
6 answers

You are very close. Just rename the column index to None, and you have what you want.

    df2 = df.pivot_table(index=['id','num'], columns='q')
    df2.columns = df2.columns.droplevel().rename(None)
    df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=None)

Note that the v column must be numeric so that it can be aggregated with the default aggregation function (the mean). Otherwise, pandas will fail with:

 DataError: No numeric types to aggregate 

To solve this problem, you can specify your own aggregation function using a custom lambda function:

    df2 = df.pivot_table(index=['id','num'], columns='q', aggfunc=lambda x: x)
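
An identity lambda does not reduce a group to a single value, so a named reducer such as 'first' may be a more predictable choice when v is not numeric. The sketch below is only an illustration: df_str is a hypothetical variant of the question's data with strings in v, and it assumes every (id, num, q) combination occurs at most once, so that taking the first value loses nothing.

```python
import pandas as pd

# Hypothetical variant of the question's data in which 'v' holds strings.
df_str = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 3],
    'num': [10, 10, 12, 13, 14, 15],
    'q':   ['a', 'b', 'd', 'a', 'b', 'z'],
    'v':   ['two', 'four', 'six', 'eight', 'ten', 'twelve'],
})

# 'first' keeps the first value of each (id, num, q) group and does not
# require numeric input. Passing values='v' as a scalar avoids the extra
# 'v' level in the resulting columns.
wide = df_str.pivot_table(index=['id', 'num'], columns='q',
                          values='v', aggfunc='first')

# Clear the 'q' label on the column axis, then turn the index into columns.
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```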
+16
source share

You can use set_index and unstack:

    In [18]: df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
    Out[18]:
    q  id  num    a     b    d     z
    0   1   10  2.0   4.0  NaN   NaN
    1   1   12  NaN   NaN  6.0   NaN
    2   2   13  8.0   NaN  NaN   NaN
    3   2   14  NaN  10.0  NaN   NaN
    4   3   15  NaN   NaN  NaN  12.0
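
To check that this really undoes a melt, here is a small round-trip sketch. The wide frame is typed in by hand from the table the question asks for, and the names wide, long_df and restored are only illustrative. It assumes each (id, num, q) combination is unique, since unstack cannot reshape an index that contains duplicate entries.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The wide table the question wants to recover (entered by hand).
wide = pd.DataFrame({
    'id':  [1, 1, 2, 2, 3],
    'num': [10, 12, 13, 14, 15],
    'a':   [2.0, nan, 8.0, nan, nan],
    'b':   [4.0, nan, nan, 10.0, nan],
    'd':   [nan, 6.0, nan, nan, nan],
    'z':   [nan, nan, nan, nan, 12.0],
})

# Melt to long form and drop the empty cells, which gives the same rows
# as the question's df (with 'v' stored as floats).
long_df = wide.melt(id_vars=['id', 'num'], var_name='q', value_name='v').dropna()

# Unmelt: move 'q' from the row index back into the columns.
restored = long_df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
restored.columns.name = None  # clear the leftover 'q' label

# Raises AssertionError if the round trip did not reproduce the wide table.
pd.testing.assert_frame_equal(restored, wide)
```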
+13

You can remove the name q by replacing the column index with a plain list:

    df1.columns = df1.columns.tolist()

Zero's answer (set_index and unstack) plus removing q gives:

    df1 = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
    df1.columns = df1.columns.tolist()

       id  num    a     b    d     z
    0   1   10  2.0   4.0  NaN   NaN
    1   1   12  NaN   NaN  6.0   NaN
    2   2   13  8.0   NaN  NaN   NaN
    3   2   14  NaN  10.0  NaN   NaN
    4   3   15  NaN   NaN  NaN  12.0
+2

I came to a close solution:

    df2 = df.pivot_table(index=['id','num'], columns='q')
    df2.columns = df2.columns.droplevel()
    df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=None)

I still cannot work out how to remove "q" from the DataFrame.
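
For what it is worth, the leftover "q" is not a column: it is the name attached to the columns Index, so it can be cleared by renaming that axis. A minimal sketch on the question's data follows; the variable name flat is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 3],
    'num': [10, 10, 12, 13, 14, 15],
    'q':   ['a', 'b', 'd', 'a', 'b', 'z'],
    'v':   [2, 4, 6, 8, 10, 12],
})

df2 = df.pivot_table(index=['id', 'num'], columns='q')
df2.columns = df2.columns.droplevel()  # drops the 'v' level; the name 'q' remains

# Clear the name of the column axis; rename_axis returns a new frame.
flat = df2.rename_axis(None, axis=1).reset_index()
# In-place alternative: df2.columns.name = None

print(flat.columns.name)
print(flat)
```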

+1

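This might work:

  1. Pivot, passing values as a list so that 'v' stays as the top column level:

    df2 = df.pivot_table(index=['id', 'num'], columns='q', values=['v']).reset_index()

  2. Join the first-level column names with the second-level ones, which gives the columns id, num, va, vb, vd and vz:

    df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]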

+1

This can be done in three stages:

    # 1: Prepare auxiliary column 'id_num':
    df['id_num'] = df[['id', 'num']].apply(tuple, axis=1)
    df = df.drop(columns=['id', 'num'])

    # 2: 'pivot' is almost an inverse of melt:
    df, df.columns.name = df.pivot(index='id_num', columns='q', values='v').reset_index(), ''

    # 3: Bring back 'id' and 'num' columns:
    df['id'], df['num'] = zip(*df['id_num'])
    df = df.drop(columns=['id_num'])

This is the result, but with a different column order:

         a     b    d     z  id  num
    0  2.0   4.0  NaN   NaN   1   10
    1  NaN   NaN  6.0   NaN   1   12
    2  8.0   NaN  NaN   NaN   2   13
    3  NaN  10.0  NaN   NaN   2   14
    4  NaN   NaN  NaN  12.0   3   15

Or in the correct order:

    def multiindex_pivot(df, columns=None, values=None):
        # inspired by: https://github.com/pandas-dev/pandas/issues/23955
        names = list(df.index.names)
        df = df.reset_index()
        list_index = df[names].values
        tuples_index = [tuple(i) for i in list_index]  # hashable
        df = df.assign(tuples_index=tuples_index)
        df = df.pivot(index="tuples_index", columns=columns, values=values)
        tuples_index = df.index  # reduced
        index = pd.MultiIndex.from_tuples(tuples_index, names=names)
        df.index = index
        df = df.reset_index()  # me
        df.columns.name = ''  # me
        return df

    df = df.set_index(['id', 'num'])
    df = multiindex_pivot(df, columns='q', values='v')
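
On newer pandas the helper may not be needed. The sketch below assumes pandas 1.1 or later, where pivot accepts a list of column names for index; the variable name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 3],
    'num': [10, 10, 12, 13, 14, 15],
    'q':   ['a', 'b', 'd', 'a', 'b', 'z'],
    'v':   [2, 4, 6, 8, 10, 12],
})

# Assumes pandas >= 1.1: `index` may be a list, so both id columns
# can be kept without building tuples by hand.
out = df.pivot(index=['id', 'num'], columns='q', values='v').reset_index()
out.columns.name = None  # clear the 'q' label on the column axis
print(out)
```

The columns come out in the order id, num, a, b, d, z, so no reordering step is required.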
0
