Unmelt Pandas DataFrame

Question

I have a pandas dataframe with two id variables:

 df = pd.DataFrame({'id': [1,1,1,2,2,3], 'num': [10,10,12,13,14,15], 'q': ['a', 'b', 'd', 'a', 'b', 'z'], 'v': [2,4,6,8,10,12]})

    id  num  q   v
 0   1   10  a   2
 1   1   10  b   4
 2   1   12  d   6
 3   2   13  a   8
 4   2   14  b  10
 5   3   15  z  12

I can expand the table with:

 df.pivot(index='id', columns='q', values='v')

This gives something close:

 q     a    b    d    z
 id
 1     2    4    6  NaN
 2     8   10  NaN  NaN
 3   NaN  NaN  NaN   12

However, what I really want is the original unmelted form:

 id  num    a    b    d    z
  1   10    2    4  NaN  NaN
  1   12  NaN  NaN    6  NaN
  2   13    8  NaN  NaN  NaN
  2   14  NaN   10  NaN  NaN
  3   15  NaN  NaN  NaN   12

In other words:

'id' and 'num' are my indexes (usually I only see either "id" or "num" used as the index, but I need both since I'm trying to restore the original unmelted form)
'q' holds my columns
'v' holds the values in the table

Update

I found a close solution on Wes McKinney's blog:

 df.pivot_table(index=['id','num'], columns='q')

           v
 q         a    b    d    z
 id num
 1  10     2    4  NaN  NaN
    12   NaN  NaN    6  NaN
 2  13     8  NaN  NaN  NaN
    14   NaN   10  NaN  NaN
 3  15   NaN  NaN  NaN   12

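However, the format is not quite the same as above: the columns still carry the extra v level and the q name, and id and num are part of the index rather than ordinary columns.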

python pandas

slaw Jul 9 '15 at 1:44

6 answers

You can use set_index and unstack:

 In [18]: df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
 Out[18]:
 q  id  num    a     b    d     z
 0   1   10  2.0   4.0  NaN   NaN
 1   1   12  NaN   NaN  6.0   NaN
 2   2   13  8.0   NaN  NaN   NaN
 3   2   14  NaN  10.0  NaN   NaN
 4   3   15  NaN   NaN  NaN  12.0

Zero Oct 14 '17 at 13:49
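
A self-contained sketch of the set_index/unstack approach, rebuilding the question's data so it can be run as-is (the names s and wide are just illustrative):

```python
import pandas as pd

# Long-format data from the question: 'id' and 'num' identify a row,
# 'q' holds the future column labels and 'v' the cell values.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Put all three key columns into a MultiIndex and keep only 'v' (a Series).
s = df.set_index(['id', 'num', 'q'])['v']

# unstack() moves the innermost level ('q') into the columns; missing
# (id, num, q) combinations become NaN, so the values turn into floats.
# reset_index() then turns 'id' and 'num' back into ordinary columns.
wide = s.unstack().reset_index()

print(wide)
```

unstack needs each (id, num, q) combination to be unique; with duplicates it raises a ValueError, and pivot_table with an aggfunc is the usual fallback.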

You can also remove the name q:

 df1.columns = df1.columns.tolist()

Zero's answer plus removing q:

 df1 = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
 df1.columns = df1.columns.tolist()

    id  num    a     b    d     z
 0   1   10  2.0   4.0  NaN   NaN
 1   1   12  NaN   NaN  6.0   NaN
 2   2   13  8.0   NaN  NaN   NaN
 3   2   14  NaN  10.0  NaN   NaN
 4   3   15  NaN   NaN  NaN  12.0

johnInHome Nov 13 '17 at 14:25
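
The tolist() trick works because a plain list of labels carries no name. Another way to clear only the leftover q name is rename_axis(None, axis=1), which sets the name of the columns axis; a sketch comparing the two (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

wide = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()

# Option 1 (from the answer): rebuild the column index from a plain list,
# which drops the 'q' name.
flat1 = wide.copy()
flat1.columns = flat1.columns.tolist()

# Option 2: set the name of the columns axis to None; returns a new frame.
flat2 = wide.rename_axis(None, axis=1)

print(flat2)
```

Both leave the data untouched; only the name attached to the column labels changes.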

I came to a close solution:

 df2 = df.pivot_table(index=['id','num'], columns='q')
 df2.columns = df2.columns.droplevel()
 df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=None)

I am still unable to figure out how to remove "q" from the data frame.

slaw Jul 9 '15 at 1:55

This might work fine:

Pivot (values is passed as a list so that v stays as an outer column level):

 df2 = df.pivot_table(index=['id', 'num'], columns='q', values=['v']).reset_index()

Then join the column names of the first level with those of the second (this produces id, num, va, vb, vd, vz):

 df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]

Hillary murefu Mar 22 '18 at 12:08
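
A self-contained version of this answer, assuming values is passed as a list (values=['v']) so the pivoted columns really have two levels to join. The second list comprehension is a hypothetical tweak, not part of the answer, that keeps only the q value when there is one, which reproduces the column names asked for in the question:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# values=['v'] (a list) keeps 'v' as the outer column level, so the labels
# look like ('v', 'a'); reset_index() adds ('id', '') and ('num', '').
df2 = df.pivot_table(index=['id', 'num'], columns='q', values=['v']).reset_index()
pairs = df2.columns.tolist()

# The answer's approach: concatenate both levels of each label.
joined = [s1 + str(s2) for (s1, s2) in pairs]

# Hypothetical variant: use the 'q' value when present, else the outer name.
bare = [str(s2) if s2 != '' else s1 for (s1, s2) in pairs]

df2.columns = bare
print(joined)
print(df2)
```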

This can be done in three stages:

 # 1: Prepare an auxiliary column 'id_num':
 df['id_num'] = df[['id', 'num']].apply(tuple, axis=1)
 df = df.drop(columns=['id', 'num'])

 # 2: 'pivot' is almost the inverse of melt:
 df, df.columns.name = df.pivot(index='id_num', columns='q', values='v').reset_index(), ''

 # 3: Bring back the 'id' and 'num' columns:
 df['id'], df['num'] = zip(*df['id_num'])
 df = df.drop(columns=['id_num'])

This is the result, but with a different column order:

      a     b    d     z  id  num
 0  2.0   4.0  NaN   NaN   1   10
 1  NaN   NaN  6.0   NaN   1   12
 2  8.0   NaN  NaN   NaN   2   13
 3  NaN  10.0  NaN   NaN   2   14
 4  NaN   NaN  NaN  12.0   3   15

Or, with the columns in the correct order:

 import pandas as pd

 def multiindex_pivot(df, columns=None, values=None):
     # inspired by: https://github.com/pandas-dev/pandas/issues/23955
     names = list(df.index.names)
     df = df.reset_index()
     list_index = df[names].values
     tuples_index = [tuple(i) for i in list_index]  # hashable
     df = df.assign(tuples_index=tuples_index)
     df = df.pivot(index="tuples_index", columns=columns, values=values)
     tuples_index = df.index  # reduced
     index = pd.MultiIndex.from_tuples(tuples_index, names=names)
     df.index = index
     df = df.reset_index()  # added by me
     df.columns.name = ''  # added by me
     return df

 df = df.set_index(['id', 'num'])
 df = multiindex_pivot(df, columns='q', values='v')

Quant christo Oct 12 '19 at 19:42
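
On more recent pandas versions the tuple workaround should no longer be needed: as far as I know, since pandas 1.1 DataFrame.pivot accepts a list for index. A sketch assuming such a version:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Assumes pandas >= 1.1, where pivot() accepts a list of index columns.
wide = (df.pivot(index=['id', 'num'], columns='q', values='v')
          .reset_index()                # 'id' and 'num' back to columns
          .rename_axis(None, axis=1))   # drop the leftover 'q' name

print(wide)
```

Like the other pivot-based answers, this raises a ValueError if an (id, num, q) combination occurs more than once.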

khammel · Accepted Answer · 2015-07-09T12:32:44+0000

You are very close. Just rename the column index to None, and you have what you want.

 df2 = df.pivot_table(index=['id','num'], columns='q')
 df2.columns = df2.columns.droplevel().rename(None)
 df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=None)
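
To inspect what this produces without writing test.csv, the same steps can be run with to_csv given no path, in which case it returns the text as a string. A sketch (index=False is used here in place of index=None):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Without `values`, pivot_table aggregates every remaining column ('v') with
# the mean and keeps 'v' as the outer column level.
df2 = df.pivot_table(index=['id', 'num'], columns='q')

# Drop the outer 'v' level, then clear the 'q' name left on the columns.
df2.columns = df2.columns.droplevel().rename(None)

# With no path argument, to_csv returns the tab-separated text.
tsv = df2.reset_index().fillna("null").to_csv(sep="\t", index=False)
print(tsv)
```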

Note that, by default, the v column must be numeric so that pivot_table can aggregate it (the default aggfunc is the mean). Otherwise, pandas fails with:

 DataError: No numeric types to aggregate

To solve this problem, you can pass your own aggregation function, for example a lambda:

 df2 = df.pivot_table(index=['id','num'], columns='q', aggfunc=lambda x: x)
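
When v holds strings, an alternative to the lambda (swapped in here, not from the answer) is the built-in 'first' aggregation, which just takes the single value in each cell. A sketch assuming each (id, num, q) combination occurs at most once; the string values are made up for illustration:

```python
import pandas as pd

# Hypothetical variant of the question's data with string values in 'v'.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': ['x2', 'x4', 'x6', 'x8', 'x10', 'x12']})

# 'first' works on object columns, unlike the default mean.
wide = df.pivot_table(index=['id', 'num'], columns='q', values='v', aggfunc='first')

# Move 'id' and 'num' back into columns and clear the 'q' name.
wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```

If a combination did occur twice, 'first' would silently keep only one value, so it may be worth checking beforehand, for example with df.duplicated(['id', 'num', 'q']).any().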
