Pandas DataFrame: stack multiple column values into a single column

Assuming the following DataFrame:

      key.0 key.1 key.2  topic
    1   abc   def   ghi      8
    2   xab   xcd   xef      9

How can I combine the values of all key.* columns into a single column "key", with each value paired with the topic value from the row it came from? This is the result I want:

       topic  key
    1      8  abc
    2      8  def
    3      8  ghi
    4      9  xab
    5      9  xcd
    6      9  xef

Note that the number of key.N columns is variable and depends on some external N.

python pandas dataframe melt
3 answers

You can melt your data frame:

    >>> keys = [c for c in df if c.startswith('key.')]
    >>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
       topic variable  key
    0      8    key.0  abc
    1      9    key.0  xab
    2      8    key.1  def
    3      9    key.1  xcd
    4      8    key.2  ghi
    5      9    key.2  xef

The variable column also tells you which source column each key came from.


From v0.20, melt is also available as a first-class method of pd.DataFrame:

    >>> df.melt('topic', value_name='key').drop('variable', axis=1)
       topic  key
    0      8  abc
    1      9  xab
    2      8  def
    3      9  xcd
    4      8  ghi
    5      9  xef
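
Note that melt returns the rows grouped by source column, not by topic as in the question. A minimal self-contained sketch of one way to get the requested order, assuming the sample data from the question (the names key_cols and long_df are only illustrative):

```python
import pandas as pd

# Sample frame from the question (index labels omitted for brevity).
df = pd.DataFrame({
    "key.0": ["abc", "xab"],
    "key.1": ["def", "xcd"],
    "key.2": ["ghi", "xef"],
    "topic": [8, 9],
})

# Pick up however many key.N columns happen to exist.
key_cols = [c for c in df.columns if c.startswith("key.")]

long_df = (
    df.melt(id_vars="topic", value_vars=key_cols, value_name="key")
      .drop("variable", axis=1)
      # mergesort is stable, so key.0, key.1, key.2 keep their
      # relative order within each topic.
      .sort_values("topic", kind="mergesort")
      .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters here: an unstable sort could reorder the keys inside one topic.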

After trying various approaches, I find the following more or less intuitive, provided you understand stack:

    # keep topic as index, stack other columns 'against' it
    stacked = df.set_index('topic').stack()
    # set the name of the new series created
    df = stacked.reset_index(name='key')
    # drop the 'source' level (key.*)
    df.drop('level_1', axis=1, inplace=True)

The resulting data frame will be as required:

       topic  key
    0      8  abc
    1      8  def
    2      8  ghi
    3      9  xab
    4      9  xcd
    5      9  xef

You can print the intermediate results to fully understand the process. If you don't mind having more columns than necessary, the key steps are set_index('topic'), stack() and reset_index(name='key').
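
To make those intermediate results easy to inspect, here is a small self-contained sketch on the sample data from the question. It is a slight variation on the code above, not the same code: it drops the source level before resetting the index, so it does not rely on the automatically generated 'level_1' column name. The names indexed and result are only illustrative.

```python
import pandas as pd

# Sample frame from the question (index labels omitted for brevity).
df = pd.DataFrame({
    "key.0": ["abc", "xab"],
    "key.1": ["def", "xcd"],
    "key.2": ["ghi", "xef"],
    "topic": [8, 9],
})

# Step 1: topic becomes the index, only the key.* columns remain.
indexed = df.set_index("topic")
print(indexed)

# Step 2: stack the columns into a Series indexed by (topic, source column).
stacked = indexed.stack()
print(stacked)

# Step 3: discard the source-column level, then turn the topic index
# back into a column and name the values 'key'.
result = stacked.reset_index(level=1, drop=True).reset_index(name="key")
print(result)
```

Recent pandas versions may emit a FutureWarning about the stack implementation here; for data without missing values the result should be the same either way.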


OK, since a current question was marked as a duplicate of this one, I will answer here.

Using wide_to_long

    # sep='.' is needed because the columns are named key.0, key.1, ...
    pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop('age', axis=1)
    Out[123]:
       topic  key
    0      8  abc
    1      9  xab
    2      8  def
    3      9  xcd
    4      8  ghi
    5      9  xef
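
As a self-contained sketch on the sample data from the question: the column name "n" for the numeric suffix is an arbitrary choice (the answer above uses 'age'), and sep="." is passed on the assumption that the columns are named key.0, key.1 and so on.

```python
import pandas as pd

# Sample frame from the question (index labels omitted for brevity).
df = pd.DataFrame({
    "key.0": ["abc", "xab"],
    "key.1": ["def", "xcd"],
    "key.2": ["ghi", "xef"],
    "topic": [8, 9],
})

long_df = (
    # stubnames="key" with sep="." matches key.0, key.1, key.2;
    # the numeric suffix lands in a new index level called "n".
    pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")
      .reset_index()
      .drop("n", axis=1)
)
print(long_df)
```

One caveat: wide_to_long requires the id column (topic here) to identify each row uniquely, so with repeated topic values melt or stack is probably the safer choice.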
