Merge values of multiple columns into one column in python pandas

I have a pandas data frame like this:

      Column1  Column2  Column3  Column4  Column5
    0       a        1        2        3        4
    1       a        3        4        5
    2       b        6        7        8
    3       c        7        7

Now I want to get a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all the values from column 2 through column n (where n is the number of columns from column 2 to the end of the row), for example:

      Column1  ColumnA
    0       a  1,2,3,4
    1       a    3,4,5
    2       b    6,7,8
    3       c      7,7

How could I best approach this issue? Any advice would be helpful. Thanks in advance!

+24
python list pandas dataframe row
4 answers

You can call apply and pass axis=1 to apply row by row, then convert the dtype to str and join:

    In [153]:
    df['ColumnA'] = df[df.columns[1:]].apply(
        # drop NaN, cast back to int (NaN forces a float row), then join as str
        lambda x: ','.join(x.dropna().astype(int).astype(str)),
        axis=1
    )
    df

    Out[153]:
      Column1  Column2  Column3  Column4  Column5  ColumnA
    0       a        1        2        3        4  1,2,3,4
    1       a        3        4        5      NaN    3,4,5
    2       b        6        7        8      NaN    6,7,8
    3       c        7        7      NaN      NaN      7,7

Here I call dropna to get rid of the NaN values; however, we then need to cast back to int so that we don't end up with floats rendered as str (for example '1.0' instead of '1').
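For anyone who wants to run this end to end, here is a minimal sketch. It assumes the blank cells in the question are NaN and that all remaining values are whole numbers. The sample frame and the helper name join_row are my own, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})


def join_row(row: pd.Series) -> str:
    """Drop NaN, cast the remaining values to int, and join them with commas."""
    return ",".join(row.dropna().astype(int).astype(str))


value_cols = df.columns[1:]  # every column after Column1

# Build the new two-column frame the question asks for.
result = df[["Column1"]].assign(ColumnA=df[value_cols].apply(join_row, axis=1))
print(result)
```

The cast to int only makes sense if the data really is integral; for genuine floats you would drop that step and join the float strings as they are.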

+50

I suggest using .assign

    # Cast each column to str and concatenate them with a separator
    df2 = df.assign(ColumnA=df.Column2.astype(str) + ', ' +
                            df.Column3.astype(str) + ', ' +
                            df.Column4.astype(str) + ', ' +
                            df.Column5.astype(str))

It is perhaps a bit long, but it worked for me.

+4

If you have many columns, say 1000 columns in a data frame, and you want to merge only a few of them, you can select them by a particular column name (for example, Column2 in the question) plus an arbitrary number of columns after it (here, the 3 columns after Column2, together with Column2 itself, as the OP asked).

We can get the position of the column using .get_loc() - as answered here

    source_col_loc = df.columns.get_loc('Column2')  # column positions start from 0
    # Take Column2 plus the 3 columns after it, drop NaN, and join what is left
    df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc + 4].apply(
        lambda x: ",".join(x.dropna().astype(int).astype(str)), axis=1)
    df

      Column1  Column2  Column3  Column4  Column5  ColumnA
    0       a        1        2        3        4  1,2,3,4
    1       a        3        4        5      NaN    3,4,5
    2       b        6        7        8      NaN    6,7,8
    3       c        7        7      NaN      NaN      7,7

To remove NaN, use .dropna() or .fillna().
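As a rough sketch of the position-based idea, the lookup and the join can be wrapped in one small function. The helper name join_from and its parameters are hypothetical, and the sample frame assumes the blank cells in the question are NaN and the values are whole numbers.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})


def join_from(frame: pd.DataFrame, start_col: str, n_cols: int, sep: str = ",") -> pd.Series:
    """Join n_cols columns starting at start_col (inclusive), skipping NaN."""
    start = frame.columns.get_loc(start_col)      # zero-based column position
    block = frame.iloc[:, start:start + n_cols]   # positional slice of the columns
    return block.apply(
        lambda row: sep.join(row.dropna().astype(int).astype(str)), axis=1
    )


# Column2 plus the three columns after it.
joined = join_from(df, "Column2", 4)
print(joined)
```

Swapping dropna() for fillna(...) would keep a placeholder for every missing cell instead of shortening the joined string.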

Hope it helps!

+1

Building on Amin's answer, you can use df.assign with a list of columns of any type, which need not be strings:

    target_cols = ['Column1', 'Column2']
    sep = ' '
    # Cast the target columns to str, then join the values of each row
    df = df.assign(JoinKey=lambda d: d[target_cols].astype(str).apply(sep.join, axis=1))
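To check the row-wise behaviour on the question's first two columns, here is a small sketch. The sample frame and the name out are my own. Each row is joined separately because the columns are cast to str first and the join is applied with axis=1; iterating over a DataFrame directly would yield its column labels instead of the row values.

```python
import pandas as pd

# First two columns of the question's frame: one str column, one int column.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
})

target_cols = ["Column1", "Column2"]
sep = " "

# Cast every target column to str, then join the values of each row.
out = df.assign(
    JoinKey=lambda d: d[target_cols].astype(str).apply(sep.join, axis=1)
)
print(out)
```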
0