Merge two data frames on a column containing multiple values

I have two data frames that look like this:

df1

         name      ID  abb
    0     foo  251803    I
    1     bar  376811    R
    2     baz  174254    Q
    3  foofoo  337144  IRQ
    4  barbar  306521   IQ

df2

      abb comment
    0   I    fine
    1   R  repeat
    2   Q   other

I am trying to use pandas merge to combine the two data frames, simply attaching the comment column of the second data frame to the first based on the abb column:

    df1.merge(df2, how='inner', on='abb')

which results in:

      name      ID abb comment
    0  foo  251803   I    fine
    1  bar  376811   R  repeat
    2  baz  174254   Q   other

This works for the single-letter identifiers in abb. However, it fails when abb has more than one character (IRQ, IQ): those strings match no row of df2, so the inner merge silently drops them.
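The dropped rows can be made visible with merge's indicator=True parameter, which adds a _merge column recording where each row was found. A minimal sketch, rebuilding the frames from the question: a left merge keeps every df1 row and marks IRQ and IQ as left_only, which are exactly the rows the inner merge discards.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "name": ["foo", "bar", "baz", "foofoo", "barbar"],
        "ID": [251803, 376811, 174254, 337144, 306521],
        "abb": ["I", "R", "Q", "IRQ", "IQ"],
    }
)
df2 = pd.DataFrame({"abb": ["I", "R", "Q"], "comment": ["fine", "repeat", "other"]})

inner = df1.merge(df2, how="inner", on="abb")
print(len(inner))  # 3: only the single-letter rows find a match

# indicator=True adds a categorical _merge column ("both" or "left_only" here).
check = df1.merge(df2, how="left", on="abb", indicator=True)
print(check[["name", "abb", "_merge"]])
```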

I tried storing lists in the abb column of the first data frame, but this leads to a KeyError.

I would like to do the following:

1) Split rows whose abb value has more than one character into several rows, one per character

2) Merge the data frames

3) Optional: concatenate the rows again
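The three steps above can be sketched with DataFrame.explode (available from pandas 0.25). This is a minimal sketch, not taken from either answer; it assumes that (name, ID) identifies an original row. It uses a left merge followed by dropna because before pandas 2.2 an inner merge did not always keep the left row order, while dropna still discards characters without a comment, as the inner merge would.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "name": ["foo", "bar", "baz", "foofoo", "barbar"],
        "ID": [251803, 376811, 174254, 337144, 306521],
        "abb": ["I", "R", "Q", "IRQ", "IQ"],
    }
)
df2 = pd.DataFrame({"abb": ["I", "R", "Q"], "comment": ["fine", "repeat", "other"]})

# 1) Turn each abb string into a list of characters, then give each character its own row.
exploded = df1.assign(abb=df1["abb"].apply(list)).explode("abb")

# 2) Merge on the now single-character abb column. The left merge keeps df1's row order;
#    dropna removes characters that have no comment, as an inner merge would.
merged = exploded.merge(df2, how="left", on="abb").dropna(subset=["comment"])

# 3) Optional: collapse back to one row per (name, ID), joining characters and comments.
combined = (
    merged.groupby(["name", "ID"], sort=False)
    .agg({"abb": "".join, "comment": ", ".join})
    .reset_index()
)
print(combined)
```

The separator passed to ", ".join is an arbitrary choice; any string works.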

2 answers

Split each abb string into one row per character, then use join:

    import pandas as pd

    print(df1)
         name      ID  abb
    0     foo  251803    I
    1     bar  376811    R
    2     baz  174254    Q
    3  foofoo  337144  IRQ
    4  barbar  306521   IQ

    # split each string into one column per character, then stack the columns into a Series
    s = (df1.abb.apply(lambda x: pd.Series(list(x)))
                .stack()
                .reset_index(drop=True, level=1)
                .rename('abb'))
    print(s)
    0    I
    1    R
    2    Q
    3    I
    3    R
    3    Q
    4    I
    4    Q
    Name: abb, dtype: object

    df1 = df1.drop('abb', axis=1).join(s)
    print(df1)
         name      ID abb
    0     foo  251803   I
    1     bar  376811   R
    2     baz  174254   Q
    3  foofoo  337144   I
    3  foofoo  337144   R
    3  foofoo  337144   Q
    4  barbar  306521   I
    4  barbar  306521   Q
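The join answer stops after the split, so the merge (step 2) and the optional recombination (step 3) remain. A minimal sketch, assuming the exploded frame printed above (rebuilt here under the hypothetical name exploded so the snippet runs on its own): DataFrame.join with on= looks up each character in df2 indexed by abb and, unlike merge on columns, keeps the repeated original index, so groupby(level=0) can fold the rows back together.

```python
import pandas as pd

# The exploded df1 from the answer above; the index repeats once per character.
exploded = pd.DataFrame(
    {
        "name": ["foo", "bar", "baz", "foofoo", "foofoo", "foofoo", "barbar", "barbar"],
        "ID": [251803, 376811, 174254, 337144, 337144, 337144, 306521, 306521],
        "abb": ["I", "R", "Q", "I", "R", "Q", "I", "Q"],
    },
    index=[0, 1, 2, 3, 3, 3, 4, 4],
)
df2 = pd.DataFrame({"abb": ["I", "R", "Q"], "comment": ["fine", "repeat", "other"]})

# Step 2: look up each character's comment; the left frame's index is kept.
with_comments = exploded.join(df2.set_index("abb"), on="abb")

# Step 3 (optional): one row per original index label again.
combined = with_comments.groupby(level=0).agg(
    {"name": "first", "ID": "first", "abb": "".join, "comment": ", ".join}
)
print(combined)
```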

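See this answer for different ways of exploding a column. A plain loop also works:

    import pandas as pd

    rows = []
    for _, row in df1.iterrows():
        # one output row per character, in the same column order as df1 (name, ID, abb)
        for a in row['abb']:
            rows.append([row['name'], row['ID'], a])
    df11 = pd.DataFrame(rows, columns=df1.columns)
    df11.merge(df2)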
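The same loop can be written as a list comprehension over itertuples, which yields plain named tuples instead of building a Series for every row as iterrows does. A minimal sketch, with df1 and df2 rebuilt from the question; id_ is just a local name chosen to avoid shadowing Python's built-in id.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "name": ["foo", "bar", "baz", "foofoo", "barbar"],
        "ID": [251803, 376811, 174254, 337144, 306521],
        "abb": ["I", "R", "Q", "IRQ", "IQ"],
    }
)
df2 = pd.DataFrame({"abb": ["I", "R", "Q"], "comment": ["fine", "repeat", "other"]})

# One output row per character of abb.
df11 = pd.DataFrame(
    [
        (name, id_, a)
        for name, id_, abb in df1[["name", "ID", "abb"]].itertuples(index=False)
        for a in abb
    ],
    columns=["name", "ID", "abb"],
)
result = df11.merge(df2, on="abb")
print(result)
```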


