Merge two data frames across multiple values

Question

I have two data frames that look like

df1

         name      ID  abb
    0     foo  251803    I
    1     bar  376811    R
    2     baz  174254    Q
    3  foofoo  337144  IRQ
    4  barbar  306521   IQ

df2

      abb comment
    0   I    fine
    1   R  repeat
    2   Q   other

I am trying to use pandas merge to combine the two data frames and simply add the comment column from the second data frame to the first, matching on the abb column, as follows:

 df1.merge(df2, how='inner', on='abb')

which results in:

      name      ID abb comment
    0  foo  251803   I    fine
    1  bar  376811   R  repeat
    2  baz  174254   Q   other

This works well for unique single-letter identifiers in abb. However, it obviously fails for values with more than one character.

I tried using lists in the abb column of the first data frame, but this leads to a KeyError.

I would like to do the following.

1) Split rows containing more than one character in this column into several rows

2) Merge the data frames

3) Optional: combine the rows again
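The three steps above can be sketched as follows. This is only an illustration under some assumptions: it relies on DataFrame.explode (available in pandas 0.25 and later, so newer than this question), the names exploded, merged and combined are made up for the example, and how='left' followed by dropna is used in place of how='inner' so that the row order of df1 is kept.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foofoo", "barbar"],
    "ID": [251803, 376811, 174254, 337144, 306521],
    "abb": ["I", "R", "Q", "IRQ", "IQ"],
})
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

# 1) Split: turn each abb string into a list of characters,
#    then give every character its own row.
exploded = df1.assign(abb=df1["abb"].apply(list)).explode("abb")

# 2) Merge on the now single-character abb column.
#    A left merge keeps df1's row order; dropna removes characters
#    without a match in df2, which mimics an inner merge.
merged = exploded.merge(df2, how="left", on="abb").dropna(subset=["comment"])

# 3) Optional: collapse back to one row per original record,
#    joining the characters and their comments.
combined = (
    merged.groupby(["name", "ID"], sort=False, as_index=False)
          .agg({"abb": "".join, "comment": ", ".join})
)
print(combined)
```

Grouping by name and ID assumes that this pair identifies an original row; if it does not, the original index would have to be carried along as an extra column before merging.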

python pandas

Fourier Jul 21 '16 at 8:05

2 answers

See this answer for different ways of exploding a column.

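    import pandas as pd

    # Build one output row per character of abb.
    rows = []
    for i, row in df1.iterrows():
        for a in row.abb:
            # keep the values in df1's column order: name, ID, abb
            rows.append([row['name'], row['ID'], a])

    df11 = pd.DataFrame(rows, columns=df1.columns)
    df11.merge(df2)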

piRSquared Jul 21 '16 at 8:32

jezrael · Accepted Answer · 2016-07-21T08:11:50+0000

Use join:

    import pandas as pd

    print (df1)
         name      ID  abb
    0     foo  251803    I
    1     bar  376811    R
    2     baz  174254    Q
    3  foofoo  337144  IRQ
    4  barbar  306521   IQ

    # each character becomes a column of a df, which is stacked to a Series;
    # the parentheses allow the method chain to span several lines
    s = (df1.abb.apply(lambda x: pd.Series(list(x)))
                .stack()
                .reset_index(drop=True, level=1)
                .rename('abb'))
    print (s)
    0    I
    1    R
    2    Q
    3    I
    3    R
    3    Q
    4    I
    4    Q
    Name: abb, dtype: object

    # replace the original abb column with the one-character-per-row Series
    df1 = df1.drop('abb', axis=1).join(s)
    print (df1)
         name      ID abb
    0     foo  251803   I
    1     bar  376811   R
    2     baz  174254   Q
    3  foofoo  337144   I
    3  foofoo  337144   R
    3  foofoo  337144   Q
    4  barbar  306521   I
    4  barbar  306521   Q
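The split frame can then be merged with df2 as in the question. The sketch below is a possible continuation, not part of the answer: it rebuilds the split frame from the printed output above, and the names split, row and result are made up for the example. Because merging on a column ignores the index of both frames, the original row label is first copied into a column so it survives the merge.

```python
import pandas as pd

# The split frame as printed above; the index repeats the original row label.
split = pd.DataFrame(
    {
        "name": ["foo", "bar", "baz", "foofoo", "foofoo", "foofoo", "barbar", "barbar"],
        "ID": [251803, 376811, 174254, 337144, 337144, 337144, 306521, 306521],
        "abb": list("IRQIRQIQ"),
    },
    index=[0, 1, 2, 3, 3, 3, 4, 4],
)
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

result = (
    split.reset_index()                       # index -> column named "index"
         .rename(columns={"index": "row"})    # hypothetical name for the original row label
         .merge(df2, how="inner", on="abb")   # same merge as in the question
)
print(result)
```

The row column makes it possible to tell afterwards which merged rows came from the same original record, for example when combining them again with groupby.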
