Pandas: efficiently check whether one column contains the value from another column in the same row

Question

I am trying to get a boolean mask that tells me, for each row, whether the string in one column is contained in the string in another column of the same row:

 a     b
 boop  beep bop
 zorp  zorpfoo
 zip   foo zip fa

Checking whether each value in column b contains the value of column a from the same row, I would like to get:

 [False, True, True]

I'm trying to use this approach now, but it's slow:

 df.apply(lambda row: row['a'] in row['b'], axis=1)

Is there a .str method for this?

python pandas

Luke Oct 20 '15 at 19:29

1 answer

xmduhan · Answer 1 · 2017-04-06T08:15:27+0000

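df.apply(..., axis=1) is very slow, so avoid it when you can. Iterating over the two columns with zip in plain Python does the same check much faster. The timing comparison below uses IPython's %time on 100,000 rows of random strings: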

 from random import sample
 from string import ascii_lowercase

 from pandas import DataFrame

 # Build 100,000 rows of random strings: 2 letters in 'a', 5 letters in 'b'.
 n = 100000
 df = DataFrame({
     'a': [''.join(sample(ascii_lowercase, 2)) for _ in range(n)],
     'b': [''.join(sample(ascii_lowercase, 5)) for _ in range(n)],
 })

 # Plain Python loop over the zipped columns.
 %time [x in y for x, y in zip(df['a'], df['b'])]

 # Row-wise apply, for comparison.
 %time df.apply(lambda row: row['a'] in row['b'], axis=1)
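
Applied to the sample frame from the question, the zip approach might look like the sketch below. The column values are taken from the question; the Series name a_in_b is a hypothetical label chosen only for illustration. Wrapping the list in a Series that reuses df.index is one way to make it usable as a boolean mask on the same frame. It also assumes that both columns hold strings with no missing values.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# Row-wise substring test without df.apply: walk both columns in step.
# 'a_in_b' is a hypothetical name used only for this example.
mask = pd.Series(
    [a in b for a, b in zip(df['a'], df['b'])],
    index=df.index,
    name='a_in_b',
)

print(mask.tolist())  # [False, True, True]

# The mask can be used directly to select the matching rows.
matches = df[mask]
```

Series.str.contains takes a single pattern for the whole column rather than one pattern per row, which is why the comprehension is used here instead of a .str method.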
