Problem using substring to select data frame rows

Question

Python 2.7

In [3]: import pandas as pd
df = pd.DataFrame(dict(A=['abc','abc','abc','xyz','xyz'],
                       B=['abcdef','abcdefghi','notthisone','uvwxyz','orthisone']))
In [4]: df
Out[4]:
    A   B
0   abc abcdef
1   abc abcdefghi
2   abc notthisone
3   xyz uvwxyz
4   xyz orthisone

In [12]: df[df.B.str.contains(df.A) == True]
# keep only the rows where B contains the string in A

TypeError: 'Series' objects are mutable, thus they cannot be hashed

This is the result I am trying to get:

    A   B
0   abc abcdef
1   abc abcdefghi
3   xyz uvwxyz

I tried the options of str.contains, but none of them worked. Any help is greatly appreciated.

+4

python pandas

Pat s Jun 05 '15 at 2:39

4 answers

Marius · Answer 1 · 2015-06-05T03:01:26+0000

str.contains doesn't seem to support multiple patterns (one per row), so you need to apply the check row by row:

substr_matches = df.apply(lambda row: row['B'].find(row['A']) > -1, axis=1)

df.loc[substr_matches]
Out[11]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz

Alexander · Answer 2 · 2015-06-05T06:03:39+0000

Apply lambda function to strings and check if A is in B.

>>> df[df.apply(lambda x: x.A in x.B, axis=1)]
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
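
As a possible alternative to apply, the same row-wise check can be sketched with a plain list comprehension over the two columns. This assumes neither column contains missing values; the names mask and result are only illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["abc", "abc", "abc", "xyz", "xyz"],
    "B": ["abcdef", "abcdefghi", "notthisone", "uvwxyz", "orthisone"],
})

# Pair each A with the B from the same row and test plain substring membership.
# Assumption: both columns hold only strings (no NaN values).
mask = [a in b for a, b in zip(df["A"], df["B"])]

# A list of booleans selects rows the same way a boolean Series does.
result = df[mask]
print(result)
```

The boolean list is used exactly like the apply result above; it just avoids building a row object for every row.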

Edchum · Answer 3 · 2015-06-05T07:54:40+0000

You can call unique on column "A" and then join the values with | to create a pattern for matching using contains:

In [15]:
df[df['B'].str.contains('|'.join(df['A'].unique()))]

Out[15]:
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
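
One caveat worth sketching: the joined pattern tests every B against every value of A, not only the A in the same row, and str.contains treats the pattern as a regular expression by default. In the example below the last row is changed from the question's data (an assumption, purely for illustration) so that the difference shows up, and re.escape is added in case the values contain regex metacharacters.

```python
import re
import pandas as pd

# Same frame as in the question, except the last B value is changed
# to 'abc123' (hypothetical) to expose the cross-row match.
df = pd.DataFrame({
    "A": ["abc", "abc", "abc", "xyz", "xyz"],
    "B": ["abcdef", "abcdefghi", "notthisone", "uvwxyz", "abc123"],
})

# Escape each value so characters like '.' or '+' are matched literally.
pattern = "|".join(re.escape(s) for s in df["A"].unique())

# True if B contains ANY value that appears in column A.
joined = df["B"].str.contains(pattern)

# True only if B contains the A from the SAME row.
rowwise = df.apply(lambda row: row["A"] in row["B"], axis=1)

print(pd.DataFrame({"joined": joined, "rowwise": rowwise}))
```

On the question's own data both masks agree, so the joined pattern is fine there; the row-wise check is the safer choice if a B can contain an A from a different row.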

fixxxer · Answer 4 · 2015-06-05T09:46:20+0000

How about this?

In [8]: df[df.apply(lambda v: v['A'] in v['B'], axis=1)]
Out[8]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
