Searching for a string in all Pandas DataFrame columns and filtering

Question


I thought this would be straightforward, but I am having trouble finding an elegant way to search all the columns of a DataFrame at the same time for a partial string match. Basically, how would I apply df['col1'].str.contains('^') to the whole DataFrame at once and filter down to the rows that contain a match?

+25

python pandas

horatio1701d Oct 29 '14 at 20:33


3 answers

Try:

 df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)
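The expression above returns a boolean Series with one entry per row, so the filtering step is to index the frame with it. A minimal sketch, assuming a small made-up frame (the column names and values are illustrative and not from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "a": ["TEST one", "foo", "bar"],
    "b": [1, 2, 3],
    "c": ["x", "a TEST", "z"],
})

# True for each row where any cell, converted to str, contains "TEST".
row_mask = df.apply(lambda row: row.astype(str).str.contains("TEST").any(), axis=1)

# Keep only the matching rows.
matches = df[row_mask]
print(matches)
```

Converting each row with astype(str) is what lets the numeric column "b" take part in the search without raising an error.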

+11

Puneet sinha Oct 30 '17 at 7:36


Posting my findings in case anyone needs them.

I had a DataFrame (360,000 rows) and needed to search the entire frame to find the few rows that contained the word "TOTAL" (in any variation, for example "TOTAL PRICE", "TOTAL STEMS", etc.) and delete them.

I finally processed the data frame in two steps:

FIND THE COLUMNS THAT CONTAIN THE WORD:

 for col in df.columns:
     if df[col].astype(str).str.startswith('TOTAL').any():
         print(col)  # this column has a value starting with 'TOTAL'

DELETE THE ROWS:

 df = df[~df['LENGTH/ CMS'].str.contains('TOTAL', na=False)]
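If the column is not known in advance, the two steps can probably be merged into one mask over the whole frame. This is a sketch of a different technique from the one in this answer, assuming made-up sample data (only the 'LENGTH/ CMS' column name comes from the answer; the other names and all values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "LENGTH/ CMS": ["40", "TOTAL STEMS", "50"],
    "PRICE": [1.5, 2.0, 3.0],
    "NOTE": ["ok", "ok", "TOTAL PRICE"],
})

# Convert every cell to str, test each column for "TOTAL", then combine
# across columns so a row is flagged if any of its cells matched.
has_total = df.astype(str).apply(lambda col: col.str.contains("TOTAL")).any(axis=1)

# Keep only the rows that were not flagged.
cleaned = df[~has_total]
print(cleaned)
```

Working column by column keeps the string matching vectorized, which tends to matter on a frame with hundreds of thousands of rows.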

0

Ciro Jun 11 '19 at 12:58


unutbu · Accepted Answer · 2014-10-29T21:35:32+0000

The Series.str.contains method expects a regular expression pattern by default, not a literal string. Therefore, str.contains("^") matches the beginning of every string. Since every string has a beginning, everything matches. Instead, use str.contains(r"\^") to match the literal character ^.
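The difference is easy to see on a tiny Series. A minimal sketch, with sample values that are made up for illustration:

```python
import pandas as pd

# Hypothetical sample values for illustration only.
s = pd.Series(["a^b", "plain", "^start"])

# Unescaped: "^" is the start-of-string anchor, so every string matches.
anchor_hits = s.str.contains("^")

# Escaped: matches only strings that contain a literal caret.
literal_hits = s.str.contains(r"\^")

print(anchor_hits.tolist())
print(literal_hits.tolist())
```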

To check each column, you can use for col in df to iterate over the column names, and then call str.contains for each column:

 import numpy as np

 mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
 df.loc[mask.any(axis=1)]
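For a concrete run of the code above, here is a minimal sketch on a made-up frame (the column names and values are assumptions). Every column in it holds strings, because the .str accessor raises an AttributeError on non-string columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; all columns hold strings, one value is missing.
df = pd.DataFrame({
    "col1": ["a^b", "plain", None],
    "col2": ["x", "y^", "z"],
})

# One boolean column per DataFrame column; na=False makes the missing
# value count as a non-match instead of propagating NaN into the mask.
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])

# Keep rows where at least one column matched.
result = df.loc[mask.any(axis=1)]
print(result)
```

If the frame has numeric columns, one option is to convert them first, for example with df[col].astype(str), as the other answers do.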

Alternatively, you can pass regex=False to str.contains so that the test uses the Python in operator; but (in general) using regex is faster.
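With regex=False the pattern is treated as a plain substring, so no escaping is needed. A minimal sketch, with sample values that are made up for illustration:

```python
import pandas as pd

# Hypothetical sample values for illustration only.
s = pd.Series(["a^b", "plain", "1+1"])

# regex=False: "^" and "+" are ordinary characters, not regex syntax.
literal_caret = s.str.contains("^", regex=False)
literal_plus = s.str.contains("+", regex=False)

print(literal_caret.tolist())
print(literal_plus.tolist())
```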
