Search for a string in all Pandas DataFrame columns and filter

I thought this would be straightforward, but I am having trouble finding an elegant way to search all the columns of a DataFrame at once for a partial string match. Basically, how would I apply df['col1'].str.contains('^') to the entire DataFrame at once and filter down to the rows that contain a match?

+25
python pandas
3 answers

The Series.str.contains method expects a regular expression pattern (by default), not a literal string. Therefore, str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead, use str.contains(r"\^") to match the literal ^ character.
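To make the difference concrete, here is a minimal sketch on a made-up Series (the sample values are assumptions, not data from the question). It compares the unescaped pattern with the escaped one, and also shows re.escape from the standard library, which can build the escaped pattern for you:

```python
import re

import pandas as pd

# Hypothetical sample values, chosen only to illustrate the behavior.
s = pd.Series(["a^b", "abc", "^start"])

# Unescaped "^" is the start-of-string anchor, so every string matches.
anchor_hits = s.str.contains("^")

# Escaped r"\^" matches only strings that contain a literal caret.
literal_hits = s.str.contains(r"\^")

# re.escape produces the same escaped pattern from the raw text.
escaped_hits = s.str.contains(re.escape("^"))

print(anchor_hits.tolist())   # [True, True, True]
print(literal_hits.tolist())  # [True, False, True]
print(escaped_hits.tolist())  # [True, False, True]
```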

To check each column, you can use for col in df to iterate over the column names, and then call str.contains for each column:

 import numpy as np
 mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
 df.loc[mask.any(axis=1)]
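As a rough end-to-end illustration of the mask approach above, the sketch below runs it on a small invented DataFrame (the column names and values are assumptions). Every column here holds strings or missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; every column holds strings (or missing values).
df = pd.DataFrame({
    "col1": ["a^b", "plain", None],
    "col2": ["x", "y^", "z"],
})

# One boolean column per DataFrame column.
# na=False counts missing values as non-matches instead of propagating NaN.
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])

# Keep the rows where at least one column matched.
matches = df.loc[mask.any(axis=1)]
print(matches.index.tolist())  # [0, 1]
```

Note that the .str accessor raises an AttributeError on non-string columns such as integers, so with mixed dtypes you would probably need to cast first, for example with df[col].astype(str).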

Alternatively, you can pass regex=False to str.contains so that the test uses Python's in operator; but (in general) using regex is faster.
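A minimal sketch of the regex=False variant, assuming an invented two-column DataFrame of strings. With regex=False the caret needs no escaping, because the pattern is treated as a plain substring:

```python
import pandas as pd

# Hypothetical sample data with string columns only.
df = pd.DataFrame({"col1": ["a^b", "plain"], "col2": ["x", "y"]})

# Apply the literal substring test to each column, giving a boolean DataFrame.
hits = df.apply(lambda col: col.str.contains("^", regex=False, na=False))

# A row is kept if any of its columns contains the substring.
row_mask = hits.any(axis=1)
print(df[row_mask].index.tolist())  # [0]
```

Which variant is faster may depend on the data and the pandas version, so it is worth timing both on your own DataFrame.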

+28

Try:

 df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1) 
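The expression above returns a boolean Series with one value per row rather than the filtered rows themselves. A small sketch of using it as a filter, on an invented DataFrame with mixed dtypes (the column names and values are assumptions):

```python
import pandas as pd

# Hypothetical sample data; "count" is numeric to show why astype(str) helps.
df = pd.DataFrame({
    "name": ["TEST run", "other", "sample"],
    "count": [1, 2, 3],
    "note": ["ok", "a TEST", "fine"],
})

# For each row, cast all cells to strings and check whether any contains "TEST".
row_mask = df.apply(lambda row: row.astype(str).str.contains("TEST").any(), axis=1)

# Index the DataFrame with the boolean Series to keep only matching rows.
filtered = df[row_mask]
print(filtered.index.tolist())  # [0, 1]
```

Row-wise apply calls the lambda once per row, so on large DataFrames it is likely to be slower than a column-wise approach.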
+11

Posting my findings in case anyone needs them.

I had a DataFrame (360,000 rows) and needed to search all of it to find the rows (just a few) that contained the word "TOTAL" (in any variation, for example "TOTAL PRICE", "TOTAL STEMS", etc.) and delete those rows.

I finally processed the data frame in two steps:

FIND COLUMNS THAT CONTAIN THE WORD:

 for i in df.columns:
     df[i].astype('str').apply(lambda x: print(df[i].name) if x.startswith('TOTAL') else 'pass')

DELETE ROWS:

 df[df['LENGTH/ CMS'].str.contains('TOTAL') != True] 
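The two steps could also be combined into a single pass over all columns. This is only a sketch under assumptions: the sample data is invented (only the 'LENGTH/ CMS' column name comes from the post), and it matches "TOTAL" anywhere in a cell rather than only at the start:

```python
import pandas as pd

# Hypothetical sample data; only the 'LENGTH/ CMS' name is taken from the post.
df = pd.DataFrame({
    "LENGTH/ CMS": ["40", "TOTAL STEMS", "50"],
    "PRICE": [1.5, None, 2.0],
    "NOTE": ["a", "b", "TOTAL PRICE"],
})

# Cast every cell to text so the .str accessor works on numeric columns too.
as_text = df.astype(str)
has_total = as_text.apply(lambda col: col.str.contains("TOTAL", regex=False))

# Step 1: list the columns where the word appears at least once.
cols_with_total = [c for c in has_total.columns if has_total[c].any()]
print(cols_with_total)  # ['LENGTH/ CMS', 'NOTE']

# Step 2: keep only the rows where no column contains the word.
cleaned = df[~has_total.any(axis=1)]
print(cleaned.index.tolist())  # [0]
```

The result is assigned to a new name here; indexing alone does not modify df in place.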
0