Filtering pandas dataframe rows with str

I have a pandas DataFrame df with many rows. I want to keep only the rows that contain the word "ball" in the column "body". To do this I can write:

df[df['body'].str.contains('ball')]

The problem is that I want the match to be case-insensitive, so that rows containing Ball or bAll are also selected. One way to do this is to convert the strings to lower case and then search. I am wondering how to do this. I tried

df[df['body'].str.lower().contains('ball')]

But that does not work. I'm not sure if I should use a lambda function on this or something like that.

python string pandas
1 answer

You can either use .str again to access string methods, or (better, IMHO) use case=False to guarantee case insensitivity:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})
    >>> df[df["body"].str.contains("ball")]
       body
    0  ball
    >>> df[df["body"].str.lower().str.contains("ball")]
           body
    0      ball
    1  red BALL
    >>> df[df["body"].str.contains("ball", case=False)]
           body
    0      ball
    1  red BALL
    >>> df[df["body"].str.contains("ball", case=True)]
       body
    0  ball
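As for why the attempt in the question fails: as far as I can tell, .str.lower() returns an ordinary Series, and contains is a method of the .str accessor rather than of Series itself, so calling it directly raises an AttributeError. A minimal sketch reusing the example frame above (the names lowered, error_name, mask and matches are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# .str.lower() gives back a plain Series of lower-cased strings
lowered = df["body"].str.lower()

try:
    lowered.contains("ball")      # Series itself has no .contains method
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Going back through the .str accessor works
mask = lowered.str.contains("ball")
matches = df[mask]
```

Here matches holds the rows "ball" and "red BALL", the same selection that case=False produces.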

(Note that if you are going to do assignments, it is better to use df.loc to avoid the dreaded SettingWithCopyWarning, but since we are only selecting here, it does not matter.)
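To illustrate the df.loc remark, here is a rough sketch of assigning to the matching rows. The column name has_ball is hypothetical and not part of the question; the point is only that the mask and the column label go into a single df.loc call instead of a chained df[mask][...] = ... assignment.

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# Case-insensitive boolean mask, as in the answer above
mask = df["body"].str.contains("ball", case=False)

# "has_ball" is a made-up column used only for this illustration
df["has_ball"] = False

# One df.loc call with row mask and column label writes to df itself
df.loc[mask, "has_ball"] = True
```

After this, has_ball is True for the first two rows and False for "round sphere".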

(Note #2: I guess I didn't really need to include 'round' there..)
