Python & Pandas: how to query whether a list-type column contains something?

I have a dataframe that contains information about movies. It has a column called genre that contains the list of genres each movie belongs to. For example:

 df['genre']
 ## returns
 0    ['comedy', 'sci-fi']
 1    ['action', 'romance', 'comedy']
 2    ['documentary']
 3    ['crime', 'horror']
 ...

How can I query df so that it returns only certain movies?

For example, something like df['genre'].contains('comedy') would return rows 0 and 1.

I know that for a list I can do something like

 'comedy' in ['comedy', 'sci-fi'] 

but I did not find anything similar in pandas. The only thing I know is df['genre'].str.contains(), but it does not work for a list type.

3 answers

You can use apply to create a mask and then use boolean indexing:

 mask = df.genre.apply(lambda x: 'comedy' in x)
 df1 = df[mask]
 print(df1)
                        genre
 0           [comedy, sci-fi]
 1  [action, romance, comedy]
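For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the helper name has_genre and its wanted parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample frame rebuilt from the question; each cell holds a Python list.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})


def has_genre(genres, wanted="comedy"):
    """Hypothetical helper: True if `wanted` is in this row's genre list."""
    return wanted in genres


# apply calls has_genre once per row and yields a boolean Series (the mask).
mask = df["genre"].apply(has_genre)

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
```

A named function instead of a lambda makes it easy to reuse the same check for another genre, for example df["genre"].apply(has_genre, wanted="horror").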

Using sets

 df.genre.map(set(['comedy']).issubset)

 0     True
 1     True
 2    False
 3    False
 dtype: bool

 df.genre[df.genre.map(set(['comedy']).issubset)]

 0             [comedy, sci-fi]
 1    [action, romance, comedy]
 dtype: object

Presented in a way that I like better

 comedy = set(['comedy'])
 iscomedy = comedy.issubset
 df[df.genre.map(iscomedy)]

More efficient

 comedy = set(['comedy'])
 iscomedy = comedy.issubset
 df[[iscomedy(l) for l in df.genre.values.tolist()]]
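The set-based check also extends to several genres at once. The sketch below assumes the sample data from the question; the names wanted_all and wanted_any are illustrative only and do not come from the answer.

```python
import pandas as pd

# Sample frame rebuilt from the question; each cell holds a Python list.
df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["action", "romance", "comedy"],
        ["documentary"],
        ["crime", "horror"],
    ]
})

# Rows whose genre list contains EVERY wanted genre.
wanted_all = {"comedy", "romance"}
mask_all = [wanted_all.issubset(genres) for genres in df["genre"]]
has_all = df[mask_all]

# Rows whose genre list shares AT LEAST ONE genre with the wanted set.
wanted_any = {"comedy", "horror"}
mask_any = [not wanted_any.isdisjoint(genres) for genres in df["genre"]]
has_any = df[mask_any]
```

Both issubset and isdisjoint accept any iterable, so the lists in the column do not need to be converted to sets first.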

Using str in two passes
Slow, and not exact! The substring match also hits any genre that merely contains 'comedy'.

 df[df.genre.str.join(' ').str.contains('comedy')] 
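To see why the joined-string approach is not exact, here is a small sketch. The genre 'tragicomedy' is a made-up row added for illustration and is not in the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": [
        ["comedy", "sci-fi"],
        ["tragicomedy"],   # hypothetical genre that contains "comedy" as a substring
        ["documentary"],
    ]
})

# Pass 1: join each list into one string. Pass 2: substring search.
joined = df["genre"].str.join(" ")
substring_mask = joined.str.contains("comedy")

# Exact membership test on the original lists, for comparison.
exact_mask = df["genre"].apply(lambda genres: "comedy" in genres)

# Rows where the two approaches disagree.
false_positives = df[substring_mask & ~exact_mask]
```

Here false_positives holds the 'tragicomedy' row: the substring search matches it, while the membership test does not.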

According to the source code, you can use .str.contains(..., regex=False).
