First, you can convert everything to lowercase, remove punctuation, trim surrounding whitespace, and then split each string into a set of words.
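    import re
    import string

    import pandas as pd

    df = pd.DataFrame({'col_name': ['This is Donald.',
                                    'His hands are so small',
                                    'Why are his fingers so short?']})

    # Lowercase, strip punctuation (regex=True is required in pandas >= 2.0),
    # trim whitespace, then split into a set of words per row
    df['words'] = [set(words) for words in
                   df['col_name']
                     .str.lower()
                     .str.replace('[{0}]+'.format(re.escape(string.punctuation)), '', regex=True)
                     .str.strip()
                     .str.split()]

    >>> df
                            col_name                                words
    0                This is Donald.                   {this, is, donald}
    1         His hands are so small         {small, his, so, are, hands}
    2  Why are his fingers so short?  {short, fingers, his, so, are, why}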
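Why word sets rather than a plain substring search? A minimal sketch on the same sample data (the names `substring_hits` and `word_hits` are just illustrative) comparing the two for the word "is":

```python
import re
import string

import pandas as pd

df = pd.DataFrame({'col_name': ['This is Donald.',
                                'His hands are so small',
                                'Why are his fingers so short?']})

# Naive substring test: 'is' also occurs inside 'this' and 'his'
substring_hits = df['col_name'].str.lower().str.contains('is', regex=False)

# Word-set test: 'is' must appear as a whole word
pattern = '[{0}]+'.format(re.escape(string.punctuation))
word_sets = [set(words) for words in
             df['col_name'].str.lower().str.replace(pattern, '', regex=True).str.split()]
word_hits = pd.Series(['is' in s for s in word_sets], index=df.index)

print(substring_hits.tolist())  # [True, True, True]
print(word_hits.tolist())       # [True, False, False]
```

The substring test flags every row because "This", "His" and "his" all contain "is"; only the word-set test restricts the match to the first row.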
Now you can build a boolean column that is `True` when all of your target words are in a row's word set:
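    target_words = ['is', 'small']

    # True for rows whose word set contains every target word
    df['match'] = [all(target_word in word_set for target_word in target_words)
                   for word_set in df['words']]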
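The `all(...)` check is equivalent to a subset test on sets. A sketch using a hypothetical helper `all_words_mask` built on `set <= set`. Note that with `['is', 'small']` no row of this sample matches (row 1 contains "his", not "is"), so a second target list is included to show a non-empty result:

```python
import re
import string

import pandas as pd

df = pd.DataFrame({'col_name': ['This is Donald.',
                                'His hands are so small',
                                'Why are his fingers so short?']})
pattern = '[{0}]+'.format(re.escape(string.punctuation))
df['words'] = [set(words) for words in
               df['col_name'].str.lower().str.replace(pattern, '', regex=True).str.split()]


def all_words_mask(word_sets, target_words):
    """Hypothetical helper: True where every target word is in the row's word set."""
    targets = set(target_words)
    # targets <= word_set is the same test as all(t in word_set for t in targets)
    return [targets <= word_set for word_set in word_sets]


mask_a = all_words_mask(df['words'], ['is', 'small'])
mask_b = all_words_mask(df['words'], ['his', 'so'])
print(mask_a)  # [False, False, False]
print(mask_b)  # [False, True, True]
print(df.loc[mask_b, 'col_name'].tolist())
```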
To extract the corresponding rows:
    >>> df.loc[df.match, 'col_name']
To do all of this in a single expression with boolean indexing:
    import re  # for re.escape

    df.loc[[all(target_word in word_set for target_word in target_words)
            for word_set in (set(words) for words in
                             df['col_name']
                               .str.lower()
                               .str.replace('[{0}]+'.format(re.escape(string.punctuation)), '', regex=True)
                               .str.strip()
                               .str.split())],
           :]
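If you need this repeatedly, you could wrap the single-expression version in a function. This is a sketch that assumes a hypothetical helper named `rows_containing_all`. It also lowercases the target words, which is an assumption beyond the original, and it leaves `df` without extra `words` or `match` columns:

```python
import re
import string

import pandas as pd


def rows_containing_all(df, column, target_words):
    """Hypothetical helper: rows of df whose `column` text contains every target word."""
    pattern = '[{0}]+'.format(re.escape(string.punctuation))
    # One list of lowercase, punctuation-free words per row
    tokens = (df[column]
              .str.lower()
              .str.replace(pattern, '', regex=True)
              .str.split())
    # Assumption: targets are compared case-insensitively
    targets = {w.lower() for w in target_words}
    mask = tokens.map(lambda words: targets.issubset(words))
    return df.loc[mask]


df = pd.DataFrame({'col_name': ['This is Donald.',
                                'His hands are so small',
                                'Why are his fingers so short?']})
result = rows_containing_all(df, 'col_name', ['His', 'short'])
print(result.index.tolist())  # [2]
```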