My goal is to find out whether certain combinations of keywords occur in a column of text lines (news article headlines). I then want to plot how often they occur as a histogram.
This is what I did with a pandas DataFrame:
pvv_news = df[df['desc'].str.contains("pvv", case=True)]
pvv_month = pvv_news.groupby(pvv_news.index.month).size()
pvv_month.index = ['January', 'February', 'March', 'April', 'May', 'June']
pvv_month.plot(kind='bar')
This gives a bar chart of the number of matching headlines per month.

Now I can't work out how to make AND and OR combinations to get more specific results. Here is an example of what I mean, although it does not work:
pvv_news = df[df['desc'].str.contains("(pvv)&(nederland|overheid)", case=True)]
I looked at the following functions, but I can't work out how to use them for this:
- pandas.Series.str.extract
- pandas.Series.str.match
- pandas.Series.str.contains
- Regular expressions in combination with the above functions.
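To make the intent concrete: in a regular expression `|` means OR, but `&` has no special meaning and is matched as a literal character, which is probably why the pattern above finds nothing. One way this kind of filter is commonly written is to build one boolean mask per condition with `str.contains` and combine the masks with pandas' `&` and `|` operators. The following is only a minimal sketch: the sample headlines and dates are invented for illustration, and it assumes `df` has a `DatetimeIndex` and a text column named `desc`, as in the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df (assumption:
# a DatetimeIndex and a text column called 'desc').
df = pd.DataFrame(
    {
        "desc": [
            "pvv wil debat over nederland",
            "overheid reageert op pvv",
            "pvv stijgt in peiling",
            "nederland wint wedstrijd",
        ]
    },
    index=pd.to_datetime(["2017-01-05", "2017-02-10", "2017-02-20", "2017-03-01"]),
)

# One boolean mask per condition; '|' inside the regex means OR.
has_pvv = df["desc"].str.contains("pvv", case=True)
has_other = df["desc"].str.contains("nederland|overheid", case=True)

# AND is expressed by combining the masks, not inside the regex.
pvv_news = df[has_pvv & has_other]

# Same grouping as before: number of matching headlines per month.
pvv_month = pvv_news.groupby(pvv_news.index.month).size()

# Alternative: a single regex using lookaheads to require both parts.
# The inner group is non-capturing, so the pattern has no capture groups.
lookahead_mask = df["desc"].str.contains(r"^(?=.*pvv)(?=.*(?:nederland|overheid))")
```

Combining masks is usually easier to read and extend than the lookahead pattern, for example `has_pvv & ~has_other` for "pvv but neither of the other words".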