How to analyze only a certain category of a website using the newspaper library?

Question

I am using Python 3 and the newspaper library. According to its documentation, the library can create a Source object, which is an abstraction of a news site. But what if I only need an abstraction of a single category?

For example, when I use this URL, I want to get all the articles in the 'technology' category. Instead, I get articles from 'politics'.

I think that when creating the Source object, newspaper uses only the domain name, which in my case is www.kyivpost.com.

Is there a way to get it to work with URLs like http://www.kyivpost.com/technology/ ?

python python-3.x parsing web-scraping python-newspaper

Andriy stolyar Jul 6 '16 at 13:42

1 answer

Joe woods · Answer 1 · 2016-08-10T21:24:23+0000

newspaper will use the site's RSS feed if one is available. KyivPost has only one RSS feed, and it publishes mainly politics articles, so your results are mostly politics.

You may have more luck using BeautifulSoup to extract article URLs specifically from the technology page and passing them directly to newspaper.
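A minimal sketch of that approach follows. It assumes that article links on the category page share the category's path prefix (for example /technology/), which may not hold for KyivPost's actual markup. The helper names extract_article_urls and parse_category are hypothetical, and requests is used here only as one convenient way to download the page.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_article_urls(html, page_url, path_prefix):
    """Return same-site links under path_prefix found in html, in page order."""
    site = urlparse(page_url).netloc
    seen = set()
    urls = []
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        # Resolve relative links and drop any #fragment.
        url = urljoin(page_url, a["href"]).split("#")[0]
        parts = urlparse(url)
        # Assumption: articles live below the category path. Skip other
        # hosts, other sections, and the category index page itself.
        if (parts.netloc == site
                and parts.path.startswith(path_prefix)
                and parts.path.rstrip("/") != path_prefix.rstrip("/")
                and url not in seen):
            seen.add(url)
            urls.append(url)
    return urls


def parse_category(page_url, path_prefix):
    """Download a category page and parse each linked article with newspaper."""
    # Imported here so the link filter above works without these packages.
    import requests
    from newspaper import Article

    html = requests.get(page_url, timeout=10).text
    articles = []
    for url in extract_article_urls(html, page_url, path_prefix):
        article = Article(url)
        article.download()
        article.parse()
        articles.append(article)
    return articles
```

Calling parse_category("http://www.kyivpost.com/technology/", "/technology/") would then return a list of parsed Article objects, whose title and text attributes can be read. If the site links to articles in some other way, the filter in extract_article_urls would need to be adapted, for example by selecting on a CSS class instead of the path.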
