Feedparser and Google News

Question

I am trying to download a collection of news articles from Google News (to use for natural language processing) using the generic feedparser library with Python. I don't know anything about XML; I am just following the feedparser example. The problem is that I can't find the news content in the dict I get from the RSS feed, only the title.

I am currently trying to use this code:

    import feedparser

    # just some GNews feed - I'll use a specific search later
    url = 'http://news.google.com.br/news?pz=1&cf=all&ned=us&hl=en&output=rss'

    feed = feedparser.parse(url)
    for post in feed.entries:
        print(post.title)
        print(post.keys())

The keys that this prints are just title, summary, date, etc.; there is no content.

Is this some kind of problem with Google News, or am I doing something wrong? Is there any way to do this?


python rss feedparser google-news

Rafael S. Calsaverini Nov 04 '09 at 2:41


2 answers

First you need to check the RSS specification. And here is the parser. That should get you started.
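To make the structure concrete, here is a minimal sketch using only the standard library. The feed text is hand-written sample data, not real Google News output, and `item_fields` is a hypothetical helper name. It shows that an RSS 2.0 `<item>` typically carries a title, a link, a short description and a date, which feedparser exposes as `title`, `link`, `summary` and so on. A feed shaped like this has no element holding the full article text.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 fragment (made-up data, for illustration only).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>A made-up channel</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
      <description>Short teaser for the first story.</description>
      <pubDate>Wed, 04 Nov 2009 02:41:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/second</link>
      <description>Short teaser for the second story.</description>
    </item>
  </channel>
</rss>
"""


def item_fields(rss_text):
    """Return one dict per <item>, mapping child tag names to their text."""
    root = ET.fromstring(rss_text)
    return [{child.tag: child.text for child in item} for item in root.iter("item")]


for fields in item_fields(SAMPLE_RSS):
    # Print the tag names present in each item, then its title.
    print(sorted(fields), "|", fields["title"])
```

If the feed you are reading looks like this, the article body has to come from somewhere else, for example by following each item's link.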


David basarab Nov 04 '09 at 2:46


Bartek · Accepted Answer · 2009-11-04T02:50:01+0000

Have you looked at what the Google News feed actually returns?

Each feed has a root element that contains a bunch of information plus the actual entries. Here is a quick and dirty way to see what is available:

    import feedparser

    d = feedparser.parse('http://news.google.com/news?pz=1&cf=all&ned=ca&hl=en&topic=w&output=rss')
    print([field for field in d])

From the output we can see there is an entries field, which most likely contains... news entries! If you run:

    import pprint

    # Pass the list itself; a bare generator expression would only print the generator object.
    pprint.pprint(d['entries'])

you will get more information :) This shows all the fields associated with each entry in pretty-printed form (which is what pprint is for).

So, to get all our news headlines from this feed:

    titles = [entry.title for entry in d['entries']]

So play with that. Hope it helps you get started.
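Building on the title list, here is a small sketch that collects several fields per entry at once. `entry_rows` and `fetch_rows` are hypothetical helper names, and the sample entries are made-up stand-ins. It relies only on feedparser entries supporting dict-style `.get()`, so a field that a given feed omits comes back as `None` instead of raising an error.

```python
def entry_rows(entries, fields=("title", "link", "summary")):
    """Pick the given fields from each entry; missing fields become None."""
    return [{name: entry.get(name) for name in fields} for entry in entries]


def fetch_rows(url):
    """Parse a feed URL with feedparser and return one row per entry."""
    import feedparser  # third-party; imported here so entry_rows works without it
    return entry_rows(feedparser.parse(url).entries)


# Stand-in entries (made up) that allow the same dict-style access.
sample_entries = [
    {"title": "First headline", "link": "http://example.com/1", "summary": "<b>Teaser</b>"},
    {"title": "Second headline", "link": "http://example.com/2"},
]

rows = entry_rows(sample_entries)
for row in rows:
    print(row["title"], "->", row["link"])
```

With feedparser installed, `fetch_rows(url)` would do the same for a live feed. Going by the keys reported in the question, the summary is probably the closest thing to content the feed offers, and the link points to the full article.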
