Python RSS Parser that also handles FeedBurner

I'm in the middle of writing an RSS feed parser in Python. I use feedparser, but I'm stuck on parsing feeds from FeedBurner. Who needs FeedBurner these days? Anyway...

For example, I could not find a way to parse:

http://feeds.wired.com/wired/index

http://feeds2.feedburner.com/ziffdavis/pcmag

When I pass them to the feedparser library, it doesn't seem to work. I tried appending ?fmt=xml or ?format=xml to the end of the URL, but the result is still not in XML format.

Do I need to use an HTML parser such as BeautifulSoup to parse FeedBurner feeds? Better yet, is there a public parser or Python aggregator script that already handles this?

Any advice or help is appreciated.

2 answers

Perhaps you have a version problem or are using the API incorrectly; it would help to see your error message. For example, the following works with Python 2.7 and feedparser 5.0.1:

>>> import feedparser
>>> url = 'http://feeds2.feedburner.com/ziffdavis/pcmag'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'PCMag.com: New Product Reviews'
>>> d.feed.link
u'http://www.pcmag.com'
>>> d.feed.subtitle
u"First Look At New Products From PCMag.com including Lab Tests, Ratings, Editor and User Reviews."
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Canon Color imageClass MF9280cdn'

And with a different URL:

>>> url = 'http://feeds.wired.com/wired/index'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'Wired Top Stories'
>>> d.feed.link
u'http://www.wired.com/rss/index.xml'
>>> d.feed.subtitle
u'Top Stories<img src="http://www.wired.com/rss_views/index.gif" />'
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Heart of Dorkness: LARPing Goes Haywire in <em>Wild Hunt</em>'
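If a feed still does not seem to parse, it may help to look at what feedparser itself reports before switching to another parser. The sketch below is only an illustration: the helper name `describe_result` is made up, and it relies on the `bozo`, `bozo_exception`, `status`, `version`, and `entries` keys that feedparser documents on its result object.

```python
def describe_result(parsed):
    """Summarize a feedparser result so that parsing problems become visible.

    `parsed` is assumed to be the dict-like object returned by
    feedparser.parse(url); `describe_result` is a hypothetical helper name.
    """
    lines = []
    # feedparser sets bozo when the feed was not well-formed XML
    if parsed.get('bozo'):
        lines.append('not well-formed: %r' % (parsed.get('bozo_exception'),))
    # status is only present when the feed was fetched over HTTP
    if 'status' in parsed:
        lines.append('HTTP status: %s' % parsed['status'])
    # version is an empty string when the format could not be detected
    lines.append('detected format: %s' % (parsed.get('version') or 'unknown'))
    lines.append('entries: %d' % len(parsed.get('entries', [])))
    return lines
```

With feedparser installed, usage would look like `describe_result(feedparser.parse(url))`. A redirect status or a reported exception usually says more about the failure than an empty entry list does.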

I know this question is very old, but I think it will be useful for anyone who lands here looking for a way to parse FeedBurner RSS feeds if I post the simple code I use to get the latest entry from the Cracked.com FeedBurner feed. I tested it on several other sites and it works great.

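import urllib
from xml.dom import minidom

def GetRSS(RSSurl):
    # Fetch the feed and parse it as XML (Python 2)
    url_info = urllib.urlopen(RSSurl)
    xmldoc = minidom.parse(url_info)
    # In an RSS 2.0 feed the first <item> is normally the latest entry;
    # the first <title>/<link> in the document belong to the channel itself
    item = xmldoc.getElementsByTagName('item')[0]
    # getElementsByTagName returns a list, so take its first element
    url = item.getElementsByTagName('link')[0].firstChild.data
    title = item.getElementsByTagName('title')[0].firstChild.data
    print url
    print title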

Just pass any FeedBurner feed URL as RSSurl. As you can see, if you want any other elements, you can add another getElementsByTagName line for each one.

Edit: also, as far as I know, it will work with almost any RSS feed.
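For anyone on Python 3, here is a minimal sketch of the same minidom idea, with the list of wanted elements passed in as a parameter. It assumes an RSS 2.0 layout where entries are `<item>` elements. The names `latest_entry` and `SAMPLE` and the example.com URLs are invented for illustration, and the feed is parsed from a string so the snippet runs without network access.

```python
from xml.dom import minidom

def latest_entry(xml_text, fields=('title', 'link')):
    """Return a dict of the requested child elements of the first <item>.

    Returns None when the document contains no <item> at all.
    """
    doc = minidom.parseString(xml_text)
    items = doc.getElementsByTagName('item')
    if not items:
        return None
    result = {}
    for name in fields:
        nodes = items[0].getElementsByTagName(name)
        # Skip fields that are missing or have no text content
        if nodes and nodes[0].firstChild is not None:
            result[name] = nodes[0].firstChild.data.strip()
    return result

# Hypothetical sample feed, used only to make the sketch self-contained
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title><link>http://example.com/</link>
<item><title>Newest post</title><link>http://example.com/newest</link><pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
<item><title>Older post</title><link>http://example.com/older</link></item>
</channel></rss>"""

print(latest_entry(SAMPLE, fields=('title', 'link', 'pubDate')))
```

For a live feed, the text could come from `urllib.request.urlopen(url).read()`. Atom feeds use `<entry>` rather than `<item>`, so this sketch would return None for them.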
