Less painful way to parse RSS feed using lxml?

Question


I need to display RSS feeds with Python, mostly Atom. Coming from PHP, where I could quickly get values using $entry->link, I find lxml more precise and faster, albeit more complicated. After several hours of probing, I got this working with the Ars Technica feed:

    import urllib
    from lxml import etree

    def GetRSSFeed(url):
        out = []
        feed = urllib.urlopen(url)
        feed = etree.parse(feed)
        feed = feed.getroot()
        for element in feed.iterfind(".//item"):
            # title and link are assumed to be the first two children
            meta = element.getchildren()
            title = meta[0].text
            link = meta[1].text
            for subel in element.iterfind(".//description"):
                desc = subel.text
                entry = [title, link, desc]
                out.append(entry)
        return out

Can this be made simpler? How can I access the tags directly? Feedparser gets the job done with a single line of code! Why?


python django atom-feed lxml feedparser

reinhardt Jun 22 '12 at 14:11


2 answers

guyrt · Answer 1 · 2012-06-22T14:37:30+0000

Take a look at the feedparser library. It gives you a well-formatted RSS object.

    >>> import feedparser
    >>> feed = feedparser.parse('http://feeds.marketwatch.com/marketwatch/marketpulse/')
    >>> print feed.keys()
    ['feed', 'status', 'updated', 'updated_parsed', 'encoding', 'bozo', 'headers', 'etag', 'href', 'version', 'entries', 'namespaces']
    >>> len(feed.entries)
    30

rubayeet · Answer 2 · 2013-06-19T06:32:01+0000

You can try speedparser, an implementation of the Universal Feed Parser built on lxml. It is still in beta, though.
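
For readers who would rather stay with plain lxml, as the question asks, the tags can be looked up by name with findtext() instead of by child position. The following is only a minimal sketch under a few assumptions: the sample feed and the name parse_rss_items are made up for illustration, it is written for Python 3, and it falls back to the standard library's ElementTree (which offers the same calls used here) when lxml is not installed.

```python
# Prefer lxml; fall back to the standard library, which provides the same
# fromstring()/iterfind()/findtext() calls used in this sketch.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = b"""<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_bytes):
    """Return one dict per <item>, looking children up by tag name."""
    root = etree.fromstring(xml_bytes)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # findtext() returns None when the tag is absent.
            "description": item.findtext("description"),
        }
        for item in root.iterfind(".//item")
    ]


entries = parse_rss_items(SAMPLE_RSS)
print(entries[0]["link"])
```

Looking tags up by name avoids relying on the order of an item's children, and a missing description no longer drops the entry. Note that this sketch covers RSS only: Atom elements live in the http://www.w3.org/2005/Atom namespace, so the tag names would need that namespace, and an Atom link carries its URL in the href attribute rather than in the element text.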
