Less painful way to parse an RSS feed using lxml?

I need to display RSS feeds with Python, Atom for the most part. Coming from PHP, where I could quickly get values using $entry->link, I find lxml more precise and faster, albeit more complicated. After several hours of experimenting, I got this working with the Ars Technica feed:

    import urllib
    from lxml import etree

    def GetRSSFeed(url):
        out = []
        feed = urllib.urlopen(url)  # Python 2 API
        feed = etree.parse(feed)
        feed = feed.getroot()
        for element in feed.iterfind(".//item"):
            # relies on <title> and <link> being the first two children
            meta = element.getchildren()
            title = meta[0].text
            link = meta[1].text
            for subel in element.iterfind(".//description"):
                desc = subel.text
                entry = [title, link, desc]
                out.append(entry)
        return out

Can this be made easier? How can I access tags directly? feedparser gets the job done with a single line of code! Why?

2 answers

Take a look at the feedparser library. It gives you a well-structured object representing the feed.

    >>> import feedparser
    >>> feed = feedparser.parse('http://feeds.marketwatch.com/marketwatch/marketpulse/')
    >>> print feed.keys()
    ['feed', 'status', 'updated', 'updated_parsed', 'encoding', 'bozo', 'headers', 'etag', 'href', 'version', 'entries', 'namespaces']
    >>> len(feed.entries)
    30
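If you would rather stay with lxml, as in the question, child tags can be looked up by name instead of by position. The following is only a minimal sketch in Python 3, not a replacement for feedparser. The sample feed and the name parse_rss_items are made up for illustration, and the fallback import assumes the standard library's ElementTree is an acceptable stand-in when lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the standard library offers the same calls used below
    import xml.etree.ElementTree as etree

# Hypothetical inline feed so the sketch runs without network access
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <link>http://example.com/2</link>
      <title>Second post</title>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_bytes):
    """Return one dict per <item> with its title, link and description."""
    root = etree.fromstring(xml_bytes)
    entries = []
    for item in root.iterfind(".//item"):
        entries.append({
            # findtext looks a child up by tag name; None if it is absent
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return entries


for entry in parse_rss_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```

Because the lookup is by tag name, the order of the children no longer matters, and a missing description yields None instead of breaking the loop. Atom feeds are namespaced, so there the tag names would need the Atom namespace (for example "{http://www.w3.org/2005/Atom}entry").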

You can try speedparser, an implementation of the Universal Feed Parser built on lxml. It is still in beta, though.
