Parsing an RSS item that has a colon in the tag with Ruby?

I am trying to parse information from an RSS feed having this tag structure:

<dc:subject>foo bar</dc:subject> 

using the built-in Ruby RSS library. Obviously, calling item.dc:subject raises errors, but I cannot find a way to get at this information. Is there any way to make this work? Or is it possible with a different RSS library?

+8
ruby parsing rss
3 answers

Tags with ':' in them are really XML tags with a namespace. I have never had good results using the RSS module, because feeds often do not conform to the specifications, and the module then gives up. I highly recommend using Nokogiri to parse the feed, whether it is RDF, RSS or Atom.

Nokogiri can use XPath accessors or CSS accessors, and both support namespaces. The last two lines are equivalent:

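    require 'nokogiri'
    require 'open-uri'

    # URI.open replaces the bare open(url) call, which stopped accepting URLs in Ruby 3.0
    doc = Nokogiri::XML(URI.open('http://somehost.com/rss_feed'))
    doc.at('//dc:subject').text  # XPath accessor
    doc.at('dc|subject').text    # CSS accessor with a namespace prefix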

When dealing with namespaces, you may need to add the namespace declaration to the XPath accessor:

 doc.at('//dc:subject', 'dc' => 'link to dc declaration') 

See the "Namespaces" section of the Nokogiri documentation for more details.

Without a URL or a better sample I cannot do more, but that should point you in the right direction.

A couple of years ago, I wrote a big RSS aggregator for my job using Nokogiri, and it handled RDF, RSS and Atom. The Ruby RSS library wasn't up to the task, but Nokogiri was awesome.

If you don't want to roll your own, Paul Dix's Feedzirra is a good gem for processing feeds.

+6

The RSS module does seem to be able to handle these XML namespace elements, e.g. <dc:date>, like this:

    feed.items.each do |item|
      puts "Date: #{item.dc_date}"
    end
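By analogy with dc_date, the same approach may work for the tag in the question. The sketch below assumes the rss library's Dublin Core support also exposes a dc_subject reader on items, and that the feed binds the dc prefix to the standard Dublin Core namespace URI. The sample feed is invented for illustration. On recent Ruby versions rss ships as a bundled gem rather than as part of the standard library, so it may need to be installed.

```ruby
require 'rss'

# Hypothetical sample feed with the Dublin Core namespace declared on the root.
xml = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
      <title>Sample</title>
      <link>http://example.com/</link>
      <description>Sample feed</description>
      <item>
        <title>First post</title>
        <dc:subject>foo bar</dc:subject>
      </item>
    </channel>
  </rss>
XML

# The second argument turns off strict validation, which helps with
# real-world feeds that do not fully follow the specification.
feed = RSS::Parser.parse(xml, false)

# The colon in the tag name becomes an underscore in the accessor name.
subjects = feed.items.map(&:dc_subject)

subjects.each { |s| puts "Subject: #{s}" }
```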

+1

I think item['dc:subject'] might work.

-1
