Nokogiri html parsing question

Question

I'm having trouble understanding why I can't extract the meta keywords with Nokogiri for analysis. In the example below, I can pull the href values of links, but I can't figure out how to pull the keywords.

This is the code I have so far:

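    require "nokogiri"
    require "open-uri"   # needed so URI.open can fetch a URL (Kernel#open no longer does this in Ruby 3)

    doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
    doc.xpath('//a/@href').each do |node|
    # doc.xpath("//meta[@name='Keywords']").each do |node|   # keywords attempt: prints nothing
      puts node.text
    end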

This successfully prints every href value on the page, but when I switch to the commented-out keywords query, it prints nothing. I've tried several variations with no luck. I suspect that calling .text on node is the problem, but I'm not sure.

Apologies for how rough this code is; I'm still figuring things out.

ruby nokogiri

paradoxic Aug 9 '10 at 16:47

1 answer

sepp2k · Accepted Answer · 2010-08-09T16:56:34+0000

You're right, the problem is .text. It returns the text between the opening and closing tags, and since meta tags are empty, that is an empty string. What you want instead is the value of the "content" attribute:

    doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
      puts attr.value
    end

Since you know there will be only one meta tag named "Keywords", you don't actually need to loop over the results; you can take the first element directly:

 puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Note, however, that this raises an error if the page has no meta tag named "Keywords" with a "content" attribute: first then returns nil, and calling value on nil raises NoMethodError. So the first version may be preferable.
