Nokogiri HTML parsing question

I'm having trouble understanding why I can't extract the meta keywords from a page with Nokogiri. In the following example, I can pull the href of every link, but I can't figure out how to pull the keywords.

This is the code I have so far:

.....

    doc = Nokogiri::HTML(open("http://www.cnn.com"))
    doc.xpath('//a/@href').each do |node|
    #doc.xpath("//meta[@name='Keywords']").each do |node|
      puts node.text
    end

....

This successfully prints every href on the page, but when I switch to the keywords XPath, it prints nothing. I tried several variations with no luck. I suspect the ".text" call on node is wrong, but I'm not sure.

I apologize for how crude this code is; I'm doing my best to learn here.

1 answer

You are right, the problem is text. Calling text returns the text between an element's opening and closing tags. Since meta tags are empty elements, this gives an empty string. Instead, you want the value of the "content" attribute.

    doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
      puts attr.value
    end

Since you know that there will be only one meta tag named "keywords", you do not actually need to loop over the results; you can take the first element directly:

    puts doc.xpath("//meta[@name='Keywords']/@content").first.value

Please note, however, that this will raise an error if the page has no matching meta tag with a "content" attribute, because first then returns nil. In that case the first option may be preferable.
