Ruby Mechanize: get elements with the given text

I am trying to parse the contents of a website using Mechanize, and I am stuck at a certain point. The content I want to parse is inside li tags and is not always in the same order.

Suppose we have the following, where the order of the li tags is not always the same and some of them may be missing entirely.

 <div class="details">
   <ul>
     <li><span>title 1</span> ": here are the details"</li>
     <li><span>title 2</span> ": here are the details"</li>
     <li><span>title 3</span> ": here are the details"</li>
     <li><span>title 4</span> ": here are the details"</li>
   </ul>
 </div>

I want to get the details only from the li whose span text is, for example, title 3. I tried the following, but it gives me the details from the first li:

 puts page.at('.details').at('span', :text => "title 3").at("+ *").text 

Is there a way to do this with Mechanize, or with other tools?

+7
css ruby mechanize
4 answers
 page.search(".details").at("span:contains('title 3')").parent.text 

Explanation: With Mechanize you can use either CSS or XPath selectors. To keep this close to your approach, the answer uses a CSS selector. The catch is that standard CSS cannot select by text content. Nokogiri, however, supports the jQuery-style :contains pseudo-class, which can.

The selector returns the span element, so to get the enclosing li element you call parent on it, and then text to get its text.

+16

Since you want to do this with Mechanize (and I see that one of the comments recommends Nokogiri), you should know that Mechanize is built on Nokogiri, so you can use any Nokogiri functionality through Mechanize.

From the documentation at http://mechanize.rubyforge.org/Mechanize.html:

 Mechanize.html_parser = Nokogiri::XML

So you can accomplish this with XPath and Mechanize's page.search method.

 page.search("//div[@class='details']/ul/li[span='title 3']").text

This should give you the text for the li element you are looking for. (untested with .text, but XPath works)

Here you can test XPath: http://www.xpathtester.com/saved/51c5142c-dbef-4206-8fbc-1ba567373fb2

+2

A cleaner CSS approach:

 page.at('.details li:has(span[text()="title 3"])') 
+1

Based on this comment, I think you are looking for something like the code below.

"As I said, the problem is that it gives me the first li, while I want the one that has the text title 3"

 require 'nokogiri'

 doc = Nokogiri::HTML.parse <<-eotl
 <div class="details">
   <ul>
     <li><span>title 1</span> ": here are the details"</li>
     <li><span>title 2</span> ": here are the details"</li>
     <li><span>title 3</span> ": here are the details"</li>
     <li><span>title 4</span> ": here are the details"</li>
   </ul>
 </div>
 eotl

 node = doc.at_xpath("//div[@class='details']//span[contains(.,'title 3')]/..")
 node.name # => "li"
 puts node.to_html
 # <li>
 # <span>title 3</span> ": here are the details"</li>
 puts node.children
 #<span>title 3</span>
 # ": here are the details"
0
