Using Nokogiri and XPath to Get Nodes with Multiple Attributes

I am trying to use Nokogiri to parse an HTML file with some eccentric markup. In particular, I am trying to capture divs that have both identifiers and several classes and styles. The markup looks something like this:

 <div id="foo">
   <div id="bar" class="baz bang" style="display: block;">
     <h2>title</h2>
     <dl> List of stuff </dl>
   </div>
 </div>

I am trying to capture the <dl> that is inside the problem div. I can get divs with a single id attribute without problems, but I cannot figure out how to get Nokogiri to grab divs that have both ids and classes. These work fine:

 content = @doc.xpath("//div[id='foo']")
 content = @doc.css('div#foo')

But these do not return anything:

 content = @doc.xpath("//div[id='bar']")
 content = @doc.xpath("div#bar")

Is there something obvious I'm missing here?

ruby xpath nokogiri
4 answers

I can get divs with one id attribute without problems, but I can't figure out a way to get Nokogiri to grab divs with identifiers and classes.

Do you want this? Note that attribute names need the @ prefix in XPath:

 //div[@id='bar' and @class='baz bang' and @style='display: block;']

I think that content = @doc.xpath("div#bar") is a typo and should be content = @doc.css("div#bar") or, better, content = @doc.css("#bar"), since div#bar is a CSS selector rather than an XPath expression. The first expression in your second code snippet looks fine.


The following works for me.

 require 'rubygems'
 require 'nokogiri'

 html = %{
   <div id="foo">
     <div id="bar" class="baz bang" style="display: block;">
       <h2>title</h2>
       <dl> List of stuff </dl>
     </div>
   </div>
 }

 doc = Nokogiri::HTML.parse(html)
 content = doc
   .xpath("//div[@id='foo']/div[@id='bar' and @class='baz bang']/dl")
   .inner_html
 puts content

You wrote:

I am trying to capture divs that have ids, several classes, and styles defined

and

I am trying to capture the <dl> which is inside the problem div

So use this XPath 1.0 expression:

 //div[@id][contains(normalize-space(@class),' ')][@style]/dl 
