Using Nokogiri and XPath to Get Nodes with Multiple Attributes

Question

I am trying to use Nokogiri to parse an HTML file with some eccentric markup. Specifically, I am trying to capture divs that have an id, several classes and a style defined. The markup looks something like this:

    <div id="foo">
      <div id="bar" class="baz bang" style="display: block;">
        <h2>title</h2>
        <dl>
          List of stuff
        </dl>
      </div>
    </div>

I am trying to capture the <dl> that is inside the div in question. I can get divs with a single id attribute without problems, but I cannot figure out how to get Nokogiri to grab divs with both ids and classes. These work fine:

    content = @doc.xpath("//div[@id='foo']")
    content = @doc.css('div#foo')

But these do not return anything:

    content = @doc.xpath("//div[@id='bar']")
    content = @doc.xpath("div#bar")

Is there something obvious I'm missing here?

ruby xpath nokogiri

Timd Aug 29 '10 at 1:51

4 answers

I think content = @doc.xpath("div#bar") is a typo: "div#bar" is a CSS selector, so it should be content = @doc.css("div#bar"), or better, content = @doc.css("#bar"). The first expression in your second code snippet looks fine.

Daniel O'Hara Aug 29 '10 at 2:56

The following works for me.

    require 'rubygems'
    require 'nokogiri'

    html = %{
    <div id="foo">
      <div id="bar" class="baz bang" style="display: block;">
        <h2>title</h2>
        <dl>
          List of stuff
        </dl>
      </div>
    </div>
    }

    doc = Nokogiri::HTML.parse(html)
    content = doc
      .xpath("//div[@id='foo']/div[@id='bar' and @class='baz bang']/dl")
      .inner_html
    puts content

AboutRuby Aug 29 '10 at 6:59

You wrote:

I am trying to capture divs that have an id, several classes and a style defined

AND

I am trying to capture the <dl> that is inside the div in question

So, this XPath 1.0 expression:

 //div[@id][contains(normalize-space(@class),' ')][@style]/dl

user357812 Aug 30 '10 at 13:42

Dimitre Novatchev · Accepted Answer · 2010-08-29T03:44:29+0000

I can get divs with a single id attribute without problems, but I cannot figure out how to get Nokogiri to grab divs with both ids and classes.

You want:

    //div[@id='bar' and @class='baz bang' and @style='display: block;']
