How does Nokogiri handle unclosed HTML tags such as <br>?

Question

When parsing an HTML document, how does Nokogiri handle <br> tags? Suppose we have a document similar to this:

<div>
   Hi <br>
   How are you? <br>
</div>

Does Nokogiri know that <br> tags are something special, not just regular XML tags, and ignore them when working out a node's content? I think Nokogiri is that smart, but I want to make sure before I accept this project, which involves scraping a site written in HTML4. You know what I mean ("How are you?" is not the content of the first <br>, as it would be in XML).

+5

ruby nokogiri

Kreeki Aug 19 '11 at 14:11

3 answers

Here is how Nokogiri behaves when parsing (broken) XML:

require 'nokogiri'
doc = Nokogiri::XML("<div>Hello<br>World</div>")
puts doc.root
#=> <div>Hello<br>World</br></div>

And here is how Nokogiri behaves when parsing HTML:

require 'nokogiri'
doc = Nokogiri::HTML("<div>Hello<br>World</div>")
puts doc.root
#=> <html><body><div>Hello<br>World</div></body></html>

p doc.at('div').text
#=> "HelloWorld"

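In other words, when you parse the input as HTML, Nokogiri knows that a <br> has no closing tag and no content. The text after a <br> belongs to the enclosing element, not to the <br>, which is why the text of the div comes out as "HelloWorld" with nothing in between.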

If you want each <br> to show up as a newline in the text, replace it first:

doc.css('br').each{ |br| br.replace("\n") }
p doc.at('div').text
#=> "Hello\nWorld"

Or, if you would rather have a space:

doc.css('br').each{ |br| br.replace(" ") }
p doc.at('div').text
#=> "Hello World"

+3

Phrogz Aug 19 '11 at 14:48

As far as I remember from some HTML parsing I did last year, it will treat them as separate elements.

EDIT: My bad, I just got someone to send me the code and tested it; we ended up handling things like <br> separately.

-1

Nicholas smith Aug 19 '11 at 14:18

Sébastien Le Callonnec · Accepted Answer · 2011-08-19T14:31:29+0000

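You have to parse this fragment with the HTML parser, since it is obviously not valid XML. With the HTML parser, Nokogiri behaves as you would expect: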

require 'nokogiri'

doc = Nokogiri::HTML(<<-EOS
<div>
   Hi <br>
   How are you? <br>
</div>
EOS
)

doc.xpath("//br").each{ |e| puts e }

<br>
<br>

So Nokogiri's HTML parser copes with unclosed tags such as <br>.
