Parse an HTML fragment containing custom tags

I am trying to parse an HTML fragment containing a custom HTML tag using Nokogiri.

Example:

string = "<div>hello</div>\n<custom-tag></custom-tag>"

I tried to load it in different ways, but none of them are optimal.

If I use Nokogiri::HTML:

doc = Nokogiri::HTML(string)

When I call to_html on it, it adds a doctype and an html tag that wraps the content. This is undesirable.

If I use Nokogiri::XML:

doc = Nokogiri::XML(string)

I get Error at line 2: Extra content at the end of the document, because XML must have a single root tag that wraps the entire contents of the document. If I serialize this document again, the output is just <div>hello</div> (every tag after the first is dropped).

I also tried Nokogiri::HTML.fragment:

doc = Nokogiri::HTML.fragment(string)

But it complains about custom-tag.

How can I parse this HTML fragment with Nokogiri?

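doc = Nokogiri::HTML.fragment(string) is the right approach. It does parse the fragment; the complaint about custom-tag is only recorded in doc.errors, which you can ignore.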

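The HTML parser is forgiving: even when the input is not strictly valid HTML, such as an unknown tag, it still builds the tree and keeps the tag.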

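Nokogiri::XML.fragment also works here, because XML has no fixed set of tag names and a fragment needs no single root. It is stricter than the HTML parser, though, and HTML-only constructs such as named entities are undefined in XML.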
