How to parse a page using html5ever, change the DOM and serialize it?

I would like to parse a web page, insert anchors at specific positions, and serialize the modified DOM again in order to generate docsets for Dash. Is this possible?

From the examples included in html5ever, I can see how to read an HTML file and produce a poor man's HTML output, but I don't understand how to modify the RcDom object I get back.

I would like to see a snippet that inserts an anchor element (<a name="foo"></a>) into an RcDom.

Note: this is a question about Rust and html5ever specifically. I know how to do this in other languages or with simpler HTML parsers.

1 answer

Here is some code that parses a document, appends an anchor to the link's href, and prints the new document:

    // Note: this uses the older html5ever API in which `rcdom` is part of the crate.
    extern crate html5ever;

    use html5ever::{ParseOpts, parse_document};
    use html5ever::tree_builder::TreeBuilderOpts;
    use html5ever::rcdom::RcDom;
    use html5ever::rcdom::NodeEnum::Element;
    use html5ever::serialize::{SerializeOpts, serialize};
    use html5ever::tendril::TendrilSink;

    fn main() {
        let opts = ParseOpts {
            tree_builder: TreeBuilderOpts {
                drop_doctype: true,
                ..Default::default()
            },
            ..Default::default()
        };
        let data = "<!DOCTYPE html><html><body><a href=\"foo\"></a></body></html>".to_string();

        // Parse the input into an RcDom tree.
        let dom = parse_document(RcDom::default(), opts)
            .from_utf8()
            .read_from(&mut data.as_bytes())
            .unwrap();

        let document = dom.document.borrow();
        let html = document.children[0].borrow();
        let body = html.children[1].borrow(); // Implicit head element at children[0].

        // Scope the mutable borrow so it is released before serializing.
        {
            let mut a = body.children[0].borrow_mut();
            if let Element(_, _, ref mut attributes) = a.node {
                attributes[0].value.push_tendril(&From::from("#anchor"));
            }
        }

        // Serialize the modified tree back to HTML.
        let mut bytes = vec![];
        serialize(&mut bytes, &dom.document, SerializeOpts::default()).unwrap();
        let result = String::from_utf8(bytes).unwrap();

        println!("{}", result);
    }

The following is printed:

 <html><head></head><body><a href="foo#anchor"></a></body></html> 

As you can see, we can navigate through the child nodes using the children field.

We can also modify an attribute in place through the attribute vector of the Element variant.
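The code above edits an existing attribute, while the question asks about inserting a new element. The same navigate-then-borrow_mut pattern should extend to that case. Below is a minimal sketch using only the standard library, assuming a tree of Rc<RefCell<...>> handles with a children vector, as the borrow() and borrow_mut() calls above suggest. The names Node, Handle, element, insert_anchor, serialize and demo are hypothetical stand-ins for illustration and are not html5ever's API; with the real crate the new node would have to be built from its own node and attribute types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Toy stand-in for an RcDom-style tree. These are NOT html5ever's real types.
type Handle = Rc<RefCell<Node>>;

struct Node {
    name: String,
    attrs: Vec<(String, String)>,
    children: Vec<Handle>,
}

// Build an element node with the given attributes and children.
fn element(name: &str, attrs: &[(&str, &str)], children: Vec<Handle>) -> Handle {
    Rc::new(RefCell::new(Node {
        name: name.to_string(),
        attrs: attrs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
        children,
    }))
}

// Insert <a name="..."></a> as the first child of `parent`.
fn insert_anchor(parent: &Handle, anchor_name: &str) {
    let anchor = element("a", &[("name", anchor_name)], vec![]);
    parent.borrow_mut().children.insert(0, anchor);
}

// Naive serializer: no escaping and no void elements, enough for the demo.
fn serialize(node: &Handle, out: &mut String) {
    let node = node.borrow();
    out.push('<');
    out.push_str(&node.name);
    for (k, v) in &node.attrs {
        out.push_str(&format!(" {}=\"{}\"", k, v));
    }
    out.push('>');
    for child in &node.children {
        serialize(child, out);
    }
    out.push_str(&format!("</{}>", node.name));
}

// Build <html><head></head><body><p></p></body></html>, insert the anchor
// at the start of <body>, and return the serialized markup.
fn demo() -> String {
    let body = element("body", &[], vec![element("p", &[], vec![])]);
    let html = element("html", &[], vec![element("head", &[], vec![]), body]);

    // Navigate as in the answer: children[1] is <body>, children[0] is <head>.
    // The shared borrow ends with this statement, so borrow_mut below is safe.
    let body = html.borrow().children[1].clone();
    insert_anchor(&body, "foo");

    let mut out = String::new();
    serialize(&html, &mut out);
    out
}
```

Calling demo() returns <html><head></head><body><a name="foo"></a><p></p></body></html>. The point to carry over is cloning the child handle out of the parent, so that the shared borrow is released before taking a mutable borrow to change the children vector.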
