Getting DOM from a page using Chromium / WebKit

I am trying to access a page's DOM after rendering. I do not need to display the page; I plan to use it programmatically, without any graphical interface or interaction.

I am interested in the post-rendering state because I want to know where objects appear. Some location information is encoded in the HTML (e.g. via offsetLeft), but much of it is inaccurate. In addition, JavaScript can change the final positioning. I want positions that are as close as possible to what the user will actually see.
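For context on why a raw offsetLeft is not a page position: in the DOM, offsetLeft and offsetTop are measured relative to the element's offsetParent, so a rough page position comes from summing them up the offsetParent chain. Below is a minimal C++ sketch of that arithmetic only. `Box`, `Point` and `pagePosition` are hypothetical names invented for this illustration, not Chromium or WebKit types, and the sketch deliberately ignores borders, scrolling and CSS transforms, which is part of why the rendered position can differ.

```cpp
// Hypothetical stand-in for a laid-out element; NOT a Chromium/WebKit type.
struct Box {
    int offsetLeft;           // x offset relative to offsetParent
    int offsetTop;            // y offset relative to offsetParent
    const Box* offsetParent;  // nullptr once the root is reached
};

struct Point {
    int x;
    int y;
};

// Sum the offsets along the offsetParent chain to approximate the
// position relative to the page. Borders, scroll offsets and CSS
// transforms are ignored in this sketch.
Point pagePosition(const Box& box) {
    Point p{0, 0};
    for (const Box* b = &box; b != nullptr; b = b->offsetParent) {
        p.x += b->offsetLeft;
        p.y += b->offsetTop;
    }
    return p;
}
```

With a body at (0, 0), a container offset by (20, 100) inside it and an item offset by (15, 8) inside the container, `pagePosition(item)` gives (35, 108).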

I looked over the Chromium code and I think there is a way to do this, but there is not enough documentation to get started.

Put VERY simply, I am after something like this pseudocode:

DOMRoot *r = new Page("http://stackoverflow.com")->getDom(); 

Any pointers on getting started?

+4
1 answer

You should be able to do this with the WebKit API wrapper provided by Chromium; in particular, the WebDocument class has the functionality you need. You can call it like this:

WebFrame* mainFrame = webView->mainFrame();
WebDocument document = mainFrame->document();
// WebDocument is held by value, so its members are accessed with '.'
WebElement docElement = document.documentElement();
// Manipulate the DOM here using docElement ...
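Once you hold the root element, the usual next step is a depth-first walk over its descendants. Below is a minimal sketch of that traversal pattern. It does not use the Chromium API: `Node` and `collectTags` are hypothetical stand-ins invented for this illustration so the snippet compiles on its own, and the real wrapper classes will have their own accessors for child nodes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT Chromium's WebNode/WebElement.
struct Node {
    std::string tagName;
    std::vector<Node> children;
};

// Depth-first walk in document order. Each visited node is recorded as
// its tag name indented by two spaces per nesting level.
void collectTags(const Node& node, int depth, std::vector<std::string>& out) {
    const std::string::size_type indent =
        static_cast<std::string::size_type>(depth) * 2;
    out.push_back(std::string(indent, ' ') + node.tagName);
    for (const Node& child : node.children) {
        collectTags(child, depth + 1, out);
    }
}
```

Calling `collectTags(root, 0, out)` on an html node with head and body children, where body contains one div, fills `out` with "html", "  head", "  body", "    div". The same recursion shape applies if you record a position for each node instead of its tag name.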

You can browse the source code of the Chromium WebKit API wrapper. Although there is not much documentation, the header files are reasonably well commented, and you can read the Chrome source code to see the API in action.

Getting started with Chromium is hard. I recommend looking at the test_shell application. In addition, a framework like the Chromium Embedded Framework (CEF) simplifies embedding Chromium in your application; I use CEF in my current project, and I am very pleased with it.

+5

Source: https://habr.com/ru/post/1315561/
