A bit of background:
I am developing a web application for mobile devices using JavaScript. HTML rendering is based on Safari. The cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
I used to parse the text line by line, find the line I needed, and then extract the contents of the tag as a substring of that line. This is very cumbersome and requires a lot of maintenance every time the target HTML changes.
So now I want to parse the HTML text into a DOM and run CSS or XPath queries on it.
This works well:
$('<div></div>').append(htmlBody).find('#theElementToFind').text()
The only problem is that when the browser loads the HTML text into a DOM element, it tries to load all external resources (images, JS files, etc.). This does not cause serious problems, but I would like to avoid it.
Now the question is:
How can I parse HTML text into a DOM without the browser loading external resources or running scripts?
Some ideas I was thinking about:
- Creating a new document object with document.implementation.createDocument(), but I'm not sure whether it would skip loading external resources.
- Using a third-party DOM parser written in JS. The only one I tried handled errors very badly.
- Using an iframe to create a new document, so that external resources with a relative path do not cause errors in the console.
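To make the first idea concrete, here is a minimal sketch. It swaps in document.implementation.createHTMLDocument() for createDocument(), since that call returns an HTML document with no browsing context. As far as I understand, scripts in such a document do not run and current browsers do not fetch its images, but that is an assumption worth verifying in the target Safari-based view. The function name extractText and the optional impl parameter are hypothetical, added only for illustration. impl lets you pass another DOMImplementation, for example the one from an iframe's document.

```javascript
// Sketch only: parse an HTML string into a detached document and query it.
// `extractText` and `impl` are hypothetical names, not from any library.
function extractText(html, selector, impl) {
  // Default to the page's own DOMImplementation when running in a browser.
  impl = impl || document.implementation;

  // A document created this way has no browsing context (assumption:
  // the target browser therefore neither runs its scripts nor loads its images).
  const doc = impl.createHTMLDocument('');

  // Same effect as appending the markup to a <div>, but inside the detached document.
  doc.body.innerHTML = html;

  const el = doc.querySelector(selector);
  return el ? el.textContent : null;
}
```

With the variables from the question, the call would look like extractText(htmlBody, '#theElementToFind'). It returns null when no element matches.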