Parsing the DOM in JavaScript

Question

A bit of background:
I am developing a mobile web application using JavaScript. HTML rendering is based on Safari. The cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
I used to parse the text line by line to find the line I needed, then take the contents of the tag as a substring of that line. This is very cumbersome and requires maintenance every time the target HTML changes.
So now I want to parse the HTML text into a DOM and run CSS or XPath queries on it.
This works well:

$('<div></div>').append(htmlBody).find('#theElementToFind').text()

The only problem is that when I insert the HTML text into a DOM element, the browser tries to load all external resources (images, JS files, etc.). Although this does not cause serious problems, I would like to avoid it.

Now the question is:
How can I parse HTML text into a DOM without the browser loading external resources or running JS scripts?
Some ideas I was thinking about:

Creating a new document object with createDocument ( document.implementation.createDocument() ), but I'm not sure whether it will skip loading external resources.
Using a third-party DOM parser written in JS; the only one I tried was very bad at handling errors.
Using an iframe to create a new document, so that external resources with relative paths do not cause errors in the console.

javascript dom xmlhttprequest cross-domain innerhtml

m_vitaly Aug 15 '12 at 9:30

2 answers

You can build a jQuery object from any HTML string without adding it to the DOM:

 $(htmlBody).find('#theElementToFind').text();

Maxim Krizhanovsky Aug 15 '12 at 9:34

m_vitaly · Accepted Answer · 2012-08-15T11:49:54+0000

The following code seems to work just fine:

 var doc = document.implementation.createHTMLDocument("");
 doc.documentElement.innerHTML = htmlBody;
 var text = $(doc).find('#theElementToFind').text();

External resources are not loaded, and scripts are not evaluated.
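For anyone not using jQuery, the same idea can be written with plain DOM calls. This is only a minimal sketch: the function name extractText and its optional third argument are my own (hypothetical) additions, there so the function can be handed a document explicitly, and it assumes the engine supports querySelector on the new document.

```javascript
// Parse an HTML string into a detached document and return the text of the
// first element matching `selector`, or null if nothing matches.
// `doc` is optional and defaults to the page's own document (assumption:
// it is a parameter only so a different document object can be supplied).
function extractText(htmlBody, selector, doc) {
  var owner = doc || document;
  // Same technique as above: a fresh document that is not attached to the page.
  var inert = owner.implementation.createHTMLDocument("");
  inert.documentElement.innerHTML = htmlBody;
  var el = inert.querySelector(selector);
  return el ? el.textContent : null;
}
```

In the page it would be called as extractText(htmlBody, '#theElementToFind').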

Found it here: fooobar.com/questions/7323 / ...

Origin: https://developer.mozilla.org/en/DOMParser#DOMParser_HTML_extension_for_other_browsers
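The MDN page linked above is about DOMParser, so here is a rough sketch of how the two approaches could be combined: try DOMParser with the "text/html" type first and fall back to createHTMLDocument where that type is not supported. The name parseHtml and the env argument are hypothetical, and the assumption that an unsupported type either throws or returns null should be checked against the engines you target.

```javascript
// Return a detached document parsed from an HTML string.
// `env` is optional and defaults to window (assumption: it exists only so
// the global scope can be substituted).
function parseHtml(htmlBody, env) {
  var scope = env || window;
  var doc = null;
  try {
    doc = new scope.DOMParser().parseFromString(htmlBody, "text/html");
  } catch (e) {
    // Assumed: engines without "text/html" support may throw here.
  }
  if (!doc) {
    // Fallback: the createHTMLDocument approach from the answer above.
    doc = scope.document.implementation.createHTMLDocument("");
    doc.documentElement.innerHTML = htmlBody;
  }
  return doc;
}
```

The result can then be queried as before, for example $(parseHtml(htmlBody)).find('#theElementToFind').text().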
