Parsing the DOM in JavaScript

A bit of background:
I am developing a web application for mobile devices using JavaScript. HTML rendering is based on Safari. The cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
I used to parse the text line by line to find the line I needed, then take the contents of the tag as a substring of that line. This is very cumbersome and needs maintenance every time the target HTML changes.
So now I want to parse the HTML text into a DOM and run CSS or XPath queries on it.
This works well:

$('<div></div>').append(htmlBody).find('#theElementToFind').text() 

The only problem is that when I use the browser to load the HTML text into a DOM element, it tries to load all external resources (images, JS files, etc.). Although this does not cause serious problems, I would like to avoid it.

Now the question is:
How can I parse HTML text into a DOM without the browser loading external resources or running scripts?
Some ideas I was thinking about:

  • creating a new document object with document.implementation.createDocument(), though I'm not sure whether it skips loading external resources.
  • using a third-party DOM parser written in JS; the only one I tried was very bad at handling errors.
  • using an iframe to create a new document, so that external resources with relative paths do not cause errors in the console.
2 answers

The following code seems to work just fine:

 var doc = document.implementation.createHTMLDocument("");
 doc.documentElement.innerHTML = htmlBody;
 var text = $(doc).find('#theElementToFind').text();

External resources are not loaded and scripts are not evaluated.
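
Since the question also asks about CSS and XPath queries, here is a minimal sketch of the same approach without jQuery. The helper names textFromHtml and textFromHtmlXPath and the hostDocument parameter are hypothetical, introduced only for illustration; hostDocument defaults to the page's document in a browser.

```javascript
// Hypothetical helpers, not part of the original answer.
// Both parse htmlBody into a detached HTML document and query it.

function textFromHtml(htmlBody, selector, hostDocument) {
  // Assumption: in a browser, the global `document` is used by default.
  var host = hostDocument || document;
  var doc = host.implementation.createHTMLDocument('');
  doc.documentElement.innerHTML = htmlBody;
  // CSS query on the detached document.
  var el = doc.querySelector(selector);
  return el ? el.textContent : null;
}

function textFromHtmlXPath(htmlBody, xpath, hostDocument) {
  var host = hostDocument || document;
  var doc = host.implementation.createHTMLDocument('');
  doc.documentElement.innerHTML = htmlBody;
  // 2 is the numeric value of XPathResult.STRING_TYPE.
  return doc.evaluate(xpath, doc, null, 2, null).stringValue;
}
```

In a browser this could be called as textFromHtml(htmlBody, '#theElementToFind') or textFromHtmlXPath(htmlBody, 'string(//*[@id="theElementToFind"])').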

Found it here: fooobar.com/questions/7323 / ...

Origin: https://developer.mozilla.org/en/DOMParser#DOMParser_HTML_extension_for_other_browsers
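
The MDN page linked above is about DOMParser. As a rough sketch, assuming a browser whose DOMParser accepts the 'text/html' type, the same inert parsing can be done as below, falling back to createHTMLDocument when that is not supported. The function name parseHtmlInert and the env parameter are hypothetical and exist only to make the sketch self-contained.

```javascript
// Hypothetical helper: returns a detached Document parsed from htmlBody.
function parseHtmlInert(htmlBody, env) {
  // Assumption: in a browser, the real globals are used by default.
  var e = env || { DOMParser: window.DOMParser, document: window.document };
  try {
    var parsed = new e.DOMParser().parseFromString(htmlBody, 'text/html');
    // Some older engines return null for 'text/html' instead of throwing.
    if (parsed) {
      return parsed;
    }
  } catch (err) {
    // DOMParser is missing or rejects 'text/html'; use the fallback below.
  }
  var doc = e.document.implementation.createHTMLDocument('');
  doc.documentElement.innerHTML = htmlBody;
  return doc;
}
```

The returned document can then be queried with $(doc).find(...) or doc.querySelector(...), as in the answer above.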


You can build a jQuery object from any HTML string without adding it to the DOM:

 $(htmlBody).find('#theElementToFind').text(); 
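
Two things are worth keeping in mind with this form: .find() searches only descendants, so an element that sits at the top level of the string is missed, and the string is parsed in the context of the current page. A possible variation, assuming jQuery 1.8 or later where jQuery.parseHTML(data, context, keepScripts) is available, passes a detached document as the context. The function name findTextWithJQuery and its parameters are hypothetical.

```javascript
// Hypothetical helper, not part of the original answer.
// $        : the jQuery function
// inertDoc : a detached document, e.g. from
//            document.implementation.createHTMLDocument('')
function findTextWithJQuery($, htmlBody, selector, inertDoc) {
  // keepScripts = false leaves <script> elements out of the result.
  var nodes = $.parseHTML(htmlBody, inertDoc, false);
  var $nodes = $(nodes);
  // Check the top-level nodes as well as their descendants.
  return $nodes.filter(selector).add($nodes.find(selector)).first().text();
}
```

A call might look like findTextWithJQuery($, htmlBody, '#theElementToFind', document.implementation.createHTMLDocument('')).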