What is the best practice for parsing remote content using jQuery?

After calling jQuery ajax to retrieve the entire XHTML document, what is the best way to select specific elements from the resulting string? Perhaps there is a library or plugin that solves this problem?

jQuery can only select XHTML elements from a string if they are normally allowed inside a div by the W3C specification, so I'm interested in how to select things like <title>, <script>, and <style>.

According to jQuery documentation:

http://docs.jquery.com/Core/jQuery#htmlownerDocument

An HTML string cannot contain elements that are not valid in a div, such as html, head, body, or header elements.

Therefore, since we have established that jQuery does not provide a way to do this, how would I select these elements? As an example, if you can show me how to select a remote page's title, that would be perfect!

Thanks, Pete

+34
jquery html-parsing
Jun 23 '09 at 20:10
10 answers

Instead of hacking jQuery to do this, I suggest you step away from jQuery for a minute and use raw XML DOM methods. Using the XML DOM methods, you can do this:

    window.onload = function () {
      $.ajax({
        type: 'GET',
        url: 'text.html',
        dataType: 'html',
        success: function (data) {
          // cross-platform XML object creation, from w3schools
          var xmlDoc;
          try { // Internet Explorer
            xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
            xmlDoc.async = false;
            xmlDoc.loadXML(data);
          } catch (e) {
            try { // Firefox, Mozilla, Opera, etc.
              var parser = new DOMParser();
              xmlDoc = parser.parseFromString(data, "text/xml");
            } catch (e2) {
              alert(e2.message);
              return;
            }
          }
          alert(xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue);
        }
      });
    };

No messing about with iframes, etc.
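
For comparison, browsers newer than the ones this answer targets let DOMParser parse "text/html" directly, which tolerates markup that is not well-formed XML. A minimal sketch under that assumption: extractTitle is a hypothetical helper name (it does not come from the answer), and the parser is passed in as an argument so the function does not depend on a global.

```javascript
// Hypothetical helper: returns the text of the first <title> in `markup`,
// or null if there is none. `parser` is any object with a DOMParser-style
// parseFromString(string, mimeType) method.
function extractTitle(markup, parser) {
  var doc = parser.parseFromString(markup, 'text/html');
  var titles = doc.getElementsByTagName('title');
  return titles.length > 0 ? titles[0].textContent : null;
}

// Browser usage (assumes a same-origin URL, as in the answer above):
// $.get('text.html', function (data) {
//   alert(extractTitle(data, new DOMParser()));
// }, 'html');
```

Older browsers that lack "text/html" support in DOMParser would still need the "text/xml" route shown above.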

+28
Jul 01 '09 at 15:52

Just an idea, tested in FF and Safari: it seems to work if you create an iframe to hold the document temporarily. Of course, if you are doing this, it might be smarter to use the iframe's src property to load the document and do whatever you want in its "onload" handler.

    $(function () {
      $.ajax({
        type: 'GET',
        url: 'result.html',
        dataType: 'html',
        success: function (data) {
          var $frame = $("<iframe src='about:blank'/>").hide();
          $frame.appendTo('body');
          var doc = $frame.get(0).contentWindow.document;
          doc.write(data);
          var $title = $("title", doc);
          alert('Title: ' + $title.text());
          $frame.remove();
        }
      });
    });

I had to append the iframe to the body for it to have a .contentWindow.

+5
Jun 26 '09 at 23:38

Inspired by this answer, but using a Deferred:

    function fetchDoc(url) {
      var dfd = $.Deferred();
      $.get(url).done(function (data, textStatus, jqXHR) {
        var $iframe = $('<iframe style="display:none;"/>').appendTo('body');
        var $doc = $iframe.contents();
        var doc = $doc[0];
        $iframe.load(function () {
          dfd.resolveWith(doc, [data, textStatus, jqXHR]);
          return $iframe.remove();
        });
        doc.open();
        doc.write(data);
        return doc.close();
      }).fail(dfd.reject);
      return dfd.promise();
    }

And use it with:

    fetchDoc('/foo.html').done(function (data, textStatus, jqXHR) {
      alert($('title', this).text());
    });

+3
Jul 08 2018-12-12T00:

How about some quick tag renaming?

    $.ajax({
      type: "GET",
      url: 'results.html',
      dataType: "html",
      success: function (data) {
        data = data.replace(/html/g, "xhtmlx");
        data = data.replace(/head/g, "xheadx");
        data = data.replace(/title/g, "xtitlex");
        data = data.replace(/body/g, "xbodyx");
        alert($(data).find("xtitlex").text());
      }
    });

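
One caveat with plain string replacement is that it also rewrites the words "html", "head", "title", and "body" wherever they occur in the page text. A possible refinement, sketched here with a hypothetical renameTags helper that is not part of the answer, is to rename only where the word directly follows "<" or "</". It still treats the page as a string, so matches inside comments, scripts, or attribute values are not ruled out.

```javascript
// Hypothetical helper: rename tags only where they appear as tag names
// (right after "<" or "</"), leaving ordinary text that happens to contain
// the same words alone. The x...x naming scheme follows the answer above.
function renameTags(markup, tagNames) {
  var pattern = new RegExp(
    '(<\\/?)(' + tagNames.join('|') + ')(?=[\\s>\\/])', 'gi');
  return markup.replace(pattern, function (match, open, name) {
    return open + 'x' + name.toLowerCase() + 'x';
  });
}

// Possible use inside the success callback above:
// var renamed = renameTags(data, ['html', 'head', 'title', 'body']);
// alert($(renamed).find('xtitlex').text());
```

The lookahead requires whitespace, ">" or "/" after the name, so a tag such as <header> is not mistaken for <head>.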
+2
Jun 26 '09 at 14:24

This works. I just split it into building blocks for better readability.

Check the explanations and inline comments to understand how this works and why it has to be done this way.

Of course, this cannot be used to retrieve cross-domain content. For that, you either have to proxy the calls through a script or consider integrating something like flXHR (cross-domain Ajax with Flash).

call.html

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <title>asd</title>
      <script src="jquery.js" type="text/javascript"></script>
      <script src="xmlDoc.js" type="text/javascript"></script>
      <script src="output.js" type="text/javascript"></script>
      <script src="ready.js" type="text/javascript"></script>
    </head>
    <body>
      <div>
        <input type="button" id="getit" value="GetIt" />
      </div>
    </body>
    </html>

jquery.js (jQuery 1.3.2 uncompressed)

test.html is a valid XHTML document

xmlDoc.js

    // helper function to create an XMLDocument out of a string
    jQuery.createXMLDocument = function (s) {
      var xmlDoc;
      // is it IE?
      if (window.ActiveXObject) {
        xmlDoc = new ActiveXObject('Microsoft.XMLDOM');
        xmlDoc.async = false;
        // prevent errors as IE tries to resolve the URL in the DOCTYPE
        xmlDoc.resolveExternals = false;
        xmlDoc.validateOnParse = false;
        xmlDoc.loadXML(s);
      } else {
        // non-IE: use DOMParser
        // theoretically this else branch should never be called,
        // but just in case
        xmlDoc = (new DOMParser()).parseFromString(s, "text/xml");
      }
      return xmlDoc;
    };

output.js

    // Output the title of the loaded page,
    // then get the script tags and output either
    // the src attribute or the code
    function headerData(data) {
      // give me the head element
      var x = jQuery("head", data).eq(0);
      // output the title
      alert(jQuery("title", x).eq(0).text());
      // for all script tags that include a file, output the src
      jQuery("script[src]", x).each(function (index) {
        alert((index + 1) + " " + jQuery.attr(this, 'src'));
      });
      // for all script tags that are inline JavaScript, output the code
      jQuery("script:not([src])", x).each(function (index) {
        alert(this.text);
      });
    }

ready.js

    $(document).ready(function () {
      $('#getit').click(function () {
        $.ajax({
          type: "GET",
          url: 'test.html',
          dataType: "xml",
          // override the content-type returned by the server to ensure
          // the response gets treated as xml
          beforeSend: function (xhr) {
            // IE doesn't support this, so check before using it
            if (xhr.overrideMimeType) {
              xhr.overrideMimeType('text/xml');
            }
          },
          success: function (data) {
            headerData(data);
          },
          error: function (xhr, textStatus, errorThrown) {
            // if loading the response as xml failed, try it manually
            // in theory this should only happen for IE
            // maybe some
            if (textStatus == 'parsererror') {
              var xmlDoc = jQuery.createXMLDocument(xhr.responseText);
              headerData(xmlDoc);
            } else {
              alert("Failed: " + textStatus + " " + errorThrown);
            }
          }
        });
      });
    });

In Opera, everything works without the createXMLDocument and beforeSend functions.

The extra complexity is needed for Firefox (3.0.11) and IE6 (I cannot test IE7, IE8, or other browsers), because they have a problem when the Content-Type returned by the server does not indicate that the response is XML. My web server returned Content-Type: text/html; charset=UTF-8 for test.html. In these two browsers, jQuery called the error callback with textStatus saying parsererror. This happens because, on line 3706 of jquery.js,

 data = xml ? xhr.responseXML : xhr.responseText; 

data is set to null, because in FF and IE xhr.responseXML is null. This is because they do not recognize that the returned data is XML (as Opera does), so only xhr.responseText is set, containing the whole XHTML code. Because data is null, line 3708

 if ( xml && data.documentElement.tagName == "parsererror" ) 

throws an exception, which is caught on line 3584, and the status is set to parsererror.

In FF, I can solve the problem with the overrideMimeType() function before sending the request.

But IE does not support this function on its XMLHttpRequest object, so I have to generate the XMLDocument myself if the error callback runs and the error is parsererror.
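
The failure path described above can be reproduced outside a browser. What follows is a simplified paraphrase of the two quoted jQuery 1.3.2 lines, not the library's actual function: readResponse and statusFor are hypothetical names, and the xhr value is a plain mock object.

```javascript
// Simplified paraphrase of the two jQuery 1.3.2 lines quoted above.
// `xhr` is any object with responseXML and responseText properties.
function readResponse(xhr, dataType) {
  var xml = dataType === 'xml';
  var data = xml ? xhr.responseXML : xhr.responseText;
  // When responseXML is null, this property access throws a TypeError.
  if (xml && data.documentElement.tagName === 'parsererror') {
    throw new Error('parsererror');
  }
  return data;
}

// Hypothetical wrapper showing how any exception becomes a status string.
function statusFor(xhr, dataType) {
  try {
    readResponse(xhr, dataType);
    return 'success';
  } catch (e) {
    return 'parsererror';
  }
}

// Mock of a response served as text/html: responseXML stays null.
var htmlResponse = { responseXML: null, responseText: '<html>...</html>' };
console.log(statusFor(htmlResponse, 'xml'));  // parsererror
console.log(statusFor(htmlResponse, 'html')); // success
```

The same mock with dataType "html" succeeds because responseXML is never touched, which is why the problem only appears when XML is requested.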

Example for test.html

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
      <meta http-equiv="content-type" content="text/html; charset=utf-8" />
      <title>Plugins | jQuery Plugins</title>
      <script type="text/javascript" src="jquery.js"></script>
      <script type="text/javascript">var imagePath = '/content/img/so/';</script>
    </head>
    <body>
    </body>
    </html>

+2
Jun 28 '09 at 10:57

Shamelessly copied and adapted from my other answer ( A simple jQuery ajax example that does not find elements in the returned HTML ): this fetches the HTML of the remote page, then the parseHTML function creates a temporary div element, puts the HTML inside it, iterates over its child nodes, and returns the requested element. jQuery then alerts the text() inside it.

    $(document).ready(function () {
      $('input').click(function () {
        $.ajax({
          type: "POST",
          url: 'ajaxtestload.html',
          dataType: "html",
          success: function (data) {
            alert(data); // shows whole dom
            var gotcha = parseHTML(data, 'TITLE'); // nodeName property returns uppercase
            if (gotcha) {
              alert($(gotcha).html()); // returns null
            } else {
              alert('Tag not found.');
            }
          },
          error: function () {
            alert("Sorry, The requested property could not be found.");
          }
        });
      });
    });

    function parseHTML(html, tagName) {
      var root = document.createElement("div");
      root.innerHTML = html;
      // Get all child nodes of root div
      var allChilds = root.childNodes;
      for (var i = 0; i < allChilds.length; i++) {
        if (allChilds[i].nodeName == tagName) {
          return allChilds[i];
        }
      }
      return false;
    }

To retrieve several elements, or a list of script tags, say, I think you would have to improve the parseHTML function, but hey, it's a proof of concept :-)
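
One way such an improvement might look is sketched below. collectByTagName is a hypothetical name, not something from the answer; it assumes only that each node exposes nodeName and childNodes, so it can be tried on plain objects as well as on the temporary div that parseHTML builds.

```javascript
// Hypothetical extension of parseHTML: walks the whole subtree under `root`
// and returns every node whose nodeName matches, instead of only the first
// matching top-level child. Matching is case-insensitive.
function collectByTagName(root, tagName) {
  var wanted = tagName.toUpperCase();
  var found = [];
  (function walk(node) {
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      if (String(children[i].nodeName).toUpperCase() === wanted) {
        found.push(children[i]);
      }
      walk(children[i]);
    }
  })(root);
  return found;
}

// Possible browser usage, mirroring parseHTML above:
// var root = document.createElement('div');
// root.innerHTML = html;
// var scripts = collectByTagName(root, 'script');
```

Which elements survive the innerHTML assignment still depends on the browser, so this only changes how the resulting tree is searched.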

+1
Jun 24 '09 at 2:10

If you want to find the values of specifically named fields (i.e. the inputs in a form), then something like this will find them for you:

    var fields = ["firstname", "surname", ...."foo"];
    function findFields(form, fields) {
      var form = $(form);
      fields.forEach(function (field) {
        var val = form.find("[name=" + field + "]").val();
        ....

0
Jun 23 '09 at 20:58

How about this: Load XML from a string

0
Jun 24 '09 at 1:40

After parsing the XML string into an XML DOM, I would either use jQuery on it directly (you can do this by providing a context to the jQuery selector, for example $(':title', xdoc.rootElement)) or use XPath (it works in Firefox; there are supposedly libraries for IE, but I have not had much success with them).

0
Jul 01 '09 at 19:35
    $.get('yourpage.html', function (data) {
      var content = $('<div/>').append(data).find('#yourelement').html();
    });

You can also just temporarily wrap it inside a div. You do not even need to add it to the DOM.

0
Sep 17 '12 at 14:19