How to extract the fragment between the body tags (<body> ... </body>) from an AJAX response in JavaScript

The AJAX response returns a full HTML page. I need to extract the fragment between the body tags (<body> and </body>). This must be done on the client side using JavaScript. Any help would be appreciated.

3 answers

The easiest but worst way is to simply hack at the response text as a string.

 var bodyhtml= html.split('<body>').pop().split('</body>')[0]; 

This is unsatisfactory in the general case, but it can be workable if you know the exact format of the returned HTML (for example, that <body> has no attributes, that the sequences <body> and </body> are not used in a comment in the middle of the page, etc.).
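
If the only complication is attributes on the opening tag or differences in letter case, the same string-hacking idea can be loosened a little with a regular expression. This is just a sketch under that assumption: extractBodyHtml is a hypothetical helper name, and it is still fooled by <body> or </body> appearing inside comments or scripts, or by a ">" inside an attribute value.

```javascript
// Hypothetical helper (not from the answer above): a slightly more tolerant
// version of the split() hack. Still plain string matching, not real parsing.
function extractBodyHtml(html) {
  // <body ...> with optional attributes, then everything (lazily) up to the
  // first </body>, matched case-insensitively.
  var match = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(html);
  // Return null when no body element is found, so the caller can decide what to do.
  return match ? match[1] : null;
}
```

For a response with no body tags at all, this returns null rather than the whole text, which is a different failure mode from the split() version.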

Another very bad way is to write the whole document to the innerHTML of a newly created <div> and fish out the elements you need, without worrying that writing <html> or <body> inside a <div> is broken. You cannot reliably separate the child elements of <head> from those in <body> this way, but this is what jQuery does.
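
A rough sketch of that <div> approach, for illustration only. The function name queryResponseViaDiv and its optional doc parameter are made up here; in a browser you would just let it default to the page's own document.

```javascript
// Hypothetical helper: dump the whole response into a detached <div> and
// query it. The <html>, <head> and <body> wrappers are dropped by the parser,
// so head children such as <title> end up mixed in with the body content.
function queryResponseViaDiv(html, selector, doc) {
  doc = doc || document;            // assumption: running in a browser page
  var div = doc.createElement('div');
  div.innerHTML = html;             // the div is never attached to the page
  return div.querySelectorAll(selector);
}
```

This is only useful when you know which elements you are after (for example a container with a known id), since the boundary between head and body is lost.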

A more reliable but more painful way would be to use a separate HTML document:

 var iframe = document.createElement('iframe');
 iframe.style.display = 'none';
 document.body.insertBefore(iframe, document.body.firstChild);
 var idoc = 'contentDocument' in iframe ? iframe.contentDocument : iframe.contentWindow.document;
 idoc.write(htmlpage);
 idoc.close();
 alert(idoc.body.innerHTML);
 document.body.removeChild(iframe);

although that would also execute all the scripts inside the document, potentially changing it, so it can be inconvenient too.
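
A different technique that is not part of the answer above: in browsers where DOMParser accepts the 'text/html' type, the response can be parsed into a separate document without an iframe, and scripts in a document parsed this way are not executed. A minimal sketch, assuming such a browser; extractBodyWithDOMParser is a hypothetical name.

```javascript
// Hypothetical helper: parse the full page into a detached document and
// read the body's markup. Assumes DOMParser supports 'text/html'
// (modern browsers do; old versions of Internet Explorer do not).
function extractBodyWithDOMParser(htmlpage) {
  var parser = new DOMParser();
  var doc = parser.parseFromString(htmlpage, 'text/html');
  // The HTML parser always creates a body element, even for sloppy input.
  return doc.body.innerHTML;
}
```

Because this uses the browser's real HTML parser, attributes on <body> and stray "<body>" text inside comments are handled the same way they would be for a normal page load.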


If your HTML page is on the Internet, you can use YQL.

For example, if your page URL is http://xyz.com/page.html and you want everything in the body element, use this query:

 select * from html where url="http://xyz.com/page.html" and xpath='//body' 

If you are new to YQL, read this: http://en.wikipedia.org/wiki/YQL_Page_Scraping

There is also an easy way to do this with the Chromyqlip extension: https://chrome.google.com/extensions/detail/bkmllkjbfbeephbldeflbnpclgfbjfmn

Hope this helps!

 // Get the XML object for the "body" tag from the XMLHttpRequest/ActiveXObject
 // object (requestObj).
 // NOTE: This assumes there is only one "body" tag in your HTML document.
 var body = requestObj.responseXML.getElementsByTagName("body")[0];

 // Get the "body" tag as an XML string.
 var bodyXML;
 // for Internet Explorer
 if (body.xml) {
     bodyXML = body.xml;
 }
 // for every other browser
 if (typeof (XMLSerializer) != "undefined") {
     var serializer = new XMLSerializer();
     bodyXML = serializer.serializeToString(body);
 }

This gives you the XML for the body tag as a string. Unfortunately, it still includes "<body>" and "</body>", so if you only want the contents of the tag, you will have to strip them off.
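
One possible way to strip the wrapper from the serialized string is sketched below. stripBodyWrapper is a hypothetical helper name, and the sketch assumes the string starts with the opening body tag and ends with the closing one, as serializer output for a single element normally does.

```javascript
// Hypothetical helper: remove the outer <body ...> and </body> from a
// serialized body element, keeping only its contents.
function stripBodyWrapper(bodyXML) {
  return bodyXML
    // Opening tag, including any attributes such as an xmlns declaration.
    // This also consumes a self-closing <body/>, leaving an empty string.
    // Caveat: breaks if an attribute value contains a literal ">".
    .replace(/^\s*<body\b[^>]*>/i, '')
    // Closing tag at the very end of the string.
    .replace(/<\/body\s*>\s*$/i, '');
}
```

For example, stripBodyWrapper(bodyXML) would be called right after bodyXML has been filled in by the code above.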

You might want to take a look at the second example ("Sample HTML 2 Code") on this page.
