Use javascript to get raw html code

I need to get the actual html code of an element in a webpage.

For example, if the actual HTML inside the element is "How to&nbsp;fix",

running document.getElementById('myE').innerHTML in JavaScript gives me "How to fix", which is the decoded form.

How can I get "How to&nbsp;fix" using JavaScript?

javascript html
Oct 11 '10 at 10:08
2 answers

What you have should work:

Test element:

 <div id="myE">How to&nbsp;fix</div>

JavaScript test:

 alert(document.getElementById("myE").innerHTML); // alerts "How to&nbsp;fix"

You can test it here. Make sure that whatever uses the result is not treating &nbsp; as a space, which is likely what is happening. If you want to display the markup somewhere that is interpreted as HTML, you need to escape it.
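To show the markup itself (rather than have the browser render it), the string has to be escaped first. A minimal sketch, assuming a hand-written helper; the name escapeHtml is hypothetical, not a built-in:

```javascript
// Hypothetical helper: escape a string so an HTML context displays it
// literally instead of parsing it as markup.
function escapeHtml(str) {
  return str
    .replace(/&/g, "&amp;")   // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// In a browser, something like:
//   var html = document.getElementById("myE").innerHTML; // "How to&nbsp;fix"
//   output.innerHTML = escapeHtml(html);                 // shows &nbsp; literally
console.log(escapeHtml("How to&nbsp;fix")); // How to&amp;nbsp;fix
```

Alternatively, assigning the unescaped string to an element's textContent displays it literally without any manual escaping.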

Oct 11 '10 at

You cannot get the actual HTML source for parts of your web page.

When you give a web browser an HTML page, it parses the HTML into DOM nodes, which are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML, such as the fact that you used the Unicode character U+00A0 (non-breaking space) before the word fix, but not the irrelevant information that you wrote it as an entity reference rather than typing the raw character.

When you ask the browser for an element node's innerHTML, it does not give you the original HTML source that was parsed to create that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides how to format that HTML serialization; different browsers produce different HTML, and chances are it will not match the way you originally formatted it.

In particular,

  • element names may be upper- or lower-case;

  • attributes may not be in the order you specified them in the HTML;

  • attribute quoting may not be the same as in your source. IE often generates unquoted attributes that are not even valid HTML; all you can be sure of is that the generated innerHTML will be safe to use in the same browser by writing it to another element's innerHTML;

  • it may not use entity references for anything except characters that could not otherwise be included directly in text content: ampersands, less-thans, and quotes inside attribute values. Instead of returning &nbsp; it may simply give you the raw U+00A0 character.
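For comparison, here is a sketch of the text-node escaping step that the current HTML Standard specifies for fragment serialization. It is an illustration written for this answer (serializeTextNode is a hypothetical name), it covers ordinary text only, and 2010-era browsers did not all behave this way. The point it shows is that serialization works purely from the DOM's character data, so every source spelling of the same character comes out identically:

```javascript
// Sketch of the text-node escaping in the HTML Standard's fragment
// serialization (ordinary text only; script/style contents are not escaped).
// Only the DOM's character data is used, so the source spelling is lost.
function serializeTextNode(data) {
  return data
    .replace(/&/g, "&amp;")      // first, so the entities added below survive intact
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Source spellings such as How to&nbsp;fix, How to&#160;fix, How to&#xA0;fix
// and a raw U+00A0 all parse to this same DOM string:
var domText = "How to\u00A0fix";
console.log(serializeTextNode(domText)); // How to&nbsp;fix
```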

You may not be able to see that the raw character is a non-breaking space, but it still is one, and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space being entity-escaped to &nbsp;. If you do need that for some reason, you can get it with:

 var el = document.getElementById('myE');
 var x = el.innerHTML.replace(/\xA0/g, '&nbsp;');

but that only escapes U+00A0 and none of the thousands of other possible Unicode characters, so it is a bit questionable.
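If every non-ASCII character should come out as a character reference, the same replace can be generalized. A sketch, assuming a hypothetical helper named escapeNonAscii; it keeps &nbsp; for U+00A0 and uses numeric references for everything else:

```javascript
// Hypothetical helper: replace every non-ASCII character with a character
// reference. The "u" flag makes the regex match whole code points, so astral
// characters (e.g. emoji) are not split into surrogate halves.
function escapeNonAscii(html) {
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    return ch === "\u00A0" ? "&nbsp;" : "&#" + ch.codePointAt(0) + ";";
  });
}

console.log(escapeNonAscii("How to\u00A0fix"));     // How to&nbsp;fix
console.log(escapeNonAscii("caf\u00E9 \u{1F600}")); // caf&#233; &#128512;
```

Like the one-liner, this does not distinguish text from markup, so non-ASCII characters inside script or style content (where character references are not decoded) would be altered, and it still cannot tell you how the original source spelled each character.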

If you really need your page's original HTML source, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unaltered HTML source in responseText. There is almost never a good reason to do this.
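A browser-only sketch of that approach, using fetch (the modern replacement for XMLHttpRequest) instead of XMLHttpRequest itself; getRawPageSource is a hypothetical name. Note that this is a second request, so a dynamically generated page may not return exactly the same HTML that was originally loaded:

```javascript
// Hypothetical helper: re-request the page's own URL and return the HTML text
// as the server sends it, before any DOM parsing. Browser-only, because the
// default argument relies on the global `location`.
async function getRawPageSource(url = location.href) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error("HTTP " + response.status + " fetching " + url);
  }
  return response.text(); // the raw source text, entities and all
}

// Usage in a page script:
// getRawPageSource().then(function (src) {
//   console.log(src.indexOf("How to&nbsp;fix") !== -1);
// });
```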

Oct 11 '10 at 10:50


