
jQuery: Parse / manipulate HTML without executing scripts

I am loading HTML via Ajax in this format:

    <div id="div1">
        ... some content ...
    </div>
    <div id="div2">
        ... some content ...
    </div>
    ... etc.

I need to iterate over each div in the response and process it separately. Having each div's HTML content as a separate string, mapped to its id, is enough for my needs. However, the divs may contain script tags that I need to keep but not execute (they will be executed later, when I attach the HTML to the document, so executing them during parsing would be bad). My first thought was to do something like this:

    // data being the result from $.get
    var clean = data.replace(/<script[\s\S]*?<\/script>/gi, function (tag) {
        // insert some unique token, save the tag, put it back while I'm processing
    });
    $('<div/>').html(clean).children().each(/* ... process here ... */);
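One way the placeholder callback could be filled in is sketched below, in plain JavaScript with no jQuery so it can run outside a browser. The names stashScripts and restoreScripts and the <!--script:N--> token format are hypothetical choices, not part of jQuery. Comment tokens were picked because a comment is not an element, so it will not show up in .children(); a genuine comment with the same text in the content would collide with them. The regex still ends each match at the first </script>, so it has exactly the weakness the question goes on to describe.

```javascript
// Hypothetical helpers: swap each <script>...</script> for a comment token,
// keep the original tags in an array, and put them back afterwards.
function stashScripts(html) {
  const scripts = [];
  // [\s\S]*? matches across newlines and stops at the FIRST </script>
  const clean = html.replace(/<script[\s\S]*?<\/script>/gi, function (tag) {
    scripts.push(tag);
    return '<!--script:' + (scripts.length - 1) + '-->';
  });
  return { clean: clean, scripts: scripts };
}

function restoreScripts(html, scripts) {
  // Replace each token with the tag it stands for
  return html.replace(/<!--script:(\d+)-->/g, function (token, index) {
    return scripts[Number(index)];
  });
}

// Usage
const data = '<div id="div1"><script>init1();</script>one</div>' +
             '<div id="div2">two<script>init2();</script></div>';
const stashed = stashScripts(data);
console.log(stashed.clean);
// <div id="div1"><!--script:0-->one</div><div id="div2">two<!--script:1--></div>
console.log(restoreScripts(stashed.clean, stashed.scripts) === data); // true
```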

But I'm worried that some stupid developer is going to come and put something like this in one of the divs:

    <script>
        var foo = '</script>';
        // ...
    </script>

Then the whole thing would fall apart. Not to mention that the approach is a hack to begin with. Does anyone know a better way?
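To make the worry concrete, the snippet below runs a non-greedy script pattern on that example. The match ends at the first </script>, which is the one inside the string literal, and the rest of the script is left behind as ordinary text. This is only a string-level demonstration in plain JavaScript.

```javascript
// The troublesome script from the question, as a string
const tricky = "<script> var foo = '</script>'; // ... </script>";

// Non-greedy match: stops at the first </script>, inside the string literal
const match = tricky.match(/<script[\s\S]*?<\/script>/i)[0];

console.log(match);                       // <script> var foo = '</script>
console.log(tricky.slice(match.length));  // '; // ... </script>
```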

EDIT: Here is the solution I came up with:

    var divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g,
        idReplacement = preDelimeter + '$1' + postDelimeter;
    var r = data.replace(/<\/div>\s*$/, '')        // drop the final </div>
                .replace(divSplitRegex, idReplacement)
                .split(preDelimeter);
    $.each(r, function (i, chunk) {
        // skip the empty string before the first delimiter
        if (chunk) {
            // chunk is id + postDelimeter + content
            callback.apply(null, chunk.split(postDelimeter));
        }
    });

where preDelimeter and postDelimeter are just unique strings, such as "### I would have to be an idiot to put this string into my content unescaped, because it will break everything ###", and callback is a function that expects the div's id and its content. This only works because I know that the divs will only have an id attribute, and that the id will have a special prefix. I suppose someone could put a div in their content with an id that has the same prefix, and that would mess things up too.
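Because the delimiter trick is plain string work, it can be restated without jQuery and tried in Node. The sketch below does that, using forEach with an explicit parameter rather than $.each with this (inside $.each over strings, this is a String object, which is always truthy even when empty). The delimiter values and the name splitPrefixedDivs are hypothetical. The usage shows that a </div> inside a script is harmless here, as long as it is not immediately followed by another <div id="prefix-...">.

```javascript
// jQuery-free restatement of the split in the EDIT above.
// PRE and POST stand in for preDelimeter / postDelimeter (hypothetical values).
const PRE = '###PRE-DELIMITER###';
const POST = '###POST-DELIMITER###';

function splitPrefixedDivs(data, callback) {
  const divSplitRegex = /(?:^|<\/div>)\s*<div\s+id="prefix-(.+?)">/g;
  const chunks = data
    .replace(/<\/div>\s*$/, '')                 // drop the final </div>
    .replace(divSplitRegex, PRE + '$1' + POST) // mark each div boundary
    .split(PRE);
  chunks.forEach(function (chunk) {
    if (chunk) {                               // skip the leading empty chunk
      callback.apply(null, chunk.split(POST)); // called as (id, content)
    }
  });
}

// Usage: collect id -> content pairs
const found = {};
splitPrefixedDivs(
  '<div id="prefix-a">one<script>var s = "</div>";</script></div>\n' +
  '<div id="prefix-b">two</div>',
  function (id, content) { found[id] = content; }
);
console.log(found);
```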

So, I still don't like this solution. Does anyone have a better one?

3 answers

FYI, an unescaped </script> inside any JavaScript causes this problem in the browser anyway. Developers have to avoid it regardless, so there is no excuse; you can "trust" that such code would break either way.

    <body>
      <div>
        <script>
          alert('<script> tags </script> are not ' +
                'valid in regular old HTML without being escaped.');
        </script>
      </div>
    </body>

See http://jsbin.com/itevu for how it breaks. :)
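If a script genuinely needs the text </script> inside a string, the usual workaround is to escape it so the HTML parser never sees that sequence. A minimal sketch, assuming the value is embedded with JSON.stringify: replacing every < with the JavaScript escape \u003c leaves the value unchanged while hiding the closing tag from the parser. The helper name toInlineScriptLiteral is hypothetical.

```javascript
// Hypothetical helper: turn a value into a JS literal that is safe to place
// inside an inline <script> block. JSON.stringify produces a valid literal;
// escaping "<" as \u003c means "</script>" can never appear in the output.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const literal = toInlineScriptLiteral('</script>');
const inline = '<script> var foo = ' + literal + '; </script>';

console.log(inline); // <script> var foo = "\u003c/script>"; </script>

// The escaped literal still evaluates to the original text
console.log(new Function('return ' + literal)() === '</script>'); // true
```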


In some cases, removing the script tags results in invalid HTML:

    <html>
      <head>
      </head>
      <body>
        <p>This should be
          <script type="text/javascript">
            document.writeln("<b");
          </script>>bolded</b>.
      </body>
    </html>
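To see what this answer means, apply a naive strip to the paragraph. The script itself supplies the "<b" half of the opening tag, so removing it leaves a stray > and an unmatched </b>. This is only a string-level illustration using a simple regex.

```javascript
// The <p> line from the example above
const p = '<p>This should be <script type="text/javascript">' +
          'document.writeln("<b");</script>>bolded</b>.';

// Naive removal of every script element
const stripped = p.replace(/<script[\s\S]*?<\/script>/gi, '');

console.log(stripped); // <p>This should be >bolded</b>.
```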

You might find an alternative approach useful. You can use the following function to prevent the JavaScript from executing:

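    function preventJS(html) {
        // Give every <script> tag type="text/xml" so the browser does not run it;
        // the g flag makes this apply to all tags, not just the first
        return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
    }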

It keeps the script tags inside the DOM, so the scripts can still be used later.
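The answer does not show how to switch the scripts back on. One possible counterpart, assuming it is applied to the HTML string before that string is attached to the document, simply removes the attribute preventJS added. The name allowJS is hypothetical, and the preventJS shown here uses the g flag so every tag is covered. Any script that already had type="text/xml" with exactly that spacing would also be changed, so a more unusual marker type may be safer.

```javascript
// preventJS with the g flag so every <script> is neutralised.
// The lookahead (\s|>) avoids touching tags such as <scripts>.
function preventJS(html) {
  return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

// Hypothetical inverse: remove exactly the attribute preventJS inserted.
function allowJS(html) {
  return html.replace(/<script type="text\/xml" /g, '<script');
}

const original = '<div id="div1"><script>a();</script>' +
                 '<script src="b.js"></script></div>';
const inert = preventJS(original);

console.log(inert);
// <div id="div1"><script type="text/xml" >a();</script><script type="text/xml"  src="b.js"></script></div>
console.log(allowJS(inert) === original); // true
```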

I described this method on my blog here: JavaScript: how to prevent JavaScript from executing inside the html being added to the DOM.

