Displaying user-uploaded HTML content - security issue

I have a feature where the user uploads an HTML file. I read its contents with PHP and send them to a third-party API as a string. Before submitting to the API, I want to show the user a preview of the uploaded HTML so they can click a confirmation button to submit it.

The HTML files are supposed to be mostly plain-text templates, but users can edit the HTML and add script tags or embed other malicious code that could harm my website when it is displayed in the preview. Is there any way to prevent this?

I was thinking about stripping tags, but what if they put onclick events on HTML elements?

4 answers

I'd start with something like this to strip out scripts and comments:

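    // $value holds the uploaded HTML as a string
    $htmlblacklist = [];
    $htmlblacklist[] = '@<script[^>]*?>.*?</script>@si'; // bye bye javascript
    $htmlblacklist[] = '@<!--[\s\S]*?-->@';              // goodbye comments: matches only <!-- ... -->, so <!DOCTYPE> is left alone
    // now apply the blacklist
    $value = preg_replace($htmlblacklist, '', $value);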

For inline event handlers such as onclick, use DOMDocument instead: it actually understands HTML, whereas a regex is just a shot in the dark.
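To illustrate why the regex pass is not enough, here is the `<script>` pattern from the PHP snippet ported to JavaScript. The port, the `applyScriptBlacklist` name, and the sample inputs are illustrative only. A plain script block is removed, but inline handlers and `javascript:` URLs pass straight through, and a nested tag reassembles itself after a single pass:

```javascript
// The PHP pattern '@<script[^>]*?>.*?</script>@si' translated to a JS regex:
// "s" lets "." match newlines, "i" ignores case, "g" replaces every match like preg_replace.
const SCRIPT_BLOCK = /<script[^>]*?>.*?<\/script>/gis;

// Removes <script>...</script> blocks in one pass, as preg_replace would.
function applyScriptBlacklist(html) {
  return html.replace(SCRIPT_BLOCK, "");
}

const samples = [
  '<p>Hi</p><script>alert(1)</script>',         // script block: removed as intended
  '<img src="x" onerror="alert(1)">',            // inline handler: untouched
  '<a href="javascript:alert(1)">click</a>',     // javascript: URL: untouched
  '<scr<script></script>ipt>alert(1)</script>',  // leaves a brand-new <script> behind
];

for (const input of samples) {
  console.log(JSON.stringify(applyScriptBlacklist(input)));
}
```

Running the pattern repeatedly until nothing changes would close the last hole, but not the first two; that requires actually parsing the markup.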

In fact, you can use DOMDocument for all of this and skip regex entirely: load the HTML into a DOMDocument object, walk the tree, and delete whatever you don't want.
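The answer describes PHP's DOMDocument. To keep the added examples in one language, here is the same load, walk, and delete idea sketched in browser JavaScript with `DOMParser`, which, like DOMDocument, builds a document without running any of its scripts. The blocked-tag list, the allowed URL schemes, and the helper names are assumptions for illustration, not a complete policy:

```javascript
// Elements removed together with their content (an assumed, non-exhaustive list).
// Inline SVG/MathML are dropped too, since they have their own script-injection tricks.
const BLOCKED_TAGS = new Set([
  "script", "style", "iframe", "frame", "object", "embed",
  "link", "meta", "base", "form", "svg", "math",
]);
// Attributes whose value is a URL the browser may load or navigate to.
const URL_ATTRIBUTES = new Set(["href", "src", "action", "formaction"]);
// Schemes allowed in those attributes; relative URLs have no scheme and are allowed.
const SAFE_SCHEMES = new Set(["http", "https", "mailto"]);

function isSafeUrl(value) {
  // Browsers drop tabs/newlines inside URLs ("java\tscript:" still runs) and ignore
  // leading spaces/control characters; removing all of them errs on the side of rejecting.
  const compact = value.replace(/[\x00-\x20]/g, "").toLowerCase();
  const match = /^([a-z][a-z0-9+.-]*):/.exec(compact);
  return match === null || SAFE_SCHEMES.has(match[1]);
}

function isDangerousAttribute(name, value) {
  const lower = name.toLowerCase();
  if (lower.startsWith("on")) return true; // onclick, onerror, onload, ...
  if (URL_ATTRIBUTES.has(lower)) return !isSafeUrl(value);
  return false;
}

// Browser-only: parses untrusted HTML into an inert document and returns cleaned markup.
function sanitizeHtml(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  // querySelectorAll returns a static list, so removing nodes inside the loop is safe.
  for (const el of doc.body.querySelectorAll("*")) {
    if (BLOCKED_TAGS.has(el.localName)) {
      el.remove();
      continue;
    }
    // Copy the live attribute map before removing entries from it.
    for (const attr of [...el.attributes]) {
      if (isDangerousAttribute(attr.name, attr.value)) el.removeAttribute(attr.name);
    }
  }
  return doc.body.innerHTML;
}
```

In the browser, `sanitizeHtml(fileContents)` would return the body markup with blocked elements, `on*` attributes, and non-http(s)/mailto URLs removed. Because it only removes constructs it knows about, anything missing from these lists passes through, which is the main weakness of a hand-rolled sanitizer.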


I'm not 100% sure this will work for you, but converting the HTML to an SVG image and drawing it onto a canvas seems to restrict the content the way you need (no scripts, no loading of external resources).

See here for more documentation: https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Drawing_DOM_objects_into_a_canvas

You may wonder how this can be safe, given that data could be read back out of the canvas. The answer is that this solution relies on SVG images being implemented very restrictively. SVG images are not allowed to load any external resources, even ones that appear to come from the same domain. Resources such as raster images (for example, JPEG images) or <iframe>s must be inlined as data: URIs.

In addition, an SVG image cannot include scripts, so there is no risk of other scripts accessing the DOM. DOM elements in an SVG image also cannot receive input events, so there is no way to load privileged information into a form control (such as the full path in a file input), render it, and then extract that information by reading the pixels.
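A rough browser sketch of this approach, adapted from the MDN technique linked above. The uploaded HTML is parsed with `DOMParser` (which does not run scripts), re-serialized as XHTML because `<foreignObject>` content must be well-formed XML, wrapped in an SVG, and drawn onto a canvas through an image. The function names are hypothetical, only the `<body>` content is kept (so `<head>` styles are lost), and external images will not load, as described above:

```javascript
// Wraps well-formed XHTML in an SVG <foreignObject> and returns it as a data: URL.
function buildSvgDataUrl(xhtml, width, height) {
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject width="100%" height="100%">${xhtml}</foreignObject>` +
    `</svg>`;
  return "data:image/svg+xml;charset=utf-8," + encodeURIComponent(svg);
}

// Browser-only: renders untrusted HTML into `canvas` as a static picture.
function renderPreviewToCanvas(html, canvas) {
  // DOMParser yields an inert document: scripts in `html` are not executed here.
  const doc = new DOMParser().parseFromString(html, "text/html");
  // Move the body's children into a <div> and serialize it as XML; the serializer
  // emits xmlns="http://www.w3.org/1999/xhtml" on the <div>, as <foreignObject> needs.
  const wrapper = doc.createElement("div");
  wrapper.append(...doc.body.childNodes);
  const xhtml = new XMLSerializer().serializeToString(wrapper);

  const img = new Image();
  img.onload = () => {
    const ctx = canvas.getContext("2d");
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.drawImage(img, 0, 0);
  };
  img.onerror = () => console.error("The preview SVG could not be rendered.");
  img.src = buildSvgDataUrl(xhtml, canvas.width, canvas.height);
}
```

The preview is only a picture: links are not clickable and text cannot be selected, which is usually acceptable for a "confirm before submit" step.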


I may have found a library that handles this. I haven't fully tested it yet, but based on its description it should do the job: http://htmlpurifier.org/


Use FileReader to read the contents of the file, and an iframe to view it safely (or not, depending on which sandbox flags you enable):

    document.querySelector("button").addEventListener("click", function () {
      const holder = document.querySelector("#iframeholder");
      const file = document.querySelector("input[type=file]").files[0];
      if (!file) return; // no file selected yet

      // The sandbox attribute takes a space-separated list of tokens; an empty
      // value applies every restriction (no scripts, no forms, no popups, ...).
      const sandboxFlags = [...document.querySelectorAll(".sandbox-flags:checked")]
        .map((box) => box.value)
        .join(" ");

      const reader = new FileReader();
      reader.addEventListener("load", function () {
        const iframe = document.createElement("iframe");
        iframe.setAttribute("sandbox", sandboxFlags); // apply restrictions before the frame is inserted
        iframe.setAttribute("scrolling", "no");
        iframe.setAttribute("frameborder", "0");
        iframe.setAttribute("srcdoc", reader.result); // the uploaded HTML, rendered inside the sandbox
        console.log(`sandbox="${sandboxFlags}"`);
        holder.replaceChildren(iframe); // replace any previous preview
      });
      reader.readAsText(file);
    });
    label { display: block; }
    #iframeholder > iframe {
      border: 1px solid black;
      height: 400px;
      width: 400px;
    }
    <div>
      <input id="browse" type="file">
    </div>
    <label><input type="checkbox" class="sandbox-flags" value="allow-scripts"> allow-scripts</label>
    <label><input type="checkbox" class="sandbox-flags" value="allow-popups-to-escape-sandbox"> allow-popups-to-escape-sandbox</label>
    <label><input type="checkbox" class="sandbox-flags" value="allow-forms"> allow-forms</label>
    <label><input type="checkbox" class="sandbox-flags" value="allow-modals"> allow-modals</label>
    <div>
      <button type="button">Preview</button>
    </div>
    <div id="iframeholder"></div>
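Because a mistyped token (such as `allow-script`) grants nothing and a comma-joined list is not a valid token list, it may be worth validating the flags before setting them. Below is a sketch of a hypothetical `buildSandboxValue` helper; the token set is a subset of the ones the HTML standard defines. It also refuses `allow-scripts` together with `allow-same-origin`: a `srcdoc` frame shares the page's origin, so with both flags its scripts could remove the sandbox entirely.

```javascript
// Subset of the sandbox tokens defined by the HTML standard (not exhaustive).
const KNOWN_SANDBOX_TOKENS = new Set([
  "allow-forms",
  "allow-modals",
  "allow-popups",
  "allow-popups-to-escape-sandbox",
  "allow-same-origin",
  "allow-scripts",
  "allow-top-navigation",
]);

// Hypothetical helper (not a browser API): returns a value for the iframe "sandbox" attribute.
function buildSandboxValue(flags) {
  const unique = [...new Set(flags)];
  const unknown = unique.filter((flag) => !KNOWN_SANDBOX_TOKENS.has(flag));
  if (unknown.length > 0) {
    // Fail loudly instead of letting the browser ignore a typo.
    throw new Error(`Unknown sandbox token(s): ${unknown.join(", ")}`);
  }
  if (unique.includes("allow-scripts") && unique.includes("allow-same-origin")) {
    // Same-origin scripts inside the frame could reach the parent and drop the sandbox.
    throw new Error("allow-scripts together with allow-same-origin defeats the sandbox");
  }
  // Tokens are separated by spaces; an empty string means "all restrictions on".
  return unique.join(" ");
}
```

In the click handler, `sandboxFlags` could then be computed as `buildSandboxValue(checkedBoxes.map((box) => box.value))`, where `checkedBoxes` is the array of checked `.sandbox-flags` inputs.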