DOM-based XSS attacks and innerHTML

How can the DOM-based XSS attack below be prevented?

In particular, is there a protect() function that would make the code below secure? If not, is there another solution? For example: giving the div an id and then attaching the onclick handler to that element.

    <?php
    function protect($input)
    {
        // For non-DOM XSS attacks, hex-encoding all non-alphanumeric characters
        // with character codes less than 256 works (i.e. \xHH)
        // But is it possible to augment this function to protect against
        // the DOM-based XSS attack below?
    }
    ?>
    <body>
        <div id="mydiv"></div>
        <script type="text/javascript">
            var xss = "<?php echo protect($_GET["xss"]) ?>";
            $("#mydiv").html("<div onclick='myfunc(\""+xss+"\")'></div>");
        </script>
    </body>

I am hoping for an answer other than "avoid using innerHTML" or "validate the xss variable against [a-zA-Z0-9] with a regex", i.e. is there a more general solution?

Thanks.

3 answers

Building on Vineet's answer, here is a set of test cases:

http://ha.ckers.org/xss.html

I have been experimenting with PHP's DOMDocument and its related classes to write an HTML sanitizer that can do this kind of thing. It is at a very early stage of development and nowhere near ready for real use, but my early experiments suggest the idea has some promise.

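Basically, you load your markup into a DOMDocument and then traverse the tree. For each node, you check its name against a whitelist of allowed node names (tags, plus #text, #comment and #cdata-section). If the node is not on the list, it is removed from the tree along with everything beneath it. Attributes are checked the same way against a separate attribute whitelist.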

You can use a similar approach to find and remove all SCRIPT tags in a piece of markup. DOM-based XSS is toothless if you can strip every inline script (including event-handler attributes such as onclick) out of the markup you supply.
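The whitelist traversal described above can be sketched in a few lines of JavaScript. This is a hypothetical illustration, not the author's PHP class: instead of a real DOM it walks a plain-object tree of the assumed shape { name, attrs, children }, keeps only whitelisted node names and attribute names, and walks children backwards so that removals do not shift the indexes still to be visited. The root node is treated as a trusted container and is not itself checked, and the whitelists are shortened for the example.

```javascript
// Hypothetical node shape (an assumption, not a real DOM API):
//   { name: "div", attrs: { id: "x" }, children: [ /* nodes */ ] }
// Text nodes use the name "#text" and have no attrs or children.
const TAG_WHITELIST = new Set(["#text", "#comment", "div", "p", "span", "b", "i", "a"]);
const ATTR_WHITELIST = new Set(["class", "id", "title"]);

// Recursively remove non-whitelisted children and attributes in place.
// Returns the names of the removed nodes, in the order they were removed.
function cleanNode(node, removed = []) {
  if (node.attrs) {
    // Object.keys returns a snapshot, so deleting while looping is safe.
    for (const attr of Object.keys(node.attrs)) {
      if (!ATTR_WHITELIST.has(attr.toLowerCase())) {
        delete node.attrs[attr];
      }
    }
  }
  const children = node.children || [];
  // Go backwards: splicing index i does not move the entries before it.
  for (let i = children.length - 1; i >= 0; i--) {
    const child = children[i];
    if (TAG_WHITELIST.has(child.name.toLowerCase())) {
      cleanNode(child, removed);
    } else {
      children.splice(i, 1); // drops the node and its whole subtree
      removed.push(child.name);
    }
  }
  return removed;
}

// Usage: a script element plus onclick/onerror payloads.
const tree = {
  name: "div", attrs: {}, children: [
    { name: "#text" },
    { name: "SCRIPT", attrs: {}, children: [{ name: "#text" }] },
    { name: "p", attrs: { id: "ok", onclick: "alert(2)" }, children: [
      { name: "img", attrs: { src: "x", onerror: "alert(3)" }, children: [] },
    ] },
  ],
};
// Logs the removed names: img, then SCRIPT
console.log(cleanNode(tree));
```

Because onclick, onerror and similar handlers are not on the attribute whitelist, they are dropped along with the script element, which is what removes inline script from the markup.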

This is the code I'm using, along with a test case that processes the StackOverflow homepage. As I said, this is far from production-quality code and nothing more than a proof of concept. Nevertheless, I hope you find it useful.

    <?php
    class HtmlClean
    {
        private $whiteList = array(
            '#cdata-section', '#comment', '#text', 'a', 'abbr', 'acronym', 'address', 'b', 'big',
            'blockquote', 'body', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del',
            'dfn', 'div', 'dl', 'dt', 'em', 'fieldset', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'head',
            'hr', 'html', 'i', 'img', 'ins', 'kbd', 'li', 'link', 'meta', 'ol', 'p', 'pre', 'q',
            'samp', 'small', 'span', 'strike', 'strong', 'style', 'sub', 'sup', 'table', 'tbody',
            'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'tt', 'ul', 'var'
        );
        private $attrWhiteList = array('class', 'id', 'title');
        private $dom = null;

        /**
         * Get current tag whitelist
         * @return array
         */
        public function getWhiteListTags()
        {
            $this->whiteList = array_values($this->whiteList);
            return $this->whiteList;
        }

        /**
         * Add tag to the whitelist
         * @param string $tagName
         */
        public function addWhiteListTag($tagName)
        {
            $tagName = strtolower(trim($tagName));
            if (!in_array($tagName, $this->whiteList)) {
                $this->whiteList[] = $tagName;
            }
        }

        /**
         * Remove a tag from the whitelist
         * @param string $tagName
         */
        public function removeWhiteListTag($tagName)
        {
            // array_search() returns false when not found; index 0 is a valid hit
            $index = array_search(strtolower(trim($tagName)), $this->whiteList);
            if ($index !== false) {
                unset($this->whiteList[$index]);
            }
        }

        /**
         * Load document markup into the class for cleaning
         * @param string $html The markup to clean
         * @return bool
         */
        public function loadHTML($html)
        {
            if (!$this->dom) {
                $this->dom = new DOMDocument();
            }
            $this->dom->preserveWhiteSpace = false;
            $this->dom->formatOutput = true;
            // Collect libxml parse warnings (e.g. unknown HTML5 tags) instead of printing them
            libxml_use_internal_errors(true);
            return $this->dom->loadHTML($html);
        }

        public function outputHtml()
        {
            $ret = '';
            if ($this->dom) {
                $ret = $this->dom->saveXML();
            }
            return $ret;
        }

        private function cleanAttrs(DOMNode $elem)
        {
            $attrs = $elem->attributes;
            $index = $attrs->length;
            // Walk backwards: removing an attribute shrinks the live attribute map
            while (--$index >= 0) {
                $attr = $attrs->item($index);
                if (!in_array(strtolower($attr->name), $this->attrWhiteList)) {
                    $elem->removeAttribute($attr->name);
                }
            }
        }

        /**
         * Recursively remove elements from the DOM that aren't whitelisted
         * @param DOMNode $elem
         * @return array List of elements removed from the DOM
         * @throws Exception If removal of a node fails, an exception is thrown
         */
        private function cleanNodes(DOMNode $elem)
        {
            $removed = array();
            if (in_array(strtolower($elem->nodeName), $this->whiteList)) {
                // Remove non-whitelisted attributes
                if ($elem->hasAttributes()) {
                    $this->cleanAttrs($elem);
                }
                // Iterate over the children backwards, because going forwards
                // would shift the indexes when elements get removed
                if ($elem->hasChildNodes()) {
                    $children = $elem->childNodes;
                    $index = $children->length;
                    while (--$index >= 0) {
                        $removed = array_merge($removed, $this->cleanNodes($children->item($index)));
                    }
                }
            } else {
                // The element is not on the whitelist, so remove it
                if ($elem->parentNode->removeChild($elem)) {
                    $removed[] = $elem;
                } else {
                    throw new Exception('Failed to remove node from DOM');
                }
            }
            return $removed;
        }

        /**
         * Perform the cleaning of the document
         */
        public function clean()
        {
            return $this->cleanNodes($this->dom->getElementsByTagName('html')->item(0));
        }
    }

    $test = file_get_contents('http://www.stackoverflow.com/');
    // Windows-style linebreaks really foul up the works. There is probably a better fix for this
    $test = str_replace(chr(13), '', $test);
    $cleaner = new HtmlClean();
    $cleaner->loadHTML($test);
    echo '<h1>Before</h1><pre>' . htmlspecialchars($cleaner->outputHtml()) . '</pre>';
    $start = microtime(true);
    $removed = $cleaner->clean();
    $cleanTime = microtime(true) - $start;
    echo '<h1>Removed tag list</h1>';
    foreach ($removed as $elem) {
        var_dump($elem->nodeName);
    }
    echo '<h1>After</h1><pre>' . htmlspecialchars($cleaner->outputHtml()) . '</pre>';
    // benchmark
    var_dump($cleanTime);
    ?>

I am not a PHP expert, but if you want to prevent XSS attacks against the code presented, in its current form and with minimal changes, you can use the PHP version of OWASP ESAPI. Specifically, use ESAPI's JavaScript codec to encode the contents of the xss variable for the JavaScript context in which it appears.
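To illustrate what encoding for the context means in the question's code, here is a minimal JavaScript sketch. The helper names escapeJsString, escapeHtmlAttr and buildClickableDiv are hypothetical and are not ESAPI's API. The single \xHH pass in protect() is undone as soon as the browser parses the outer string literal, so by the time .html() runs, xss holds the raw characters again. The sketch therefore encodes on the client, innermost context first: for the JavaScript string read by the onclick handler, then for the HTML attribute that contains it. It keeps .html() in place, as the question asks.

```javascript
// Sketch only: the helper names below are hypothetical, not a library API.

// Encode for a quoted JavaScript string literal: ASCII letters and digits
// pass through, every other UTF-16 code unit becomes \xHH or \uHHHH.
function escapeJsString(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    const code = s.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}

// Encode for a quoted HTML attribute value: every character other than an
// ASCII letter or digit becomes a hex character reference (&#xHH;).
function escapeHtmlAttr(s) {
  let out = "";
  for (const ch of s) {
    out += /[A-Za-z0-9]/.test(ch) ? ch : "&#x" + ch.codePointAt(0).toString(16) + ";";
  }
  return out;
}

// The question's sink: xss lands in a JS string (myfunc("...")), inside an
// onclick attribute, inside markup handed to .html(). Encode the innermost
// context first, then the attribute that wraps it.
function buildClickableDiv(xss) {
  const safe = escapeHtmlAttr(escapeJsString(xss));
  return "<div onclick='myfunc(\"" + safe + "\")'></div>";
}

// With jQuery loaded, this would replace the question's .html() line:
//   $("#mydiv").html(buildClickableDiv(xss));
const payload = "\"); alert(1); //' onmouseover='alert(2)";
console.log(buildClickableDiv(payload));
```

The encoding order mirrors the decoding order: the HTML parser decodes the attribute's character references first, and the JavaScript engine decodes the \xHH and \uHHHH escapes only when the handler runs.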
