One client-side answer to "Sanitize/Rewrite HTML on the Client Side" suggests borrowing the whitelist-based JS HTML sanitizer from Google Caja which, as far as I can tell from a quick scroll-through, implements an HTML SAX parser without relying on the browser's DOM.
Update: Also, keep in mind that the Caja sanitizer has apparently been given a full, professional security review, while regular expressions are known for being very easy to get subtly wrong in ways that compromise security.
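To make the point about regexes concrete, here is a minimal sketch of a naive regex-based "sanitizer" and a few inputs that slip past it. The function name `naiveStripScripts` is made up for illustration; it is deliberately flawed and is not something to deploy.

```javascript
// Deliberately flawed: strips only literal, lowercase <script>...</script> pairs.
function naiveStripScripts(html) {
  return html.replace(/<script>[\s\S]*?<\/script>/g, '');
}

// The case the regex was written for works:
const plain = naiveStripScripts('<p>hi</p><script>alert(1)</script>');

// Different letter case is not matched, so the script survives:
const upper = naiveStripScripts('<p>hi</p><SCRIPT>alert(1)</SCRIPT>');

// Event-handler attributes are not script tags at all:
const handler = naiveStripScripts('<img src=x onerror=alert(1)>');

// Removing the inner pair stitches a new <script> tag together,
// because replace() does not rescan its own output:
const nested = naiveStripScripts('<scr<script></script>ipt>alert(1)</script>');

console.log({ plain, upper, handler, nested });
```

Each bypass could be patched individually, but the list of patches never really ends, which is why a parser plus a whitelist is the safer design.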
Update 2017-09-24: There is also now DOMPurify. I haven't used it yet, but it looks like it meets or exceeds every point I look for:
- Relies on functionality provided by the runtime environment wherever possible. (Important both for performance and for maximizing security by relying on well-tested, mature implementations as much as possible.)
  - Relies on either the browser's DOM or jsdom for Node.JS.
- The default configuration is designed to strip as little as possible while still guaranteeing the removal of JavaScript.
  - Supports HTML, MathML, and SVG.
  - Falls back to Microsoft's proprietary, non-configurable toStaticHTML under IE8 and IE9.
- Highly configurable, which makes it suitable for enforcing restrictions on input that can contain arbitrary HTML, such as a WYSIWYG or Markdown comment field. (In fact, it's the top of the pile here.)
  - Supports the usual tag/attribute whitelisting and blacklisting, plus URL regex whitelisting.
  - Has special options for further sanitizing certain common types of HTML template metacharacters.
- They take compatibility and reliability seriously.
  - Automated tests run on 16 different browsers, as well as three different major versions of Node.JS.
  - To keep developers and CI hosts on the same page, lock files are published.
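As a rough illustration of the configurability points above, here is a minimal sketch of a restrictive setup for a comment field. `ALLOWED_TAGS`, `ALLOWED_ATTR`, and `ALLOWED_URI_REGEXP` are DOMPurify option names; the specific values, the `COMMENT_CONFIG` constant, and the `sanitizeComment` helper are assumptions made up for this example, so check them against the DOMPurify README before relying on them.

```javascript
// Hypothetical whitelist for a comment field. The option names are DOMPurify's;
// the chosen tags, attributes, and URL pattern are illustrative assumptions.
const COMMENT_CONFIG = {
  ALLOWED_TAGS: ['a', 'b', 'i', 'em', 'strong', 'p', 'ul', 'ol', 'li',
                 'code', 'pre', 'blockquote'],
  ALLOWED_ATTR: ['href', 'title'],
  // Only absolute http(s) and mailto links; relative URLs are rejected too.
  ALLOWED_URI_REGEXP: /^(?:https?|mailto):/i,
};

// Hypothetical helper: `purifier` is a DOMPurify instance, passed in so the
// same code works in a browser and under Node.JS.
function sanitizeComment(purifier, dirtyHtml) {
  return purifier.sanitize(dirtyHtml, COMMENT_CONFIG);
}

// Wiring under Node.JS (needs the `dompurify` and `jsdom` packages):
//   const createDOMPurify = require('dompurify');
//   const { JSDOM } = require('jsdom');
//   const DOMPurify = createDOMPurify(new JSDOM('').window);
//   const clean = sanitizeComment(DOMPurify, '<img src=x onerror=alert(1)>');
//
// In a browser with DOMPurify loaded, pass the global instead:
//   const clean = sanitizeComment(DOMPurify, userInput);
```

Passing the instance in, rather than importing it inside the helper, keeps the whitelist in one place regardless of which runtime supplies the DOM.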
ssokolow, Sep 14 '10 at 3:15