Unit testing an HTML parser / sanitizer?

I am trying to choose between several different HTML parsers for a project that I am working on, part of which receives HTML input from a client.

I built a simple automated test for each of them to find out whether it fits my needs. I have a large number of real HTML snippets for testing, but they are not enough for security testing, since they (probably) do not contain malicious code.
I do not mind reviewing the results manually.

My question is: is there a freely accessible database or list of HTML snippets containing malformed HTML and scripts that I could use to test for XSS?

+4
3 answers

The ha.ckers XSS cheatsheet is quite extensive, and it was the catalyst for me to create a sanitiser whitelist in jsoup.

+2

I built html-sanitizer-testbed just for this purpose. It consists of two components:

  • A set of test cases designed to probe the security of HTML sanitizers. I collected every tricky case that I managed to find. It includes everything on the ha.ckers.org XSS cheatsheet, as well as many other test cases. Over the years, I have analyzed dozens of HTML sanitizers (most of them were vulnerable) and added a test case for every security vulnerability I discovered, so this is a fairly thorough collection.

  • In addition, it provides some test automation, so you don’t need to review the results manually: it can launch a browser and check whether the browser appears to execute any JavaScript in the sanitizer's output (if it does, the sanitizer is broken). This part is not 100% reliable and comes with no guarantees, so for maximum assurance you may still want to review the results manually. However, it has worked very well so far.

I welcome feedback and contributions.
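
To illustrate the kind of automated check described above, here is a minimal sketch in Python. It is not the testbed's actual code: instead of launching a browser, it statically parses the sanitizer's output with the standard library and flags constructs that commonly lead to script execution. The names XSS_VECTORS, ScriptDetector, find_script, run_suite and naive_sanitize are hypothetical, and the handful of vectors is only a placeholder for a real collection such as the cheatsheet. A static check like this is weaker than a real browser, so a clean result is not proof of safety.

```python
import html
from html.parser import HTMLParser

# Hypothetical sample vectors; a real suite would load many more.
XSS_VECTORS = [
    "<script>alert(1)</script>",
    "<scr<script>ipt>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    '<a href="javascript:alert(1)">click</a>',
    "<p>harmless <b>text</b></p>",
]

DANGEROUS_TAGS = {"script", "iframe", "object", "embed"}
URL_ATTRS = {"href", "src", "action", "formaction", "data", "xlink:href"}
SCRIPT_SCHEMES = ("javascript:", "vbscript:", "data:text/html")


class ScriptDetector(HTMLParser):
    """Collects constructs in parsed HTML that may execute script."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in DANGEROUS_TAGS:
            self.findings.append(f"<{tag}> element")
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append(f"event handler {name}")
            elif name in URL_ATTRS and value:
                # Drop whitespace and control characters, which browsers
                # ignore when reading the URL scheme.
                compact = "".join(ch for ch in value if ch > " ").lower()
                if compact.startswith(SCRIPT_SCHEMES):
                    self.findings.append(f"script URL in {name}")


def find_script(html_text):
    """Return a list of suspicious constructs found in html_text."""
    detector = ScriptDetector()
    detector.feed(html_text)
    detector.close()
    return detector.findings


def run_suite(sanitize, vectors):
    """Map each vector whose sanitized output still looks executable
    to the list of findings for that output."""
    failures = {}
    for vector in vectors:
        findings = find_script(sanitize(vector))
        if findings:
            failures[vector] = findings
    return failures


def naive_sanitize(html_text):
    # Deliberately weak sanitizer, used only to show failing cases.
    return html_text.replace("<script>", "")


if __name__ == "__main__":
    for vector, findings in run_suite(naive_sanitize, XSS_VECTORS).items():
        print(f"FAIL {vector!r}: {', '.join(findings)}")
```

To test a real sanitizer, pass its entry point to run_suite in place of naive_sanitize. Escaping everything with html.escape passes this check trivially, which is a useful sanity baseline but not a usable sanitizer, since it removes all markup.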

0

Source: https://habr.com/ru/post/1315695/

