Using htmlpurifier for input / output / escaping

I am processing user input from the public that comes from a WYSIWYG JavaScript editor, and I plan to use HTML Purifier to sanitize the text.

I thought it would be enough to run HTML Purifier on input, save the sanitized input in the database, and then output it without further escaping or filtering. But I have heard other opinions that you should always escape on output.

Can someone explain why I would need to escape the output if I already sanitize the input?

+6
php io filtering htmlpurifier
3 answers

I assume that your WYSIWYG editor generates HTML, which is then sanitized and stored in the database. In that case, the sanitization has already taken place, so there is no need to do it a second time.

As for "escaping the output", that is a different matter. You cannot escape the resulting HTML, otherwise you will lose the formatting and the tags will be shown literally. Output escaping is used when you do not want the output to interfere with the page markup, which is the case for plain text but not for HTML that is meant to be rendered.
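
To illustrate the difference, here is a minimal sketch using PHP's built-in `htmlspecialchars`. The variable names and sample values are made up for the example: a plain-text field is escaped on output, while HTML that was already sanitized on input is echoed as-is, because escaping it would show the tags literally.

```php
<?php
// Plain-text field (e.g. a username): escape on output so any markup is inert.
$username = '<b>Bob</b> <script>alert(1)</script>';
$escapedName = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

// Rich-text field, assumed to have been sanitized before it was stored.
$storedHtml = '<p>Hello <strong>world</strong></p>';

// Escaping the rich text turns every tag into visible text.
$escapedHtml = htmlspecialchars($storedHtml, ENT_QUOTES, 'UTF-8');

echo $escapedName, "\n"; // safe: the script tag is displayed, not executed
echo $storedHtml, "\n";  // rendered as formatted text by the browser
echo $escapedHtml, "\n"; // &lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;
```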

I would add that you have to be very careful about what you allow at the sanitization stage. You probably want to allow only a few HTML tags and attributes.
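
A restrictive allowlist might look roughly like the sketch below. It assumes HTML Purifier is installed through Composer (package `ezyang/htmlpurifier`) and that the autoloader lives at `vendor/autoload.php`; the particular tag list and the sample input are only illustrations, not a recommendation for every site.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Allow only a small set of tags and attributes; everything else is removed.
$config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

// Restrict link schemes so that javascript: URLs are dropped.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true]);

$purifier = new HTMLPurifier($config);

// Sample editor output; in a real form handler this would come from the request.
$dirty = '<p onclick="steal()">Hi <a href="javascript:alert(1)">link</a></p>';
$clean = $purifier->purify($dirty);

// $clean is what would be stored in the database and later echoed unescaped.
echo $clean, "\n";
```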

+4

To be 100% safe, use HTML Purifier twice: before storing the HTML in the database and again before outputting it to the screen.
A big drawback of this approach is performance. HTML Purifier is very slow at filtering HTML, and it can add a lot of processing time to your pages.

You should be fine if you only run one or two filter passes before displaying something on the screen. But if you run 10 filter passes per request, as we did, you may end up where we did: we decided not to use HTML Purifier when outputting many texts.

HTML Purifier took 60% of the processing time for each request, and we wanted a shorter response time and a better user experience.

It depends on your situation. If you can afford to run HTML Purifier before output, go for it. That is the better option, because you always control which tags are allowed, both for new content and for old content already stored in your database.
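
If filtering on output is too expensive, one possible mitigation is to cache the filtered result so each distinct text is purified only once. The sketch below is not something this answer's author describes using; `makeCachedSanitizer` is a hypothetical helper, and `strip_tags` is only a stand-in so the example runs without the library (it is not a safe sanitizer, since it leaves attributes untouched). In practice the callable would wrap `$purifier->purify(...)`.

```php
<?php
// Hypothetical helper: wraps any sanitizer callable with an in-memory cache
// keyed by a hash of the input, so identical HTML is filtered only once.
function makeCachedSanitizer(callable $sanitize): callable
{
    $cache = [];
    return function (string $html) use (&$cache, $sanitize): string {
        $key = hash('sha256', $html);
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $sanitize($html);
        }
        return $cache[$key];
    };
}

// Stand-in for the slow filter; counts how often it is actually invoked.
$calls = 0;
$slowSanitizer = function (string $html) use (&$calls): string {
    $calls++;
    return strip_tags($html, '<p><b><i>'); // demo only, NOT a real sanitizer
};

$clean = makeCachedSanitizer($slowSanitizer);
$first = $clean('<p>Hello <script>alert(1)</script></p>');
$second = $clean('<p>Hello <script>alert(1)</script></p>');

echo $first, "\n";
echo "sanitizer calls: ", $calls, "\n";
```

The array used here lives only for the duration of one request. To save work across requests, the cached value would have to go into a persistent store, for example a separate database column holding the purified HTML.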

+2

The mantra "always escape your output", which is a text-to-HTML conversion, is a good and sensible default when working on the web. In the case of HTML Purifier, you are deliberately departing from this advice, because what you are really doing is an HTML-to-HTML conversion, and treating that HTML as text again would not make sense.

+1