I have a system where users can enter HTML reserved characters into a text area and post them to my application. This information is then stored in a database for later retrieval and display. Alarm bells are (or should be) going off in your head. I need to make sure I avoid XSS attacks, because I will display this data elsewhere in the application. Here are the options as I see them:
Encode before saving to DB
I can HTML encode the data on its way into the database, so no raw HTML characters are ever stored.
Pros:
- Developers don't have to remember to HTML encode the data when it is displayed on a web page.
Cons:
- The data no longer makes sense to desktop applications (or anything other than HTML). Characters show up as &lt; &gt; &amp; etc.
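To make that con concrete, here is a minimal sketch in Python, assuming the standard library's html.escape stands in for whatever encoder the application actually uses. The sample input and variable names are made up for illustration.

```python
import html

# Hypothetical user input containing HTML reserved characters.
user_input = "Tom & Jerry <3"

# Option 1: encode on the way in, so this is what lands in the database.
stored = html.escape(user_input)

# A non-HTML consumer (desktop app, CSV export) shows the stored text as-is,
# entities and all, unless it knows to decode it first.
desktop_display = stored
recovered = html.unescape(stored)

# A web page that also encodes at output time double-encodes the value.
double_encoded = html.escape(stored)

print(desktop_display)  # Tom &amp; Jerry &lt;3
print(double_encoded)   # Tom &amp;amp; Jerry &amp;lt;3
```

Every non-HTML consumer would need the unescape step, and every HTML consumer would need to know not to encode again.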
Don't HTML encode before saving to DB
I can HTML encode the data whenever I need to display it on a web page.
Pros:
- Feels right, because it maintains the integrity of the data entered by the user.
- Allows non-HTML applications to simply display this data without worrying about HTML encoding.
Cons:
- We may display this data in many places, so every developer needs to know that this field must be HTML encoded whenever it is displayed.
- People forget things. There will be at least one instance where we forget to HTML encode the data.
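For reference, encoding at display time might look like the sketch below. It again assumes Python's html.escape, and render_comment is a hypothetical helper, not part of any framework. Routing every display of the field through one such helper is one way to shrink the "people forget" risk, though it does not remove it.

```python
import html

def render_comment(raw_text: str) -> str:
    """Hypothetical view helper: the raw value is stored untouched and
    is encoded only at the moment it is written into an HTML page."""
    return "<p>" + html.escape(raw_text) + "</p>"

# The database keeps exactly what the user typed.
raw = '<script>alert("xss")</script>'

# The web view gets inert text instead of an executable tag.
page_fragment = render_comment(raw)
print(page_fragment)
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```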
Scrub data before saving to DB (not HTML encoding)
I can use a trusted third-party library to remove potentially harmful HTML and get back a safe HTML fragment to save to the database, without HTML encoding it.
Pros:
- Preserves most of the original input, so displaying it in non-HTML formats still makes sense.
- Less catastrophic if a developer forgets to HTML encode this information for display on a web page.
Cons:
- We are still altering the data the user entered. If they really do want to type <script> or <object>, it won't go through, and we will get support calls and emails because of it.
My question is: which of these is the best option? Or, if there is another way to do this, what is it?