The danger in XSS is that one user can put HTML code into their input, which you later insert into a web page that is sent to another user.
Basically, there are two strategies you can follow to protect against this. You can either remove all dangerous characters from user input when it enters your system, or you can HTML-encode dangerous characters when you later write them back to the browser.
An example of the first strategy:
- User enters data (with html code)
- Server deletes all dangerous characters
- Modified data is stored in the database
- After a while, the server reads the changed data from the database
- The server inserts the changed data into a web page sent to another user.
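The steps of the first strategy could be sketched roughly as follows in Python. This is only an illustration: the set of "dangerous" characters and the function name `strip_dangerous` are assumptions made for the example, not a recommended blacklist.

```python
import re

# Hypothetical blacklist of characters treated as dangerous in HTML.
DANGEROUS_CHARS = re.compile(r"[<>&\"']")

def strip_dangerous(user_input: str) -> str:
    """Delete every blacklisted character before the value is stored."""
    return DANGEROUS_CHARS.sub("", user_input)

# The value that would be written to the database.
stored = strip_dangerous('<script>alert("x")</script>')
print(stored)  # scriptalert(x)/script

# Harmless input is altered too: the apostrophe is lost for good.
print(strip_dangerous("O'Brien"))  # OBrien
```

Note that the second call changes a perfectly legitimate name, and the original can no longer be recovered from what was stored.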
An example of the second strategy:
- User enters data (with html code)
- The unmodified data, dangerous characters included, is stored in the database
- After some time, the server reads the unmodified data from the database
- The server HTML-encodes the dangerous characters and inserts the data into a web page sent to another user.
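A minimal sketch of the second strategy, assuming a Python backend. `html.escape` is from the standard library; the function name `render_comment` and the surrounding `<p>` markup are made up for the example.

```python
import html

def render_comment(raw_comment: str) -> str:
    """Encode at output time; the value in the database stays untouched."""
    # quote=True (the default) also encodes " and ', which matters if the
    # value ever ends up inside an HTML attribute.
    return "<p>" + html.escape(raw_comment, quote=True) + "</p>"

raw = '<script>alert("x")</script>'  # stored exactly as the user typed it
print(render_comment(raw))
# <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

The browser displays the encoded text literally instead of running it, and the raw value is still available for any other use.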
The first strategy is simpler because you usually accept data in fewer places than you use it. However, it is also more problematic because it potentially destroys data. It is especially difficult if you need the data for something other than sending it back to the browser later (for example, using an email address to actually send email). Stripping characters makes it hard, for example, to search the database, include data in a PDF report, insert data into an email, and so on.
The second strategy has the advantage of not destroying the input, so you have more freedom in how you want to use the data later. However, it can be harder to verify that you HTML-encode all user-submitted data that is sent to the browser. The solution to your specific problem is to HTML-encode the email address when (or if) you ever put that email address on a web page.
The XSS problem is an example of a more general problem that occurs when user-submitted data is mixed with control code. SQL injection is another example of the same problem: user-submitted data is interpreted as instructions, not as data. A third, less well-known example is mixing user-submitted data into an email. User-submitted data may contain strings that the email server interprets as instructions. The “dangerous character” in this scenario is a line break followed by “From:”.
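For the SQL case, the usual way to keep data from being read as instructions is a parameterized query. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `users` table and the `find_user` function are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up a user; the ? placeholder keeps `name` out of the SQL text."""
    cursor = conn.execute("SELECT email FROM users WHERE name = ?", (name,))
    return cursor.fetchall()

print(find_user(conn, "alice"))        # [('alice@example.com',)]
# A classic injection attempt is treated as an ordinary (non-matching) name.
print(find_user(conn, "' OR '1'='1"))  # []
```

The user input is stored and compared unmodified; it simply never becomes part of the SQL statement.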
It would be impossible to check all input data for every control character or character sequence that could be interpreted as instructions in some potential future application. The only permanent solution is to sanitize potentially dangerous data at the point where you actually use it.
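As one illustration of sanitizing at the point of use, here is a rough sketch for the email case, where the dangerous sequence is a line break inside a header value. It uses `EmailMessage` from Python's standard library; the helper names `safe_header_value` and `build_message` and the address `admin@example.com` are assumptions made for the example.

```python
from email.message import EmailMessage

def safe_header_value(value: str) -> str:
    """Reject values that could start a new header line."""
    if "\r" in value or "\n" in value:
        raise ValueError("line break in header value")
    return value

def build_message(subject_from_user: str, body_from_user: str) -> EmailMessage:
    msg = EmailMessage()
    msg["To"] = "admin@example.com"  # placeholder recipient
    # The check happens here, where the data is used as a header,
    # not when the data first entered the system.
    msg["Subject"] = safe_header_value(subject_from_user)
    # Line breaks are harmless in the body, so it is passed through as-is.
    msg.set_content(body_from_user)
    return msg

msg = build_message("Hello", "First line\nSecond line")
print(msg["Subject"])  # Hello
```

A subject such as `"Hi\nFrom: someone@example.com"` raises `ValueError` instead of smuggling an extra header into the message, while the same line break is accepted in the body.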