How can I encode the HTML code of all output in a web application?

Question


I want to prevent XSS attacks in my web application. I found that HTML-encoding the output can really prevent XSS attacks. Now the problem is, how do I HTML-encode every single output in my application? Is there a way to automate this?

I would appreciate answers for JSP, ASP.NET and PHP.

+6

php jsp asp.net

Niyaz 12 sept '08 at 11:05


11 answers

One thing you should not do is filter the input as it arrives. People often suggest this because this is the easiest solution, but it leads to problems.

Input data can be sent to several places besides being output as HTML. For example, it may be stored in a database. The rules for filtering data sent to a database are very different from the rules for filtering HTML output. If you HTML-encode everything on input, you will end up with HTML in your database. (This is also why PHP's "magic quotes" feature is a bad idea.)

You cannot foresee all the places your input data will travel. The safe approach is to prepare the data immediately before it is sent somewhere. If you are sending it to a database, escape the single quotes. If you are outputting HTML, escape the HTML entities. And once it has been sent somewhere, if you still need to work with the data, use the original unescaped version.

This is more work, but you can reduce it by using template engines or libraries.
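A minimal sketch of the "prepare the data at the point of use" idea, written in Python for brevity (the thread itself is about PHP, JSP and ASP.NET). The `comments` table, the `save_comment` and `render_comment` helpers and the sample input are all hypothetical; `sqlite3` placeholders and `html.escape` stand in for whatever database layer and HTML encoder your platform provides.

```python
import html
import sqlite3

def save_comment(conn, text):
    # Database boundary: a bound parameter lets the driver handle quoting,
    # so the raw text is stored unchanged.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))

def render_comment(text):
    # HTML boundary: encode only now, immediately before output.
    return "<p>" + html.escape(text) + "</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

raw = "<script>alert('hi')</script> O'Reilly"
save_comment(conn, raw)

# The database holds the original text; only the rendered page is encoded.
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comment(stored)
```

Because the stored value is never HTML-encoded, it can still be sent to other destinations (a plain-text email, a CSV export) with the escaping that each of those requires.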

+8

JW. Sep 13 '08 at 17:02


For JSP, you can have your cake and eat it too with the c:out tag, which escapes XML by default. This means that you can bind to your properties as raw elements:

<input name="someName.someProperty" value="<c:out value='${someName.someProperty}' />" />

When bound to a string, someName.someProperty will contain the raw XML input, but when it is output to the page it is automatically escaped into XML entities. This is especially useful for links when validating pages.

+3

MetroidFan2002 Sep 16 '08 at 2:48


A nice way I have used to escape all user input is to write a modifier for Smarty that escapes every variable passed to the template, except those that have an unescape modifier attached. This way you only give HTML access to the elements you explicitly choose.

I no longer have that modifier, but much the same version can be found here:

http://www.madcat.nl/martijn/archives/16-Using-smarty-to-prevent-HTML-injection..html

In the new Django 1.0 release this works exactly the same way, yay :)
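The escape-by-default, opt-out-explicitly approach can be sketched roughly as follows. This is not Smarty's or Django's implementation; the `Unescaped` marker class and the `render` function are hypothetical names chosen for illustration, and Python's `str.format` stands in for a real template engine.

```python
import html

class Unescaped:
    """Hypothetical marker: wrap a value to opt out of automatic escaping."""
    def __init__(self, markup):
        self.markup = markup

def render(template, **variables):
    # Escape every variable by default; only values explicitly wrapped
    # in Unescaped are inserted as raw HTML.
    prepared = {}
    for name, value in variables.items():
        if isinstance(value, Unescaped):
            prepared[name] = value.markup
        else:
            prepared[name] = html.escape(str(value))
    return template.format(**prepared)

out = render(
    "<h1>{title}</h1><div>{body}</div>",
    title="<b>Hi</b>",                          # escaped automatically
    body=Unescaped("<em>trusted markup</em>"),  # explicit opt-out
)
```

The benefit of this design is that forgetting to mark a value fails safe: the worst outcome is visible entity text on the page rather than injected markup.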

+1

fijter 12 sept '08 at 11:57


You could wrap echo / print etc. in your own methods, which you can then use to escape the output. i.e. instead of

 echo "blah";

use

 myecho('blah');

You could even have a second parameter that turns off escaping if you need it.

In one project, we had a debug mode in our output functions that made all the output text passing through our method invisible. Then we knew that anything still left on the screen had not been escaped. It was very helpful for tracking down those naughty unescaped bits :)
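A rough Python sketch of such a wrapper, assuming a hypothetical `my_echo` function with an `escape` switch and a `debug_hide` flag for the debug mode described above (the parameter names and the `out` stream argument are illustrative, not taken from the original project):

```python
import html
import io
import sys

def my_echo(text, escape=True, debug_hide=False, out=sys.stdout):
    if debug_hide:
        # Debug mode: everything routed through the wrapper disappears,
        # so whatever is still visible on the page bypassed it.
        return
    # Escape by default; pass escape=False only for trusted markup.
    out.write(html.escape(text) if escape else text)

buf = io.StringIO()
my_echo("<b>blah</b>", out=buf)              # written as entity text
my_echo("<hr>", escape=False, out=buf)       # written verbatim
my_echo("hidden", debug_hide=True, out=buf)  # suppressed in debug mode
```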

+1

reefnet_alex 12 sept '08 at 12:11


If you really HTML-encode every single output, the user will see the plain text &lt;html&gt; instead of a functioning web application.

EDIT: if you HTML-encode every single input, you will have problems accepting an external password containing <, etc.

0

Eugene yokota 12 sept '08 at 11:18


My personal preference is to diligently encode everything that comes from a database, business layer, or user.

In ASP.NET, this is done using Server.HtmlEncode(string).

The reason to encode everything is that even properties you might assume to be boolean or numeric can contain malicious code. For example, checkbox values, if handled improperly, can come back as strings. If you are not encoding them before sending the output to the user, then you have a vulnerability.

0

Peter Bernier 12 sept '08 at 13:23


The only way to truly protect yourself from this kind of attack is to rigorously filter all of the input that you accept, in particular (although not exclusively) from the public areas of your application. I would recommend you take a look at Daniel Morris's PHP Filtering Class (a complete solution), as well as Zend_Filter (a set of classes that you can use to build your own filter).

PHP is my language of choice when it comes to web development, so I apologize for the bias in my answer.

Kieran.

0

Kieran hall 12 sept '08 at 14:19


There was a good essay from Joel on Software ("Making Wrong Code Look Wrong", I think; I'm on my phone, otherwise I would have a URL for you) that covered the correct use of Hungarian notation. A short version would be something like this:

 Var dsFirstName, uhsFirstName : String;
 Begin
   uhsFirstName := request.queryfields.value['firstname'];
   dsFirstName := dsHtmlToDB(uhsFirstName);

Basically, prefix your variables with something like "us" for an unsafe string, "ds" for database safe, and "hs" for HTML safe. You only want to encode and decode where you really need it, not everywhere. But by using prefixes that carry a useful meaning, you will see very quickly when looking at your code if something is wrong. And you will need different encoding / decoding functions anyway.
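The same naming convention might look like this in Python. The `us_` / `hs_` variable prefixes follow the answer; the `hs_from_us` converter is a hypothetical name, and `html.escape` is just one possible encoder.

```python
import html

def hs_from_us(us_text):
    # Convert an unsafe string (us) into an HTML-safe string (hs).
    return html.escape(us_text)

us_first_name = "<script>x</script>"       # raw value, e.g. from a request
hs_first_name = hs_from_us(us_first_name)  # encoded exactly once, at the HTML boundary

# Correct: only hs_ values are concatenated into markup.
greeting = "<p>Hello, " + hs_first_name + "</p>"

# A line such as  "<p>Hello, " + us_first_name  would look wrong at a glance,
# because a us_ value appears next to markup without a conversion.
```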

0

ddowns Sep 13 '08 at 17:54


Output encoding is the best defense. Validating input is great for many reasons, but it is not 100% protection. If the database becomes infected with XSS through an attack (such as ASPROX), a mistake, or malice, input validation does nothing. Output encoding will still work.

0

Jmd Feb 01 '11 at 0:06


OWASP has a good API for encoding HTML output, either for use as HTML text (for example, paragraph or <textarea> content), or as an attribute's value (for example, for <input> tags after a form has been rejected):

 encodeForHTML($input) // Encode data for use in HTML using HTML entity encoding
 encodeForHTMLAttribute($input) // Encode data for use in HTML attributes.

The project (the PHP version) is hosted at http://code.google.com/p/owasp-esapi-php/ and is also available for some other languages, for example .NET.

Remember that you must encode everything (not just user input) and as late as possible (not when storing in the database, but when outputting an HTTP response).
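To illustrate why the two contexts get separate functions, here is a small Python sketch. It is not the ESAPI implementation and is likely less strict than ESAPI's attribute encoder; the snake_case function names merely mirror the API above, and `html.escape` is used as a stand-in encoder.

```python
import html

def encode_for_html(value):
    # Text content: &, < and > must be encoded; quotes are harmless here.
    return html.escape(value, quote=False)

def encode_for_html_attribute(value):
    # Quoted attribute value: quotes must be encoded as well, otherwise
    # the input could close the attribute and add new ones.
    return html.escape(value, quote=True)

user_input = 'x" onmouseover="alert(1)'
field = '<input name="q" value="' + encode_for_html_attribute(user_input) + '">'
para = "<p>" + encode_for_html('a < b & "c"') + "</p>"
```

Note that this sketch only covers attribute values wrapped in quotes; unquoted attributes, URLs, and script or style contexts each need their own encoding rules.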

0

leemes Jan 16 '12 at 11:51


David mclaughlin · Accepted Answer · 2008-09-12T11:31:31+0000

You do not want to encode all HTML; you only want to HTML-encode any user input that you are outputting.

For PHP: htmlentities and htmlspecialchars
