Will HTML encoding prevent all kinds of XSS attacks?

I am not interested in other types of attacks. I just want to know if HTML Encode can prevent all kinds of XSS attacks.

Is there a way to do an XSS attack even if using HTML Encode?

+57
security html-encode xss
Sep 10 '08 at 10:03
9 answers

No.

Setting aside the topic of allowing some tags (not really what was asked), HtmlEncode simply DOES NOT cover all XSS attacks.

For example, consider client-side JavaScript generated by the server: if the server dynamically writes HTML-encoded values directly into the client-side JavaScript, HtmlEncode will not stop injected script from executing.
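A minimal Python sketch of this case, assuming a hypothetical template that drops a user value into a JavaScript string inside an inline `onclick` handler. Python's `html.escape` stands in for HtmlEncode, and `html.unescape` approximates what the browser does: it decodes character references in an attribute value before handing the handler to the JavaScript engine, so the encoded quote comes back as a real quote.

```python
import html

def render_handler(user_value: str) -> str:
    # Hypothetical template: user value inside a JS string in an inline
    # event handler, "protected" only by HTML encoding.
    return "<a href=\"#\" onclick=\"show('" + html.escape(user_value) + "')\">x</a>"

def handler_source_seen_by_js(user_value: str) -> str:
    # The browser decodes &#x27; back to ' in the attribute value
    # before the handler code is run.
    return "show('" + html.unescape(html.escape(user_value)) + "')"

payload = "');alert(document.cookie);//"
print(render_handler(payload))
print(handler_source_seen_by_js(payload))
# show('');alert(document.cookie);//')
```

The encoded output looks harmless in the HTML source, but the code the JavaScript engine actually runs closes the string, calls `alert`, and comments out the rest.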

Next, consider the following pseudo-code:

<input value=<%= HtmlEncode(somevar) %> id=textbox> 

Now, in case it is not immediately obvious: if somevar (sent by the user, of course) is set to, for example,

 a onclick=alert(document.cookie) 

the net result is

 <input value=a onclick=alert(document.cookie) id=textbox> 

which would obviously work. This could be (almost) any other script too... and HtmlEncode will not help.
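A minimal Python sketch of the pseudo-code above, using `html.escape` as a stand-in for HtmlEncode (the function names are hypothetical). The payload contains no character that HTML encoding touches, so it passes through unchanged, and because the attribute is unquoted the space starts a new attribute. The quoted version is shown only for comparison; quoting closes this particular hole but is not a complete defence on its own.

```python
import html

def render_unquoted(somevar: str) -> str:
    # Mirrors the pseudo-code: the attribute value is NOT quoted.
    return "<input value=" + html.escape(somevar) + " id=textbox>"

def render_quoted(somevar: str) -> str:
    # Same value inside a double-quoted attribute; html.escape also
    # encodes " and ' by default, so the value cannot close the attribute.
    return '<input value="' + html.escape(somevar) + '" id="textbox">'

payload = "a onclick=alert(document.cookie)"
print(render_unquoted(payload))  # <input value=a onclick=alert(document.cookie) id=textbox>
print(render_quoted(payload))    # <input value="a onclick=alert(document.cookie)" id="textbox">
```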

There are several additional vectors to consider... including the third flavor of XSS, called DOM-based XSS (where the malicious script is generated dynamically on the client, for example based on # values in the URL).

Also, don't forget about attacks like UTF-7, where the attack looks like

 +ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4- 

There is nothing in there for HtmlEncode to encode...
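A small Python check of the UTF-7 example, assuming Python's built-in `utf-7` codec is a fair approximation of how a browser would decode the page if it guessed UTF-7: HTML encoding leaves the payload untouched, while the UTF-7 decoding of the very same bytes is a script element.

```python
import html

payload = "+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-"

# HTML encoding has nothing to do: no &, <, >, " or ' in the payload.
print(html.escape(payload) == payload)  # True

# Decoding the same ASCII bytes as UTF-7, as a browser that guessed
# UTF-7 would, yields an ordinary script element.
print(payload.encode("ascii").decode("utf-7"))
# <script>alert(document.cookie)</script>
```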

Of course, the solution (in addition to correct and restrictive whitelist input validation) is to perform context-sensitive encoding: HtmlEncoding is great if your output context IS HTML, but you may instead need JavaScriptEncoding, VBScriptEncoding, AttributeValueEncoding, or ... etc.
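A minimal sketch of context-sensitive encoding in Python. The function names are hypothetical and are not the Anti-XSS library's API. `encode_for_js_string` follows the common approach of escaping every character except ASCII letters and digits as a `\uXXXX` escape, so nothing in the value can end the string literal or the surrounding `<script>` element.

```python
import html
from urllib.parse import quote

def encode_for_html(value: str) -> str:
    # HTML element content, or an attribute value that is quoted.
    return html.escape(value, quote=True)

def encode_for_js_string(value: str) -> str:
    # Contents of a quoted JavaScript string literal. Every character other
    # than an ASCII letter or digit becomes \uXXXX (UTF-16 code units, as
    # JavaScript strings use), so quotes, backslashes, line breaks and
    # "</script>" cannot end the string or the script element.
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            units = ch.encode("utf-16-be", "surrogatepass")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

def encode_for_url_component(value: str) -> str:
    # A single query-string value or path segment inside a URL.
    return quote(value, safe="")

name = "');alert(document.cookie);//"
print("<p>Hello " + encode_for_html(name) + "</p>")
print("<script>var name = '" + encode_for_js_string(name) + "';</script>")
print('<a href="/search?q=' + encode_for_url_component(name) + '">search</a>')
```

The same input needs a different encoder in each of the three places; using `encode_for_html` inside the `<script>` block would leave the JavaScript context unprotected.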

If you use MS ASP.NET, you can use their Anti-XSS library, which provides all the necessary context encoding methods.

Note that encoding should not be limited to user input; it should also cover stored values from the database, text files, etc.

Oh, and don't forget to explicitly set the character encoding, both in the HTTP header and in the META tag, otherwise you will still have UTF-7 vulnerabilities...
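A minimal sketch, assuming a plain WSGI application (`app` is a hypothetical name): the charset is declared both in the `Content-Type` header and in a `<meta charset>` tag, so the browser has no reason to guess the encoding.

```python
def app(environ, start_response):
    # Hypothetical WSGI app: declare the charset in the document itself ...
    body = (
        "<!DOCTYPE html>\n"
        "<html><head>"
        '<meta charset="utf-8">'
        "<title>Example</title></head>"
        "<body>Hello</body></html>"
    ).encode("utf-8")
    # ... and in the HTTP response header.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.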

For more information and a fairly definitive list (constantly updated), check out RSnake's XSS Cheat Sheet: http://ha.ckers.org/xss.html

+84
Sep 16 '08 at 7:57

If you systematically encode all user input before displaying it, then yes, you are safe. Well, actually, you are still not 100% safe.
(See @Avid's post for more details.)

In addition, problems arise when you need to let some tags through, for example to allow users to post images or bold text, or any feature where user input is treated as (or converted to) unencoded markup.

You will need to set up a decision system to decide which tags are allowed and which are not, and it is always possible that someone will find a way to sneak a disallowed tag through.
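One very restrictive way to build such a decision system, sketched in Python under the assumption that only a few attribute-free formatting tags are needed (`ALLOWED_TAGS` is a hypothetical policy): encode everything first, then turn back on only exact matches of allowed tags. Anything carrying attributes, such as `<img src=x onerror=...>`, stays encoded.

```python
import html
import re

# Hypothetical policy: only these attribute-free tags are allowed.
ALLOWED_TAGS = {"b", "i", "em", "strong"}

def render_limited_markup(user_text: str) -> str:
    # 1. Encode everything, so nothing the user typed is markup yet.
    escaped = html.escape(user_text, quote=True)

    # 2. Re-enable only exact "&lt;tag&gt;" / "&lt;/tag&gt;" sequences for
    #    allowed tags; anything else (attributes, other tags) stays encoded.
    def restore(match: re.Match) -> str:
        slash, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            return "<" + slash + name + ">"
        return match.group(0)

    return re.sub(r"&lt;(/?)([A-Za-z]+)&gt;", restore, escaped)

print(render_limited_markup("<b>hi</b> <img src=x onerror=alert(1)>"))
# <b>hi</b> &lt;img src=x onerror=alert(1)&gt;
```

This sketch does not balance tags, so an unclosed `<b>` can still affect the page layout; for anything richer, an existing sanitizer library is the safer choice.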

It helps if you follow Joel's advice, "Making Wrong Code Look Wrong", or if your language helps you by warning about, or refusing to compile, code that outputs raw user data (static typing).
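In a dynamically typed language the same idea can be approximated with a marker type, sketched here in Python (`SafeHtml`, `encode` and `write_html` are hypothetical names). The output function refuses plain strings at runtime, and a static type checker such as mypy would also flag passing a plain `str` where `SafeHtml` is expected.

```python
import html

class SafeHtml(str):
    """Marker type for text that is already safe to emit as HTML."""

def encode(raw: str) -> SafeHtml:
    # The normal way to obtain SafeHtml: encode untrusted text.
    return SafeHtml(html.escape(raw, quote=True))

def write_html(out: list, fragment: SafeHtml) -> None:
    # Refuse anything that has not been through encode() (or been wrapped
    # in SafeHtml deliberately), so raw user data cannot slip through.
    if not isinstance(fragment, SafeHtml):
        raise TypeError("refusing to write unencoded text: %r" % (fragment,))
    out.append(fragment)

page = []
write_html(page, encode("<script>alert(1)</script>"))
try:
    write_html(page, "<script>alert(1)</script>")  # raw str: rejected
except TypeError as err:
    print(err)
print("".join(page))
```

Because `SafeHtml` subclasses `str`, concatenating it with a plain string yields a plain `str`, so the "safe" marker is not silently carried over to mixed content.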

+8
Sep 10 '08 at 10:35

If you encode everything, it will (depending on your platform and HtmlEncode implementation). But any useful web application is complex enough that it's easy to forget to check every part of it. Or maybe a third-party component is unsafe. Or maybe some code that you thought did the encoding doesn't, so you missed it somewhere else.

So you might want to check things on the input side too, and also check the data you read from the database.

+3
Sep 10 '08 at 10:16

As everyone else mentioned, you are safe as long as you encode all user input before displaying it. This includes all query parameters and data obtained from the database, which can be changed by user input.

As Pat mentioned, sometimes you want to display some tags, just not all tags. One common way to do this is to use a markup language such as Textile, Markdown, or BBCode. However, even markup languages can be vulnerable to XSS; just keep that in mind.

 # Markup example [foo](javascript:alert\('bar'\);) 
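One mitigation for this particular vector is to check the scheme of every link target after the markup has been parsed, sketched below in Python with a hypothetical allowlist (`ALLOWED_SCHEMES`). Whitespace and control characters are dropped before parsing, because browsers ignore things like a tab inside `javascript:`.

```python
from urllib.parse import urlsplit

# Hypothetical policy: the only schemes a rendered link may use.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

def is_safe_link(url: str) -> bool:
    # Browsers ignore tabs/newlines inside URLs and leading control
    # characters or spaces, so drop all such characters before parsing.
    cleaned = "".join(ch for ch in url if ch.isprintable() and not ch.isspace())
    scheme = urlsplit(cleaned).scheme.lower()
    # No scheme means a relative link; otherwise the scheme must be listed.
    return scheme == "" or scheme in ALLOWED_SCHEMES

for target in ["https://example.com/", "/docs", "javascript:alert('bar');",
               " JavaScript:alert(1)", "java\tscript:alert(1)"]:
    print(repr(target), is_safe_link(target))
```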

If you decide to let "safe" tags through, I would recommend finding an existing library to parse and sanitize the markup before output. There are many XSS vectors you would need to detect before your own sanitizer is reasonably safe.

+1
Sep 10 '08 at 11:45

I second metavida's advice to find a third-party library to handle output filtering. Neutralizing HTML characters is a good approach to stopping XSS attacks. However, the code you use to convert metacharacters may be vulnerable to evasion attacks, for example if it does not handle Unicode and internationalization correctly.

A classic simple mistake made by homebrew output filters is to catch only < and > but miss characters like ", which can break user-controlled output out into the attribute space of an HTML tag, where JavaScript can be attached to the DOM.
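A short Python illustration of that mistake, with a hypothetical `lt_gt_only` function standing in for a homebrew filter: the payload contains no `<` or `>`, so it passes through unchanged and its `"` closes the `value` attribute. `html.escape` with `quote=True` also encodes the quotes.

```python
import html

def lt_gt_only(value: str) -> str:
    # Hypothetical homebrew filter: neutralises &, < and > but not quotes.
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

payload = '" onmouseover="alert(document.cookie)'
template = '<input type="text" value="{}">'

print(template.format(lt_gt_only(payload)))
# <input type="text" value="" onmouseover="alert(document.cookie)">
print(template.format(html.escape(payload, quote=True)))
# <input type="text" value="&quot; onmouseover=&quot;alert(document.cookie)">
```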

+1
Sep 11 '08 at 19:40

No, just encoding common HTML tokens DOES NOT fully protect your site from XSS attacks. See, for example, this XSS vulnerability found at google.com:

http://www.securiteam.com/securitynews/6Z00L0AEUE.html

The important point about this type of vulnerability is that an attacker could encode the XSS payload in UTF-7, and if you have not specified a different character encoding on your page, the user's browser may interpret it as UTF-7 and execute the attack script.

+1
Sep 18 '08 at 16:19

I'd suggest HTML Purifier (http://htmlpurifier.org/). It does not just filter HTML; it basically tokenizes and re-compiles it. It is truly industrial-strength.

This has the added benefit of guaranteeing valid HTML/XHTML output.

I'd also nth Textile: it's a great tool and I use it all the time, but I would run its output through HTML Purifier too.

I don't think you understood what I meant by tokenizing. HTML Purifier doesn't just "filter", it actually rebuilds the HTML: http://htmlpurifier.org/comparison.html

0
Sep 18 '08 at 10:35

Another thing you should check is where your input comes from. You can use the referrer string (most of the time) to check that it came from your own page, but putting a hidden random number or something similar in your form and then checking it (possibly against a session variable) also helps you know that the input comes from your own site and not from a phishing site.
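A minimal sketch of the hidden random value described above (today usually called a CSRF or anti-forgery token), assuming a hypothetical server-side `session` dict and a hypothetical field name `form_token`. The value comes from Python's `secrets` module and is compared in constant time.

```python
import hmac
import secrets

def issue_form_token(session: dict) -> str:
    # Store an unpredictable value in the (hypothetical) server-side
    # session and return the hidden form field that carries it.
    token = secrets.token_urlsafe(32)
    session["form_token"] = token
    return '<input type="hidden" name="form_token" value="' + token + '">'

def check_form_token(session: dict, submitted: str) -> bool:
    # Accept the POST only if it echoes the value from this user's session.
    expected = session.get("form_token")
    if expected is None:
        return False
    # Compare as bytes in constant time (compare_digest rejects non-ASCII str).
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

`token_urlsafe` only produces letters, digits, `-` and `_`, so the token can be placed in the attribute without further encoding.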

0
Sep 19 '08 at 10:15

I don't believe so. HtmlEncode converts all functional characters (characters that the browser could interpret as code) into entity references, which the browser cannot parse as markup and therefore cannot execute.

 &lt;script/&gt; 

The browser cannot execute the above.

*Unless there is a bug in the browser, of course.*

-1
Sep 10 '08 at 10:14