You cannot embed user-provided data in an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data values and never as HTML markup or JavaScript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If you are inserting data into an HTML attribute value, you must ensure that the string cannot cause the attribute value to end prematurely. You must also, of course, make sure that the tag itself cannot be ended. You can achieve this by HTML-encoding any characters that are not guaranteed to be safe.
If you write your HTML so that the value of the tag attribute appears inside a pair of double-quote or single-quote characters, you only need to make sure that you HTML-encode the quote character you have chosen to use. If you do not quote your attributes as described above, you need to worry about many more characters, including whitespace, symbols, punctuation, and other ASCII control characters. Although, to be honest, it is arguably safest to encode these non-alphanumeric characters anyway.
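As a minimal sketch of the "encode anyway" approach, the snippet below uses `html.escape` from the Python standard library, which encodes `&`, `<`, `>` and both quote characters when `quote=True`. The helper name `render_input` and the sample payload are hypothetical and only serve the illustration.

```python
import html


def render_input(value: str) -> str:
    """Build an <input> tag with an untrusted value in a double-quoted attribute.

    render_input is a hypothetical helper for illustration only.
    html.escape(..., quote=True) encodes &, <, >, " and ', so the value
    can neither close the attribute nor close the tag.
    """
    return '<input type="text" value="{}" />'.format(html.escape(value, quote=True))


# A typical attempt to break out of the attribute and inject a script tag.
payload = '"><script>alert(1)</script>'
print(render_input(payload))
```

The encoded payload stays inside the attribute value, so the browser treats it as text rather than as markup.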
Remember that the value of an HTML attribute can appear in three different syntactic contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double-quote character to a suitable HTML-safe value, for example &quot;
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single-quote character to a suitable HTML-safe value, for example &#39;
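The two quoted contexts can be sketched as one small function that encodes only the delimiter in use. The name `encode_for_quoted_attribute` is hypothetical. Encoding `&` as well is an assumption added here: it is not needed to stop a break-out, but it keeps text such as a literal `&quot;` from being decoded into something else.

```python
def encode_for_quoted_attribute(value: str, quote: str) -> str:
    """Encode a value for an attribute delimited by the given quote character.

    Hypothetical helper for illustration. Only the chosen delimiter is
    encoded, plus & (an assumption of this sketch) so that existing
    entity-like text in the value is preserved literally.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be a double or single quote character")
    # Encode & first, otherwise the entities added below would be re-encoded.
    encoded = value.replace("&", "&amp;")
    entity = "&quot;" if quote == '"' else "&#39;"
    return encoded.replace(quote, entity)


print('<input type="text" value="' + encode_for_quoted_attribute('say "hi"', '"') + '" />')
print("<input type='text' value='" + encode_for_quoted_attribute("it's", "'") + "' />")
```

Note that the quote character which is not the delimiter passes through unchanged, which is why the encoder has to know which context it is writing into.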
Unquoted attribute value
<input type='text' value=**insert-here** />
You should never have an HTML tag attribute value without quotes, but sometimes this is outside your control. In this case, we really need to worry about whitespace, punctuation, and other control characters, as these will break us out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 using the &#xHH; format (or a named entity, if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [paragraph lifted from OWASP]
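The rule above could be sketched roughly as follows. The function name `encode_for_unquoted_attribute` is hypothetical, and the sketch always uses the numeric &#xHH; form rather than named entities. Characters with code points of 256 and above are passed through unchanged, since the rule as stated does not cover them.

```python
def encode_for_unquoted_attribute(value: str) -> str:
    """Encode a value for use in an unquoted HTML attribute.

    Hypothetical helper for illustration. ASCII letters and digits are
    kept as-is; every other character with a code point below 256 is
    written as a hexadecimal character reference (&#xHH;).
    """
    out = []
    for ch in value:
        code = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)  # safe: ASCII alphanumeric
        elif code < 256:
            out.append("&#x{:02X};".format(code))  # spaces, punctuation, controls
        else:
            out.append(ch)  # outside the range covered by the rule
    return "".join(out)


# The space and the = sign would otherwise start a new attribute.
print("<input type='text' value=" + encode_for_unquoted_attribute("x onmouseover=alert(1)") + " />")
```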
Remember that the above rules only apply to controlling injection when inserting data into an HTML attribute value. In other areas of the page, different rules apply.
See the OWASP XSS Prevention Cheat Sheet for more information.