RFC 3986 states that the host component of a URI is "case-insensitive". However, it does not say what "case-insensitive" means in terms of UCS or UTF-8 characters.
The examples given in the RFC (for example, "<HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>") let us infer that case-insensitive means at least that the characters A-Z are considered equivalent to the characters 32 positions after them in the UTF-8 character set, i.e. a-z. However, it does not mention how to handle characters outside this range. So, given the unencoded, non-normalized registered name www.OLÉ.com, I see three possible normalized forms that the RFC allows:
1. Lowercase to www.olé.com, then percent-encode to www.ol%E9.com
2. Lowercase only the A-Z characters to www.olÉ.com, then percent-encode to www.ol%C9.com
3. Percent-encode to www.OL%C9.com, then lowercase the non-percent-encoded parts to www.ol%C9.com, producing the same result as 2.
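To make the three options concrete, here is a rough Python sketch of each one. It is only an illustration under my own assumptions: the helper names `ascii_lower`, `pct_encode` and `lower_outside_triplets` are made up for this example, and I percent-encode the UTF-8 octets (which is what RFC 3986 section 3.2.2 asks for in a registered name), so É comes out as %C3%89 and é as %C3%A9 rather than the single-octet Latin-1 values %C9 and %E9.

```python
import re
from urllib.parse import quote

# Hypothetical input: É written as an escape so it is the precomposed U+00C9.
host = "www.OL\u00c9.com"

def ascii_lower(s: str) -> str:
    """Lowercase only A-Z by adding 32 to the code point; leave the rest alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def pct_encode(s: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    return quote(s, safe=".", encoding="utf-8")

def lower_outside_triplets(s: str) -> str:
    """Lowercase A-Z everywhere except inside %XX triplets."""
    def repl(m: re.Match) -> str:
        text = m.group(0)
        return text if text.startswith("%") else text.lower()
    return re.sub(r"%[0-9A-Fa-f]{2}|[A-Z]", repl, s)

# Option 1: full Unicode lowercasing, then percent-encoding.
option1 = pct_encode(host.lower())

# Option 2: ASCII-only lowercasing, then percent-encoding.
option2 = pct_encode(ascii_lower(host))

# Option 3: percent-encoding first, then lowercasing the unencoded parts.
option3 = lower_outside_triplets(pct_encode(host))

print(option1)
print(option2)
print(option3)
```

Options 2 and 3 give the same string here, while option 1 differs in the last octet of the encoded character, which is exactly the ambiguity I am asking about.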
So the question is: which is it? If it is case 1, what defines which characters are considered upper case and which are considered lower case (and which characters have no case)?