What does “case insensitive” mean in RFC 3986 for non-English characters?

Question

RFC 3986 states that the host component of a URI is "case-insensitive". However, it does not say what "case-insensitive" means for UCS or UTF-8 characters.

The examples given in the RFC (for example, "<HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>") allow us to infer that case-insensitive means at least that the characters A-Z are considered equivalent to the characters 32 code points ahead of them in the UTF-8 character set, i.e. a-z. However, the RFC does not say how to handle characters outside this range. So, given the unencoded, non-normalized registered name www.OLÉ.com, I see three normalized forms that the RFC might permit:

1. Lowercase to www.olé.com, then percent-encode to www.ol%E9.com.
2. Lowercase only the A-Z characters to www.olÉ.com, then percent-encode to www.ol%C9.com.
3. Percent-encode to www.OL%C9.com, then lowercase the non-percent-encoded parts to www.ol%C9.com, producing the same result as 2.
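
The three candidate forms can be reproduced with a short Python sketch. The helper names `lower_ascii` and `lower_outside_escapes` are made up for this illustration. One assumption to note: the sketch percent-encodes the UTF-8 octets, which is what RFC 3986 section 3.2.2 prescribes for registered names, so the escapes come out as %C3%A9 and %C3%89 rather than the code point values %E9 and %C9 written above.

```python
import re
from urllib.parse import quote


def lower_ascii(s: str) -> str:
    """Lowercase only A-Z and leave every other character untouched."""
    return re.sub(r"[A-Z]+", lambda m: m.group(0).lower(), s)


def lower_outside_escapes(s: str) -> str:
    """Lowercase A-Z everywhere except inside %XX escapes."""
    return re.sub(
        r"%[0-9A-Fa-f]{2}|[A-Z]+",
        lambda m: m.group(0) if m.group(0).startswith("%") else m.group(0).lower(),
        s,
    )


name = "www.OLÉ.com"

# Form 1: full Unicode lowercasing, then percent-encoding (UTF-8 octets).
form1 = quote(name.lower())
# Form 2: lowercase A-Z only, then percent-encoding.
form2 = quote(lower_ascii(name))
# Form 3: percent-encode first, then lowercase the unescaped parts.
form3 = lower_outside_escapes(quote(name))

print(form1)  # www.ol%C3%A9.com
print(form2)  # www.ol%C3%89.com
print(form3)  # www.ol%C3%89.com
```

Forms 2 and 3 agree with each other and differ from form 1 only in the escaped octets, which is exactly the ambiguity the question is about.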

So the question is: which is correct? If it is case 1, what determines which characters are considered uppercase and which are considered lowercase (and which characters have no case at all)?

uri rfc3986 utf-8 ucs

Mark slater Oct 15 '11 at 20:14

1 answer

timgws · Answer 1 · 2015-10-30T03:05:39+0000

Hostnames resolved by DNS are always lowercase.

It is not possible to have UTF-8 characters in DNS host names (RFC 1123); however, a workaround exists in the form of "internationalized domain names". This workaround is usually called punycode.

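Punycode encodes non-ASCII characters as ASCII.

According to RFC 3492, ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens).
- https://www.ietf.org/rfc/rfc3492.txt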

, (www.olé.com), , , www.ol% E9.com.

, , URL- , , , .

For example, given a link:

<a href="//www.ol%C3%A9.com">Click Here</a>

Before the DNS lookup, the name www.ol%C3%A9.com is converted to punycode. The percent-encoded host:

www.ol%C3%A9.com

decodes to:

www.olé.com

which in punycode is:

www.xn--ol-cja.com
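
The conversion chain above can be sketched with the Python standard library. This is only an illustration, not what any particular browser runs: it assumes the escapes are UTF-8, and it uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import unquote

href_host = "www.ol%C3%A9.com"

# Step 1: undo the percent-encoding (UTF-8 is unquote's default).
unicode_host = unquote(href_host)

# Step 2: convert each label to its ASCII-compatible (punycode) form.
ascii_host = unicode_host.encode("idna").decode("ascii")

# The raw punycode of a single label, without the "xn--" prefix.
label = "olé".encode("punycode").decode("ascii")

print(unicode_host)  # www.olé.com
print(ascii_host)    # www.xn--ol-cja.com
print(label)         # ol-cja
```

Note how the ASCII letters "ol" pass through literally and only the é is folded into the "cja" suffix.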

So www.olé.com and www.olÉ.com both end up as the same DNS name (www.xn--ol-cja.com), because www.olÉ.com is lowercased to www.olé.com before the conversion.
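
Whether the two spellings collapse to one DNS name can be checked with Python's built-in `idna` codec. Treat this as a sketch under one assumption: the codec follows IDNA 2003, whose nameprep step case-folds the label before encoding. Software that follows IDNA 2008 strictly would need to apply that mapping itself first.

```python
upper_host = "www.olÉ.com"
lower_host = "www.olé.com"

# The idna codec case-folds each label (nameprep) before punycode-encoding it.
upper_ascii = upper_host.encode("idna")
lower_ascii_form = lower_host.encode("idna")

print(upper_ascii)       # b'www.xn--ol-cja.com'
print(lower_ascii_form)  # b'www.xn--ol-cja.com'
print(upper_ascii == lower_ascii_form)  # True
```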

There are IDN tools you can use to check how a name converts to punycode:

Verisign IDN Conversion Tool (http://mct.verisign-grs.com/)
Punycoder, Punycode to Text/Unicode (https://www.punycoder.com/)

Verisign IDN . www.olÉ.com , , .

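IDNA (Internationalized Domain Names in Applications) covers this in its RFCs:

RFC 5894, section 3.1.3, gives the following as one example of a disallowed character:

- The character is an uppercase form or some other form that is mapped to another character by Unicode case folding.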
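
Unicode case folding, the operation RFC 5894 refers to, is also what decides which characters have a case at all. A minimal Python sketch, using `str.casefold` as a stand-in for the Unicode case-folding tables (the sample characters are chosen only for illustration):

```python
import unicodedata

samples = ["É", "é", "ß", "中", "1"]

for ch in samples:
    folded = ch.casefold()
    category = unicodedata.category(ch)  # Lu = uppercase letter, Ll = lowercase letter
    print(ch, category, folded, ch.isupper(), ch.islower())

# Uppercase letters fold to their lowercase counterpart.
e_folded = "É".casefold()   # 'é'
# Some lowercase letters fold to a different string entirely.
sz_folded = "ß".casefold()  # 'ss'
# Caseless characters are left unchanged by folding.
han_folded = "中".casefold()  # '中'
```

Characters such as 中 or digits report neither `isupper()` nor `islower()`, so under this reading they simply have no case and compare as themselves.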
