It depends on whether you process the IDN before or after the IDN ToASCII algorithm (that is, whether you see the domain name παράδειγμα.δοκιμή as παράδειγμα.δοκιμή or as xn--hxajbheg2az3al.xn--jxalpdlp).
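As a rough illustration of the two forms, Python's standard library ships an `idna` codec that implements the IDNA 2003 ToASCII/ToUnicode operations (RFC 3490). A minimal sketch, assuming that codec is acceptable for your use; `to_ascii` and `to_unicode` are hypothetical helper names:

```python
# Python's built-in "idna" codec implements IDNA 2003 ToASCII/ToUnicode
# (RFC 3490). to_ascii/to_unicode are hypothetical helper names.

def to_ascii(domain: str) -> str:
    """Apply ToASCII to every label: the form DNS and network code see."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Apply ToUnicode to every label: the form a human would read."""
    return domain.encode("ascii").decode("idna")


greek = "παράδειγμα.δοκιμή"
ascii_form = to_ascii(greek)
print(ascii_form)              # xn--hxajbheg2az3al.xn--jxalpdlp
print(to_unicode(ascii_form))  # παράδειγμα.δοκιμή
```

Both strings name the same domain; which one your code sees depends on which side of this conversion it sits.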
In the latter case, where you are dealing with IDNs that have already been converted to punycode, the old RFC 1123 rules apply:
U+0041 through U+005A (A-Z) and U+0061 through U+007A (a-z), treated case-insensitively, plus U+0030 through U+0039 (0-9) and U+002D (-). [Edit: and U+002E (.) of course; the rules are for labels, which allow the other characters, with dots between labels. It's sometimes the obvious bits that are easiest to forget.]
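The punycode-side rule is simple enough to check directly. A minimal sketch in Python; `is_ldh_hostname` is a hypothetical helper, and the rules that a label may not start or end with a hyphen and the length limits (63 characters per label, 253 for the whole name) are the usual reading of RFC 952/1035 rather than part of the character list above, so treat them as assumptions:

```python
import re

# One label: ASCII letters, digits and hyphens (letters in either case).
# Assumed extra rules: no leading/trailing hyphen, 1-63 characters.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Check a host name in ASCII (post-ToASCII) form against the LDH rule."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:  # assumed overall length limit
        return False
    # Dots only separate labels, so every piece must be a valid label.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


print(is_ldh_hostname("xn--hxajbheg2az3al.xn--jxalpdlp"))  # True
print(is_ldh_hostname("παράδειγμα.δοκιμή"))                # False
```

Run it only on names that have already been through ToASCII; the Unicode form fails by design.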
If you see it in IDN form, the allowed characters are far more varied; see http://unicode.org/reports/tr36/idn-chars.html for a handy chart of all valid characters.
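Rather than hand-coding that chart, a rough check is to let an IDNA implementation decide. A minimal sketch, assuming Python's built-in `idna` codec: it implements IDNA 2003 nameprep, so its notion of a valid character may not match the chart exactly, and for all-ASCII names it only checks label length, so it should be paired with the RFC 1123 character rule described above. `accepted_by_idna2003` is a hypothetical name:

```python
def accepted_by_idna2003(name: str) -> bool:
    """Rough validity check for a domain name given in Unicode (IDN) form.

    Uses Python's built-in "idna" codec (IDNA 2003). Characters that
    nameprep prohibits, such as private-use code points, make ToASCII
    raise UnicodeError. All-ASCII names only get a label-length check.
    """
    if not name:
        return False
    try:
        name.encode("idna")
    except UnicodeError:
        return False
    return True


print(accepted_by_idna2003("παράδειγμα.δοκιμή"))  # True
print(accepted_by_idna2003("\ue000.example"))     # False: private-use character
```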
Most likely your network code will deal with the punycode form, but your display code (or even code that just passes strings to and from other layers) will deal with the more human-readable form, since nobody running a server on the السعودية. domain wants to see their server listed as being on .xn--mgberp4a5d4ar.
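One way to follow that split is to keep the punycode form internally and convert only at the display boundary. A minimal sketch, assuming Python's IDNA 2003 codec; `for_display` is a hypothetical helper that decodes each `xn--` label and leaves anything it cannot decode in ASCII form rather than failing:

```python
def for_display(host: str) -> str:
    """Turn an ASCII (punycode) host name into its human-readable form.

    Labels that fail to decode are kept as-is, so a malformed name is
    still shown rather than raising an error in the display layer.
    """
    labels = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            try:
                # The codec's decoder applies ToUnicode, including the
                # round-trip check back to the same ASCII label.
                label = label.lower().encode("ascii").decode("idna")
            except UnicodeError:
                pass  # keep the undecodable label in its ASCII form
        labels.append(label)
    return ".".join(labels)


print(for_display("xn--hxajbheg2az3al.xn--jxalpdlp"))  # παράδειγμα.δοκιμή
print(for_display("server.xn--mgberp4a5d4ar"))
```

Plain ASCII labels such as `server` pass through unchanged, so the helper is safe to call on any host name the network layer hands back.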
Jon Hanna, Aug 19 '10 at 15:15