How will email regular expressions handle the new Unicode domains?

In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) that use the IDNA standard for native-language scripts.

I am fairly sure that the standard regular expressions most sites currently use will not accept these as valid, or am I mistaken? Has anyone really thought about how this will play out, or done anything about it?

Hope I'm not jumping the gun here.

+4
2 answers

When a user enters an internationalized domain into a browser, it is translated into an ASCII form; email should, of course, work the same way (however, I have never received mail from an IDNA domain, and I have reason to believe that browsers are its only implementers).

Mail agents should be aware that when they see Unicode in an address, it must be translated into its IDNA form before the MX records are looked up. I do not think I have ever taken this into account in all my system administration. Accepting something that the browser would translate as IDNA in a form element is not something I know how to do. If the address really is translated into IDNA before the regex tries to validate it, it should work.

I would not be surprised if internationalized domains failed most email regular expressions, and I think such failures matter in less than 1% of cases. IDNA is really an address-bar feature and a terrible hack; I would be genuinely surprised if email worked on top of it.

Everyone is worried, as if something were changing. That is not true. IDNA is just extending from second-level domains to TLDs, and business will go on as usual. Don't overthink it, OP.
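To illustrate the "translate first, then validate" idea, here is a minimal Python sketch. The pattern `ASCII_EMAIL` and the helpers `to_ascii_address` and `is_valid` are hypothetical names made up for this example, and the regex is deliberately simplistic, standing in for a typical legacy pattern. Python's built-in `idna` codec implements the older IDNA 2003 rules, so treat the output as indicative rather than authoritative.

```python
import re

# Hypothetical stand-in for a "typical" legacy, ASCII-only email regex.
ASCII_EMAIL = re.compile(
    r"^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]{2,}$"
)

def to_ascii_address(address: str) -> str:
    """Convert only the domain part to its ASCII (IDNA) form."""
    local, _, domain = address.rpartition("@")
    # The built-in codec converts each label and adds the xn-- prefix.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"

def is_valid(address: str) -> bool:
    """Validate after IDNA conversion instead of on the raw input."""
    try:
        candidate = to_ascii_address(address)
    except UnicodeError:
        # Raised for labels the codec cannot encode (empty, too long, ...).
        return False
    return ASCII_EMAIL.match(candidate) is not None
```

With this sketch, `ASCII_EMAIL.match("user@bücher.example")` fails on the raw input, while `is_valid("user@bücher.example")` succeeds because the domain is checked as `xn--bcher-kva.example`.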

+4

Older regular expressions will accept IDNA names as valid, provided the names have been correctly translated into ASCII DNS names.

So we have a problem here. You cannot expect the user to simply type Unicode into a text box and have the ASCII version of the domain name arrive on the server side.

IDNA encoding is neither pleasant nor easy: the non-ASCII characters are removed from the label they occur in and appended after it, together with a position marker.

Reimplementing this in, say, JavaScript is slow, sad and boring. A URL-encoding-like approach would have made porting to every language much simpler.

Also, people on systems that do not support IDNA have a hard time working out by hand what a given domain looks like in ASCII.

I feel that IDNA came out pretty ugly, and that this will hinder its adoption.
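The "removed and appended" behaviour can be seen with Python's built-in `punycode` codec. This is only a small sketch; `split_punycode` is a hypothetical helper written for this illustration, not part of any library.

```python
def split_punycode(label: str) -> tuple[str, str]:
    """Return (ASCII characters kept in order, encoded non-ASCII suffix)."""
    encoded = label.encode("punycode").decode("ascii")
    # Everything before the last hyphen is the label's plain ASCII content;
    # everything after it encodes which characters were removed and where.
    basic, _, suffix = encoded.rpartition("-")
    return basic, suffix

basic, suffix = split_punycode("bücher")
print(basic, suffix)                                # bcher kva
print("bücher".encode("idna").decode("ascii"))      # xn--bcher-kva
```

The `ü` disappears from the readable part, and its identity and position are packed into the `kva` suffix; the `xn--` prefix then marks the whole label as IDNA-encoded.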

+3
