I know this is a bit of an old post, but all the regexes here lack one very important component: IDN (internationalized domain name) support.
In their ASCII-encoded form, IDN labels begin with xn--. They make it possible to use Unicode characters outside the ASCII range in domain names. For example, did you know that "♡.com" is a valid domain name? Yes, "love heart dot com"! To validate that domain name, the regex needs to accept http://xn--c6h.com/.
Please note that to use this regular expression you need to convert the domain to lowercase and use an IDN library to encode domain names in ACE (also known as "ASCII Compatible Encoding"). One good library is GNU Libidn.
idn(1) is the command line interface to the internationalized domain name library. The following example converts a host name in UTF-8 to ACE encoding. The resulting URL https://nic.xn--flw351e/ can then be used as the ACE-encoded equivalent of https://nic.谷歌/.
$ idn
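If the idn tool is not at hand, the same lowercase-then-encode step can be sketched in Python. This is only a rough stand-in for GNU Libidn: the helper name `to_ace` is made up for illustration, and Python's built-in "idna" codec follows the older IDNA 2003 rules, so results may differ from newer libraries for some characters.

```python
def to_ace(hostname: str) -> str:
    """Lowercase a host name and convert it to its ACE (xn--) form.

    Hypothetical helper: it relies on Python's built-in "idna" codec
    (IDNA 2003), used here as a stand-in for a library such as GNU Libidn.
    """
    # Lowercase first, because the regex only allows a-z, 0-9, "-" and "_".
    lowered = hostname.lower()
    # Each label with non-ASCII characters is Punycode-encoded and
    # prefixed with "xn--"; plain ASCII labels pass through unchanged.
    return lowered.encode("idna").decode("ascii")


if __name__ == "__main__":
    # Only the ASCII result is printed, so this works on any terminal.
    for name in ("nic.谷歌", "♡.com", "Stackoverflow.COM"):
        print(to_ace(name))
```

The returned string is what you would feed to the regex, not the original Unicode host name.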
This magical regex should cover most domains (although I'm sure there are many valid edge cases that I missed):
^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$
When choosing a domain validation regular expression, you should check that it matches the following domains:
- xn--stackoverflow.com
- stackoverflow.xn--com
- stackoverflow.co.uk
If these three domains do not pass, your regular expression may not allow valid domains!
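A quick way to run that check is a small script like the one below. It is a sketch, assuming Python's `re` module and the first pattern from this answer with the TLD class written as `[a-z]`; the names `DOMAIN_RE`, `SAMPLES` and `is_valid_domain` are made up for illustration.

```python
import re

# First pattern from this answer (TLD class written as [a-z]).
DOMAIN_RE = re.compile(
    r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\."
    r"(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

# The three domains any candidate regex should accept.
SAMPLES = [
    "xn--stackoverflow.com",
    "stackoverflow.xn--com",
    "stackoverflow.co.uk",
]


def is_valid_domain(domain: str) -> bool:
    """Return True if the lowercased domain matches the pattern."""
    # The pattern only covers lowercase ASCII, so lowercase first.
    return DOMAIN_RE.match(domain.lower()) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, is_valid_domain(sample))
```

Swap in your own pattern for `DOMAIN_RE` to see whether it passes the same three samples.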
Visit the Internationalized Domain Name Support page in Oracle's International Language Environment Guide for more information.
Feel free to try the regex here: http://www.regexr.com/3abjr
ICANN maintains a list of delegated domains that you can use to view some sample IDN domains.
Edit:
^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$
This regex stops host names that end with a "-" from being marked as valid. In addition, it allows an unlimited number of subdomains.
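To see the difference between the two patterns, here is a small comparison sketch. It assumes Python's `re` module and both regexes from this answer with the TLD class written as `[a-z]`; the names `FIRST_RE`, `EDITED_RE` and `compare` are made up for illustration.

```python
import re

# Original pattern from this answer.
FIRST_RE = re.compile(
    r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\."
    r"(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

# Edited pattern: repeated subdomain group, each label must end in a-z or 0-9.
EDITED_RE = re.compile(
    r"^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*"
    r"(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)


def compare(domain: str) -> tuple[bool, bool]:
    """Return (first_matches, edited_matches) for a lowercased domain."""
    name = domain.lower()
    return (FIRST_RE.match(name) is not None,
            EDITED_RE.match(name) is not None)


if __name__ == "__main__":
    for domain in (
        "stackoverflow.co.uk",        # plain sample
        "a.b.c.stackoverflow.co.uk",  # several subdomain levels
        "foo-.stackoverflow.com",     # subdomain label ending in "-"
    ):
        print(domain, compare(domain))
```

One thing I noticed while testing this way: a two-label name such as foo-.com still seems to get through the edited pattern via its last alternative, so treat the trailing "-" check as covering subdomain labels rather than every label.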