What is a regular expression that matches a valid domain name without subdomains?

First, sorry for the 10,000th RegEx question.

I know there are other questions about domains, but the regular expressions in them either do not work properly, are too complicated, or are meant for URLs with subdomains, protocols, and file paths.

Mine is simpler. I just need to validate a domain name:

google.com

stackoverflow.com

So, a domain in its rawest form, without even a subdomain such as www.

  • Characters must be a-z | A-Z | 0-9, period (.) and dash (-)
  • The domain name part must not begin or end with a dash (-) (for example, -google-.com)
  • The domain name part must be between 1 and 63 characters long
  • The extension (TLD) can be anything under rule #1 for now; I may validate it against a list later. It must be 1 or more characters, though

Edit: the TLD is apparently 2-6 characters long as it stands

Rule 4 revised: the TLD should really be labelled "subdomain", since it should include things like .co.uk. I imagine the only validation possible (other than checking against a list) would be "after the first dot there should be one or more characters under rule #1".

Thank you very much, believe me, I tried!

+96
regex validation domain-name
Apr 24 2018-12-24T00:
17 answers

Well, it's a little trickier than it looks (see comments), given your specific requirements:

/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/ 

But note that this will reject many valid domains.
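To see what "reject many valid domains" means in practice, here is a minimal Python sketch that runs the pattern above against a few names. The sample names and the helper name `is_simple_domain` are assumptions chosen for illustration. The middle `{1,61}` forces the name part to be at least three characters long, and the single `\.` leaves no room for a second dot.

```python
import re

# The pattern from the answer above, without the JavaScript-style slashes.
DOMAIN_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$")

def is_simple_domain(name: str) -> bool:
    """Return True if name is a single label followed by an alphabetic TLD."""
    return DOMAIN_RE.match(name) is not None

# Illustrative inputs only: two-character names and multi-part TLDs are rejected.
for name in ["google.com", "stackoverflow.com", "-google-.com",
             "ab.com", "example.co.uk", "www.google.com"]:
    print(name, is_simple_domain(name))
```

Two-character names such as ab.com and names under .co.uk are real domains, so a stricter or looser pattern may be needed depending on the use case.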

+38
Apr 24 2018-12-12T00:

I know this is a bit of an old post, but all the regexes here lack one very important component: IDN domain name support.

IDN domain names start with xn--. They allow extended UTF-8 characters in domain names. For example, did you know that "♡.com" is a valid domain name? Yes, "love heart dot com"! To validate the domain name, you need to let http://xn--c6h.com/ pass the validation.

Please note that in order to use this regular expression you need to convert the domain to lowercase and also use the IDN library to provide encoding of domain names in ACE (also known as "ASCII-compatible encoding"). One good library is GNU-Libidn.

idn(1) is the command line interface to the internationalized domain name library. The following example converts a host name in UTF-8 into ACE encoding. The resulting URL https://nic.xn--flw351e/ can then be used as the ACE-encoded equivalent of https://nic.谷歌/.

  $ idn --quiet -a nic.谷歌
  nic.xn--flw351e
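Where the idn tool is not installed, the same conversion can be sketched with Python's built-in `idna` codec. This is a swapped-in alternative, not GNU Libidn: the codec follows the older IDNA 2003 rules, so its output may differ from newer tools for some names. The helper name `to_ace` is hypothetical.

```python
# Convert a UTF-8 host name to its ACE (Punycode) form using only the standard library.
# Assumption: the built-in "idna" codec (IDNA 2003) is good enough for the names at hand.

def to_ace(hostname: str) -> str:
    """Lowercase the name and encode every label into ASCII-compatible encoding."""
    return hostname.lower().encode("idna").decode("ascii")

# The example host name from the idn command above.
print(to_ace("nic.谷歌"))
```

The lowercased ACE string is what the regular expressions in this answer are meant to be run against.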

This magical regex should cover most domains (although I'm sure there are many valid edge cases that I missed):

 ^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$ 

When choosing a domain validation regular expression, you should see if the domain matches the following:

  1. xn--stackoverflow.com
  2. stackoverflow.xn--com
  3. stackoverflow.co.uk

If these three domains do not pass, your regular expression may not allow valid domains!

See the internationalized domain name support page in Oracle's International Language Environment Guide for more information.

Feel free to try regex here: http://www.regexr.com/3abjr

ICANN maintains a list of delegated domains that you can use to view some sample IDN domains.




Edit:

  ^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$ 

This regex is meant to stop domains that have a "-" at the end of a host name from being marked as valid. In addition, it allows an unlimited number of subdomains.
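As a quick way to run the three-domain check suggested earlier in this answer, here is a minimal Python sketch using the edited pattern, with the TLD class written as `[a-z]`. The helper name `matches` is hypothetical, and the input is lowercased first because the pattern only covers lowercase letters.

```python
import re

# The edited pattern from above, split over two lines for readability.
IDN_DOMAIN_RE = re.compile(
    r"^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*"
    r"(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

def matches(domain: str) -> bool:
    """Return True if the lowercased domain matches the edited pattern."""
    return IDN_DOMAIN_RE.match(domain.lower()) is not None

# The three sample domains a validation regex should accept.
for domain in ["xn--stackoverflow.com", "stackoverflow.xn--com", "stackoverflow.co.uk"]:
    print(domain, matches(domain))
```

Only the positive cases are shown here; it is worth adding your own negative cases (leading or trailing hyphens, for example) before relying on the pattern.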

+72
Nov 18 '14 at 6:08

My RegEx is as follows:

^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$

It works for i.oh1.me and for wow.british-library.uk

UPD

Here is the updated rule

 ^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$ 

Regular expression visualization

https://www.debuggex.com/r/y4Xe_hDVO11bv1DV

It now checks for - or _ at the beginning or end of the domain label.

+45
Nov 18 '13 at 11:45

My bid:

 ^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$ 

Explanations:

A domain name is created from segments. Here is one segment (except the final):

 [a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])? 

It can have 1-63 characters, does not start or end with the "-" character.

Now append '.' to it and repeat at least once:

 (?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+ 

Then attach the end segment 2-63 characters long:

 [a-z0-9][a-z0-9-]{0,61}[a-z0-9] 

Check here: http://regexr.com/3au3g
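The segment-by-segment construction above can be sketched in Python by naming each piece and joining them. The names `SEGMENT`, `LAST_SEGMENT` and `is_domain` are hypothetical, and lowercasing the input is an added assumption since the pattern only lists lowercase letters.

```python
import re

# One segment: 1-63 characters, no leading or trailing hyphen.
SEGMENT = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# Final segment: 2-63 characters, same hyphen rule.
LAST_SEGMENT = r"[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

# One or more "segment." groups followed by the final segment.
DOMAIN_RE = re.compile(rf"^(?:{SEGMENT}\.)+{LAST_SEGMENT}$")

def is_domain(name: str) -> bool:
    """Return True if every segment of the lowercased name obeys the rules above."""
    return DOMAIN_RE.match(name.lower()) is not None

for name in ["google.com", "example.co.uk", "-google.com", "google-.com", "google.c"]:
    print(name, is_domain(name))
```

Keeping the segments as separate strings makes it easy to change one rule, such as the minimum length of the last segment, without rereading the whole expression.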

+14
May 2, '15 at 21:50

Just a small correction - the last part should be up to 6. Therefore,

 ^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}$ 

The longest TLD is museum (6 characters) - http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

+13
Jun 04 '13 at 15:45

This answer applies to domain names (including service RRs), and not to host names (for example, an email host name).

 ^(?=.{1,253}\.?$)(?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,}$ 

This is basically mkyong's answer, and additionally:

  • Maximum length of 255 octets, including length prefixes and the null root.
  • Allows a trailing '.' for an explicit DNS root.
  • Allows a leading '_' for service domain RRs (bugs: does not enforce the 15-character maximum for _ labels, and does not require at least one domain above service RRs).
  • Matches all possible TLDs.
  • Does not capture subdomain labels.

Piece by piece

Lookahead: limit the maximum length between ^ and $ to 253 characters, with an optional trailing literal '.'.

 (?=.{1,253}\.?$) 

Lookahead: the next character is not a '-', and no '_' follows any characters before the next '.'. That is, make sure that the first character of a label is not a '-', and that only the first character may be a '_'.

 (?!-|[^.]+_) 

1 to 63 allowed characters per label.

 [A-Za-z0-9-_]{1,63} 

Lookbehind: the previous character is not a '-'. That is, make sure that the last character of a label is not a '-'.

 (?<!-) 

Require a '.' at the end of every label except the last, where it is optional.

 (?:\.|$) 

Put together, the above requires at least two domain levels, which is not strictly correct but is usually a reasonable assumption. Change {2,} to + if you want to allow TLDs or unqualified relative subdomains (for example, localhost, myrouter, to.).

 (?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,} 

Unit tests for this expression.
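The pieces above can be put back together in Python to check how they interact. This is a minimal sketch: the constant names and `is_domain_name` are hypothetical, and the sample names are illustrative.

```python
import re

LENGTH_CHECK = r"(?=.{1,253}\.?$)"    # at most 253 characters plus an optional trailing '.'
LABEL_START = r"(?!-|[^.]+_)"         # no leading '-', and '_' only as the first character
LABEL_CHARS = r"[A-Za-z0-9-_]{1,63}"  # 1 to 63 allowed characters per label
LABEL_END = r"(?<!-)"                 # no trailing '-'
SEPARATOR = r"(?:\.|$)"               # '.' after each label, optional after the last

DOMAIN_RE = re.compile(
    "^" + LENGTH_CHECK
    + "(?:" + LABEL_START + LABEL_CHARS + LABEL_END + SEPARATOR + "){2,}$"
)

def is_domain_name(name: str) -> bool:
    """Return True if name passes the assembled expression."""
    return DOMAIN_RE.match(name) is not None

for name in ["example.com", "example.com.", "_sip._tcp.example.com",
             "-example.com", "example-.com", "foo_bar.com", "localhost"]:
    print(name, is_domain_name(name))
```

Swapping `{2,}` for `+` in the last string is the one-character change needed to accept single-label names such as localhost.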

+12
Dec 16 '16 at 23:16

The accepted answer does not work for me, try the following:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}$

Visit Unit Test Cases to check.

+11
Sep 08 '14 at 4:33

Thank you for pointing the right direction in domain name verification solutions in other answers. Domain names can be verified in various ways.

If you need to validate an IDN domain in its human-friendly form, the regular expression \p{L} will help. It matches any letter in any language.

Note that the last part can contain hyphens too, since Punycode-encoded Chinese names can have Unicode characters in the TLD.

I came up with a solution that would match, for example:

  • google.com
  • masełkowski.pl
  • maselkowski.pl
  • m.maselkowski.pl
  • www.masełkowski.pl.com
  • xn--masekowski-d0b.pl
  • 中国互联网络信息中心.中国
  • xn--fiqa61au8b7zsevnm8ak20mc4a87e.xn--fiqs8s

Regex is:

 ^[0-9\p{L}][0-9\p{L}-\.]{1,61}[0-9\p{L}]\.[0-9\p{L}][\p{L}-]*[0-9\p{L}]+$ 

Check and configure here

NOTE: This regular expression is quite permissive, as is the current set of characters allowed in domain names.

UPDATE: Simplified even further, since a-zA-Z\p{L} is the same as just \p{L}

NOTE 2: The only problem is that it will match domains with consecutive dots, for example masełk..owski.pl. If anyone knows how to fix this, please improve.
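One possible way around the consecutive-dots problem is to describe a single label and then join labels with single dots, so that a dot can never follow another dot. The sketch below is in Python and rests on an assumption: the standard `re` module has no \p{L}, so `[^\W_]` (any Unicode letter or digit) is used as an approximation. The names `ALNUM`, `LABEL` and `is_unicode_domain` are hypothetical.

```python
import re

# Assumption: [^\W_] stands in for "letter or digit in any script",
# because the standard re module does not support \p{L}.
ALNUM = r"[^\W_]"
# One label: starts and ends with a letter or digit, hyphens allowed inside, 1-63 characters.
LABEL = rf"{ALNUM}(?:(?:{ALNUM}|-){{0,61}}{ALNUM})?"
# Two or more labels separated by exactly one dot each.
DOMAIN_RE = re.compile(rf"^{LABEL}(?:\.{LABEL})+$")

def is_unicode_domain(name: str) -> bool:
    """Return True if name is made of valid labels joined by single dots."""
    return DOMAIN_RE.match(name) is not None

for name in ["google.com", "masełkowski.pl", "www.masełkowski.pl.com",
             "xn--masekowski-d0b.pl", "中国互联网络信息中心.中国", "masełk..owski.pl"]:
    print(name, is_unicode_domain(name))
```

The same label-then-join structure should carry over to engines that do support \p{L}; only the `ALNUM` definition would change.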

+8
Jul 20 '16 at 9:46
 ^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,7}$ 

[domain: lowercase letters and 0-9 only] [can have a hyphen] + [TLD: lowercase letters only, must be 2 to 7 letters long]
http://rubular.com/ is brilliant for testing regular expressions!
Edit: Updated the TLD maximum to 7 characters for ".rentals", as Dan Caddigan pointed out.

+5
May 23 '13 at 13:27

Not enough reputation to comment. Working from paka's solution, I found that I needed to adjust three things:

  • The dash and underscore were moved, because the dash was being interpreted as a range (as in "0-9").
  • Added a full stop, for domain names with many subdomains.
  • Increased the maximum TLD length to 13.

Before:

 ^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$ 

After:

 ^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][-_\.a-zA-Z0-9]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,13}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$ 
+5
Jul 03 '14 at 11:41
 ^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]+(\.[a-zA-Z]+)$ 
+2
Apr 24 '12 at 22:10
 ^((localhost)|((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,253})$ 

Thanks @mkyong for the basis of my answer. I modified it to support longer acceptable labels.

In addition, "localhost" is a technically valid domain name. I will modify this answer to accommodate internationalized domain names.

+2
Aug 05 '15 at 2:54

For new gTLDs

 /^((?!-)[\p{L}\p{N}-]+(?<!-)\.)+[\p{L}\p{N}]{2,}$/iu 
+2
Mar 11 '16 at 9:14

Here is the complete code with an example:

 <?php
 function is_domain($url)
 {
     $parse = parse_url($url);
     if (isset($parse['host'])) {
         $domain = $parse['host'];
     } else {
         $domain = $url;
     }
     return preg_match('/^(?!\-)(?:[a-zA-Z\d\-]{0,62}[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/', $domain);
 }

 echo is_domain('example.com'); //true
 echo is_domain('https://example.com'); //true
 echo is_domain('https://.example.com'); //false
 echo is_domain('https://localhost'); //false
+1
Jun 27 '17 at 12:05
 /^((([a-zA-Z]{1,2})|([0-9]{1,2})|([a-zA-Z0-9]{1,2})|([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]))\.)+[a-zA-Z]{2,6}$/ 
  • ([a-zA-Z]{1,2}) → accepts a label of one or two letters.

  • ([0-9]{1,2}) → accepts a label of one or two digits.

If a label is longer than two characters, ([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]) takes care of it.

The + after the label group means it must match at least once.

0
Apr 2 '15 at 10:34

^[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9].[a-z]{2,3}(.[a-z]{2,3})?(.[a-z]{2,3})?$

Examples that work:

 stack.com
 sta-ck.com
 sta---ck.com
 9sta--ck.com
 sta--ck9.com
 stack99.com
 99stack.com
 sta99ck.com

It will also work for extensions.

 .com.uk
 .co.in
 .uk.edu.in

Examples that will not work:

 -stack.com 

It will work even with the longest domain extension, ".versicherung".

0
Jun 12 '15 at 13:43

As already pointed out, it is not obvious how to tell subdomains apart in a practical sense. We use this regular expression to validate domains that occur in the wild. It covers all practical use cases that I know of. New ones are welcome. Following our guidelines, it avoids capturing groups and greedy matching.

^(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})$

Proof and explanation: https://regex101.com/r/FLA9Bv/9

When validating domains, you can choose one of two approaches.

By-the-book FQDN matching (theoretical definition, rarely encountered in practice):

Practical / conservative FQDN matching (practical definition, expected and supported in practice):

  • by-the-book matching with the following exceptions / additions
  • valid characters are: [a-zA-Z0-9.-]
  • labels cannot begin or end with hyphens (according to RFC-952 and RFC-1123 / 2.1 )
  • the minimum TLD length is 2 characters and the maximum is 24 characters, in line with currently existing records
  • does not match a trailing dot
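The practical rules listed above can also be checked one at a time instead of in a single large expression. The Python sketch below is an illustration rather than a translation of the regex: the 253-character total, the 63-character label limit, the two-label minimum and the non-numeric TLD check are assumptions read off the expression and the other answers, and `is_practical_fqdn` is a hypothetical name.

```python
import re

# A label: letters, digits and hyphens, with no hyphen at either end.
LABEL_RE = re.compile(r"[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?")

def is_practical_fqdn(name: str) -> bool:
    """Apply the practical rules step by step and return True if all pass."""
    if len(name) > 253 or name.endswith("."):
        return False                    # too long, or has a trailing dot
    labels = name.split(".")
    if len(labels) < 2:
        return False                    # assumption: at least a domain and a TLD
    for label in labels:
        if not 1 <= len(label) <= 63 or not LABEL_RE.fullmatch(label):
            return False                # empty, too long, bad character or bad hyphen
    tld = labels[-1]
    # TLD is 2 to 24 characters long and, by assumption, not purely numeric.
    return 2 <= len(tld) <= 24 and not tld.isdigit()

for name in ["google.com", "example.co.uk", "-google.com", "google.com.", "localhost"]:
    print(name, is_practical_fqdn(name))
```

Splitting the checks this way trades compactness for readable failure points, which can help when a rule such as the TLD length needs to change later.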
0
Jul 21 '19 at 0:06