Check URL with or without protocol

Hey. I would like to validate the following URLs, so that all of them pass with or without the http:// or www. part, as long as the TLD is present, e.g. .com, .net, .org, etc.

Valid URLs must be:

http://www.domain.com http://domain.com https://www.domain.com https://domain.com www.domain.com domain.com 

To support multi-part TLDs:

 http://www.domain.com.uk http://domain.com.uk https://www.domain.com.uk https://domain.com.uk www.domain.com.uk domain.com.uk 

To support dashes (-):

 http://www.domain-here.com http://domain-here.com https://www.domain-here.com https://domain-here.com www.domain-here.com domain-here.com 

Also to support digits in the domain:

 http://www.domain1-test-here.com http://domain1-test-here.com https://www.domain1-test-here.com https://domain1-test-here.com www.domain1-test-here.com domain-here.com 

IP addresses could be allowed as well:

 127.127.127.127 

(but this is optional!)

Also allow the dash (-); I nearly forgot that =)

I found many functions that test one or the other, but not both at the same time. If someone knows a good regex for this, please share. Thank you for your help.

+4
4 answers

For a correct URL validation solution:

The following answer is correct, but it does not work for all TLDs, such as .me, .it, .in.

Please use the pattern below for URLs:

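 // Optional scheme, then either a dotted host name or a dotted-quad IP.
 // Note: the pattern is not anchored, so it matches anywhere in the string.
 $pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.(?:[a-zA-Z])|\d+\.\d+\.\d+\.\d+)/';
 if (preg_match($pattern, "http://website.in")) {
     echo "valid";
 } else {
     echo "invalid";
 }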
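Because the pattern above is not anchored and only asks for a single letter after the last dot, it also accepts strings that merely contain something host-like. A stricter variant could look like the sketch below. The function name `looks_like_domain_url` and the "TLD is at least two letters" rule are my own assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): anchored check for
// "optional http(s)://, then a dotted host name or a dotted-quad IP".
function looks_like_domain_url(string $url): bool
{
    // ~ is used as the delimiter so the slashes need no escaping.
    // \A and \z anchor the match to the whole string.
    $pattern = '~\A(?:https?://)?'
             . '(?:(?:[a-z0-9-]+\.)+[a-z]{2,}'   // labels followed by a letters-only TLD
             . '|\d{1,3}(?:\.\d{1,3}){3})\z~i';  // or four dot-separated numbers
    return preg_match($pattern, $url) === 1;
}

// A few of the samples from the question.
$samples = ['http://www.domain.com', 'domain.com.uk', 'www.domain1-test-here.com',
            '127.127.127.127', 'http://website.in', 'no dots here'];
foreach ($samples as $sample) {
    echo $sample, ' => ', looks_like_domain_url($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```

This does not check that the TLD really exists or that the IP octets are in range (0 to 255); it only checks the shape.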
+4

If you ignore the path and only check the domain part, a simple regex would be

 (?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+) 

If you also want to support country-code TLDs, you must either provide a complete (and current) list or add |.. (any two characters) to the TLD alternation.

With preg_match, you have to wrap the pattern in delimiters:

 $pattern = ';(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+);';
 $index = preg_match($pattern, $url);

Usually the delimiter is / . But in this case slashes are part of the pattern, so I chose a different delimiter. Otherwise, each slash in the pattern has to be escaped with \

 $pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)/'; 
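To make the delimiter point concrete, here is a small sketch that builds both forms from one pattern body and checks that they behave the same. The variable names are mine, and the sample URL is taken from the question.

```php
<?php
// Pattern body without delimiters, as in the answer above.
$body = '(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)';

// Form 1: ';' as delimiter, slashes stay as they are.
$withSemicolon = ';' . $body . ';';

// Form 2: '/' as delimiter, so every slash in the body must be escaped.
$withSlash = '/' . str_replace('/', '\/', $body) . '/';

$url = 'https://www.domain-here.com';
$a = preg_match($withSemicolon, $url, $m1);
$b = preg_match($withSlash, $url, $m2);

// Both calls match the same text.
var_dump($a === $b, $m1[0] === $m2[0]);
```

Any other character that does not occur in the pattern, such as # or ~, would work as a delimiter in the same way.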
+1

I think you can use flags with filter_var.

Several flags are available for FILTER_VALIDATE_URL :

  • FILTER_FLAG_SCHEME_REQUIRED The URL must contain the scheme part.
  • FILTER_FLAG_HOST_REQUIRED The URL must contain the host part.
  • FILTER_FLAG_PATH_REQUIRED The URL must contain the path part.
  • FILTER_FLAG_QUERY_REQUIRED The URL must contain a query string.

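FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED are used by default (both constants were deprecated in PHP 7.3 and removed in PHP 8.0, since FILTER_VALIDATE_URL always implies them).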

Suppose you want to require the path part and don't want to require the scheme part; you can do something like this (the flag is a bitmask):

 filter_var($url, FILTER_VALIDATE_URL, ~FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED) 
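As far as I know, FILTER_VALIDATE_URL rejects input without a scheme regardless of the flags passed, so for URLs like `domain.com` a common workaround is to prepend a default scheme before validating. A minimal sketch, assuming `http://` is an acceptable default; the function name `is_valid_url_optional_scheme` is hypothetical.

```php
<?php
// Hypothetical helper: validates a URL whose scheme may be missing.
function is_valid_url_optional_scheme(string $url): bool
{
    // If the input does not start with "<scheme>://", assume http://.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    // filter_var returns the URL on success and false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

foreach (['http://www.domain.com', 'domain.com', 'www.domain-here.com', 'not a url'] as $sample) {
    echo $sample, ' => ', is_valid_url_optional_scheme($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```

Note that filter_var does not insist on a TLD, so a single-label host such as `localhost` passes as well. If the TLD must be present, an extra check on the host is still needed.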
+1

Do not use regex. Not every problem that includes strings should use regular expressions.

Do not write your own URL validator. Validating a URL is a solved problem: code for it has already been written, debugged, and tested. It actually comes with PHP.

Take a look at PHP's built-in filtering functions: http://us2.php.net/manual/en/book.filter.php
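For the bare host names and the optional IP case from the question, the filter extension might be used roughly as follows. This is only a sketch: the function name `is_valid_host` is hypothetical, FILTER_VALIDATE_DOMAIN needs PHP 7.0 or later, and the "must contain a dot" rule is my assumption to enforce that a TLD is present.

```php
<?php
// Hypothetical helper: validates a bare host (no scheme, no path).
function is_valid_host(string $host): bool
{
    // Accept a literal IP address (IPv4 or IPv6).
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }
    // Otherwise require a syntactically valid host name with at least one dot.
    return strpos($host, '.') !== false
        && filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false;
}

foreach (['domain.com', 'www.domain1-test-here.com', '127.127.127.127', 'bad_host.com'] as $sample) {
    echo $sample, ' => ', is_valid_host($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```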

0
