Change regex to allow an IP address when validating a URL?

I have the following regex to check if the URL is valid:

preg_match('/^(http(s?):\/\/)?(www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url); 

I would like to modify the part [a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3}) (at least I hope this is the right part) so that it matches either an IP address or the domain pattern it matches now.

Currently, the regex works well for me, as it correctly rejects invalid URLs, although I believe it will start to fail as soon as the new domain policy from ICANN takes effect (e.g. Google may want the URL http://search.google instead of http://google.com for search).

Anyway, I would like to allow an IP address to count as a valid URL as well, but I'm not sure how this affects the regex.

If someone can lend a hand, that would be great!

2 answers

This regex works:

 ^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$ 

After the optional "http" prefix, the pattern simply uses an alternation (OR) to match either a domain name or an IP address. Here is the relevant part:

 ((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b) 

The IP expression is somewhat longer, but it ensures that the address is a valid IPv4 address (so 999.999.999.999, for example, is rejected). You can easily replace it with another IP check.
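As one possible "other IP check", PHP's built-in `filter_var` can validate the host once it has been separated from the rest of the URL. This is only a sketch: the helper name `isValidIpv4` is made up for illustration, and the built-in filter may differ from the regex in edge cases such as leading zeros.

```php
<?php
// Hypothetical helper: validates a bare host string as an IPv4 address
// using PHP's built-in filter instead of the long octet regex.
function isValidIpv4(string $host): bool
{
    // filter_var returns the filtered value on success and false on failure.
    return filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(isValidIpv4('192.168.1.1'));      // bool(true)
var_dump(isValidIpv4('999.999.999.999'));  // bool(false)
var_dump(isValidIpv4('example.com'));      // bool(false)
```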

Here it is integrated into your original code:

 preg_match('/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url); 
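
As a quick sanity check, the combined pattern can be wrapped in a small helper and run against a few inputs. The function name `isValidUrl` and the sample URLs are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical wrapper around the combined domain-or-IPv4 pattern above.
function isValidUrl(string $url): bool
{
    $pattern = '/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $url) === 1;
}

var_dump(isValidUrl('http://www.example.com/path?x=1')); // bool(true)
var_dump(isValidUrl('http://192.168.1.1/index.html'));   // bool(true)
var_dump(isValidUrl('http://999.999.999.999'));          // bool(false)
var_dump(isValidUrl('256.1.1.1'));                       // bool(false)
```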

Two points. First, top-level domains can now be up to 6 characters long (e.g. museum), so the pattern needs to allow for that:

 ^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,6})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$ 
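
The effect of the TLD length limit can be seen with a simplified, domain-only version of the pattern. In this sketch the function name `matchesWithTldLength` and its `$maxTld` parameter are hypothetical, and the IP branch is left out to keep the example short.

```php
<?php
// Hypothetical helper: builds the domain-only pattern with a configurable
// upper bound on the TLD length ({2,3} in the question, {2,6} here).
function matchesWithTldLength(string $url, int $maxTld): bool
{
    $pattern = '/^(http(s?):\/\/)?(www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,' . $maxTld . '})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i';

    return preg_match($pattern, $url) === 1;
}

var_dump(matchesWithTldLength('http://example.museum', 3)); // bool(false)
var_dump(matchesWithTldLength('http://example.museum', 6)); // bool(true)
var_dump(matchesWithTldLength('http://example.com', 3));    // bool(true)
```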

Second, in C-like languages the backslashes themselves need to be escaped:

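 /* No /.../i delimiters here: those are PHP syntax. Request case-insensitive matching through the regex library's own flag instead. */
 char *regex = "^(http(s?):\\/\\/)?(www\\.)?+[a-zA-Z0-9\\.\\-\\_]+(\\.[a-zA-Z]{2,6})+(\\/[a-zA-Z0-9\\_\\-\\s\\.\\/\\?\\%\\#\\&\\=]*)?$";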

In Objective-C, we can define a category method on NSString:

 - (BOOL)isURL {
     // uses ICU regex syntax http://userguide.icu-project.org/strings/regexp
     NSString *regex = @"^(http(s?)://)?(www\\.)?+[a-zA-Z0-9\\.\\-_]+(\\.[a-zA-Z]{2,6})+(/[a-zA-Z0-9_\\-\\s\\./\\?%#\\&=]*)?$";
     NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
     return [regextest evaluateWithObject:self];
 }

Please note that this solution completely ignores IPv6!
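
If IPv6 hosts matter, one option in PHP is to skip the regex for that case and check the host separately. The sketch below assumes the URL includes a scheme (so that `parse_url` can find the host) and uses a made-up helper name, `hostIsIpv6`; it is an illustration, not a complete URL validator.

```php
<?php
// Hypothetical helper: reports whether the host part of a URL is an
// IPv6 literal such as http://[2001:db8::1]/index.html.
function hostIsIpv6(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host found, or the URL could not be parsed
    }

    // IPv6 literals are written in square brackets inside URLs;
    // strip them before handing the address to the filter.
    $address = trim($host, '[]');

    return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}

var_dump(hostIsIpv6('http://[2001:db8::1]/index.html')); // bool(true)
var_dump(hostIsIpv6('http://192.168.1.1/'));             // bool(false)
var_dump(hostIsIpv6('http://example.com/'));             // bool(false)
```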
