Splitting domains with a regular expression

I have some domains that I want to split, but I can't come up with a regex for it.

I have:

  • http://www.google.com/tomato
  • http://int.google.com
  • http://google.co.uk

From each of them I want to extract only google. Any ideas?

0
4 answers

You can only do this on a best-guess basis. The last part of the host name is always the TLD (plus the optional trailing root dot). Beyond that, you are basically looking for the preceding label that is longer than two letters:

    $url = 'http://www.google.co.uk./search?q=..';
    preg_match('~http://
        (?:[^/]+\.)*    # cut off any preceding www*
        ([\w-]{3,})     # main domain name
        (\.\w\w)?       # two-letter second-level domain, e.g. .co
        \.\w+\.?        # TLD (plus optional root dot)
        (/|:|$)         # end at /, : or the end of the string
        ~x', $url, $match);
    echo $match[1]; // google

If you expect longer second-level domains (such as the .com in .com.au), add another \w. But this is not very general: for reliable results you really need a list of TLDs, if that's an option.
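A minimal sketch of the list-based approach, assuming a deliberately tiny hard-coded suffix list. The $suffixes array and the mainDomainLabel() function are hypothetical names for illustration; a real implementation would load the full Public Suffix List instead. It finds the longest known suffix at the end of the host and returns the label just before it:

```php
<?php
// Hypothetical, deliberately tiny suffix list. A real implementation would
// load the full Public Suffix List (publicsuffix.org) instead.
$suffixes = array('com', 'net', 'org', 'uk', 'co.uk', 'org.uk', 'com.au');

function mainDomainLabel($url, array $suffixes)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component in the URL
    }
    // Drop an optional trailing root dot and normalise case
    $labels = explode('.', strtolower(rtrim($host, '.')));

    // Try suffixes from longest to shortest ("co.uk" before "uk").
    // Starting at $i = 1 guarantees at least one label is left before the suffix.
    $suffixLength = 0;
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            $suffixLength = count($labels) - $i;
            break;
        }
    }
    if ($suffixLength === 0) {
        return null; // TLD not in our list
    }
    // The label just before the suffix is the registered name
    return $labels[count($labels) - $suffixLength - 1];
}

foreach (array('http://www.google.com/tomato', 'http://int.google.com', 'http://google.co.uk') as $url) {
    echo mainDomainLabel($url, $suffixes) . "\n";
}
```

The trade-off is the one the answer mentions: the regex guesses from label lengths, while the list approach is only as good as the list you feed it.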

0

Why are you trying to use a regex? PHP already has built-in functions for this, such as:

 $host = parse_url($url, PHP_URL_HOST); 
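For reference, here is what parse_url() with PHP_URL_HOST gives for the three URLs in the question. It strips the scheme and path, but the host still contains any subdomain and the TLD, so it only gets you part of the way:

```php
<?php
$urls = array('http://www.google.com/tomato', 'http://int.google.com', 'http://google.co.uk');

$hosts = array();
foreach ($urls as $url) {
    // PHP_URL_HOST asks parse_url() for just the host component
    $host = parse_url($url, PHP_URL_HOST);
    $hosts[] = $host;
    echo $host . "\n";
}
// www.google.com
// int.google.com
// google.co.uk
```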

Update: here is an attempt. It may need improving, but it's better than a regex, IMO:

    function determainDomainName($url)
    {
        $hostname = parse_url($url, PHP_URL_HOST);
        $parts = explode('.', (string) $hostname);

        switch (count($parts)) {
            case 1:
                // A single label, e.g. "localhost": nothing to strip
                return $parts[0];
            case 2:
                // e.g. "google.com": the first label is the domain
                return $parts[0];
            case 3:
                if ($parts[0] === 'www') {   // the most common subdomain
                    return $parts[1];        // skip the subdomain, return the next label
                }
                if ($parts[1] === 'co') {    // first half of a double-barrel TLD such as .co.uk
                    return $parts[0];        // skip the double-barrel TLD
                }
                // no known pattern: fall through to the guess below
            default:
                // Have a guess: I bet the longest label is the domain :)
                // Ordering by length puts google above com, co, uk, www, cdn, ww1, ww2, etc.
                usort($parts, 'mysort');
                return $parts[0];
        }
    }

    // usort() callback: longer strings sort first
    function mysort($a, $b)
    {
        return strlen($b) - strlen($a);
    }

Add the two functions above to your library.

Then use them like this:

    $urls = array(
        'http://www.google.com/tomato',
        'http://int.google.com',
        'http://google.co.uk'
    );

    foreach ($urls as $url) {
        echo determainDomainName($url) . "\n";
    }

They all echo google.

See http://codepad.org/pA5KWckb

+3

The answer to this question may be what you are looking for:

Retrieving URL Parts (Regex)

0
    $res = preg_replace('/^(http:\/\/)([a-z_\-]+\.)*([a-z_\-]+)\.(com|co\.uk|net)(\/.*)?$/im', '$3', $in);

Add as many endings (TLDs) to the alternation as you need.
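Rather than typing every ending into the pattern by hand, one option is to build the alternation from an array with preg_quote(), which escapes the dots in endings like co.uk. A sketch under that assumption: $endings and extractDomain() are hypothetical names, the pattern also accepts https, and it uses preg_match() so that a URL with an unknown ending returns null instead of being passed through unchanged:

```php
<?php
// Hypothetical list of endings; extend as needed
$endings = array('com', 'co.uk', 'net');

function extractDomain($in, array $endings)
{
    // Escape each ending for use inside the pattern ("co.uk" becomes "co\.uk")
    $quoted = array_map(function ($e) { return preg_quote($e, '/'); }, $endings);
    $pattern = '/^https?:\/\/([a-z0-9_\-]+\.)*([a-z0-9_\-]+)\.('
             . implode('|', $quoted) . ')(\/.*)?$/i';

    // preg_match() returns 1 on a match; group 2 holds the main domain name
    if (preg_match($pattern, $in, $m) === 1) {
        return $m[2];
    }
    return null; // ending not in the list
}

foreach (array('http://www.google.com/tomato', 'http://int.google.com', 'http://google.co.uk') as $url) {
    echo extractDomain($url, $endings) . "\n";
}
```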

Edit: I made a mistake :-(

0
