Domain Regular Expression Separation

Question

I have some domains that I want to split, but I can't come up with a regex for it.

I have:

http://www.google.com/tomato
http://int.google.com
http://google.co.uk

For each of them I want to extract only "google". Any ideas?

0

php regex

David19801 Feb 10 '11 at 21:54

4 answers

Why are you trying to use a regex? PHP has plenty of built-in functions for this, such as:

 $host = parse_url($url, PHP_URL_HOST);
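As a minimal sketch of this approach (the variable names are illustrative), parse_url() with PHP_URL_HOST returns only the hostname, which explode() can then split into its dot-separated labels:

```php
<?php
// The three URLs from the question
$urls = [
    'http://www.google.com/tomato',
    'http://int.google.com',
    'http://google.co.uk',
];

$hosts = [];
foreach ($urls as $url) {
    // PHP_URL_HOST makes parse_url() return just the host component as a string
    $host = parse_url($url, PHP_URL_HOST);
    $hosts[] = $host;
    // Show the host and its dot-separated labels
    echo $host, ' => ', implode(' | ', explode('.', $host)), "\n";
}
```

The path is already gone at this point; what remains is deciding which label is the actual domain name.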

Update: give this a go. It may need some improvement, but it beats a regex IMO:

    function determainDomainName($url)
    {
        $hostname = parse_url($url, PHP_URL_HOST);
        $parts = explode(".", $hostname);

        switch (count($parts)) {
            case 1:
                // A bare host such as "localhost"
                return $parts[0];
            case 2:
                // "google.com": the first segment is the domain
                return $parts[0];
            case 3:
                if ($parts[0] == "www") { // The most common subdomain
                    return $parts[1];     // Bypass the subdomain, return the next segment
                }
                if ($parts[1] == "co") {  // First segment of a two-part TLD such as .co.uk
                    return $parts[0];     // Bypass the two-part TLD
                }
                // Otherwise fall through and guess
            default:
                // Have a guess: I bet the longest segment is the domain :)
                // Sorting by length puts "google" above com, co, uk, www, cdn, ww1, ww2, etc.
                usort($parts, "mysort");
                return $parts[0];
        }
    }

    function mysort($a, $b)
    {
        return strlen($b) - strlen($a); // longest string first
    }

Add these two functions to your library.

Then use them like this:

    $urls = array(
        'http://www.google.com/tomato',
        'http://int.google.com',
        'http://google.co.uk'
    );

    foreach ($urls as $url) {
        echo determainDomainName($url) . "\n";
    }

All three will echo google.

See it running at http://codepad.org/pA5KWckb

+3

RobertPitt Feb 10 '11 at 10:01

The answers to this question may be what you are looking for:

Retrieving URL Parts (Regex)

0

aendrew Feb 10 '11 at 10:01

    $res = preg_replace("/^(http:\/\/)([a-z_\-]+\.)*([a-z_\-]+)\.(com|co\.uk|net)(\/.*)?$/im", "\$3", $in);

Add as many endings to the (com|co\.uk|net) alternation as you need.

Edit: I made a mistake earlier :-(
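Note that preg_replace() returns the subject unchanged when nothing matches, so a URL with an unlisted ending comes back whole instead of failing visibly. A sketch of the same idea using preg_match(), which reports whether it matched; the helper name extractDomain and the short list of endings are illustrative assumptions:

```php
<?php
// Hypothetical helper: returns the name in front of a known ending, or null.
function extractDomain(string $url): ?string
{
    // Optional subdomains, then the name (captured), then one of a fixed
    // list of endings, then an optional path. Extend the alternation as needed.
    $pattern = '/^http:\/\/(?:[a-z_\-]+\.)*([a-z_\-]+)\.(?:com|co\.uk|net)(?:\/.*)?$/i';
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$urls = [
    'http://www.google.com/tomato',
    'http://int.google.com',
    'http://google.co.uk',
    'http://example.org', // .org is not in the list, so this one does not match
];

foreach ($urls as $url) {
    var_export(extractDomain($url));
    echo "\n";
}
```

Returning null makes a missing ending easy to spot, whereas the preg_replace() version would silently hand back the full URL.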

0

SergeS Feb 10 '11 at 10:02

mario · Accepted Answer · 2011-02-10T22:15:32+0000

You can only make a best guess here. The last part of the hostname is always the TLD (plus an optional trailing root dot). So you are basically looking for the preceding word that is longer than two letters:

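    $url = "http://www.google.co.uk./search?q=..";

    // "~" is used as the delimiter because "#" starts the comments in /x mode
    preg_match('~http://
        (?:[^/]+\.)*    # cut off any preceding www* subdomains
        ([\w-]{3,})     # main domain name
        (\.\w\w)?       # two-letter second-level domain, e.g. .co
        \.\w+\.?        # TLD, plus an optional trailing root dot
        (/|:|$)         # end with /, : or the end of the string
        ~x', $url, $match);

    echo $match[1]; // google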

If you expect longer second-level domains (.com, maybe?), add another \w. But this is not very general; you really need a list of TLDs, if that is an option.
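Along the lines of that suggestion, here is a minimal sketch that combines parse_url() with a small list of suffixes instead of a regex. The MULTI_LABEL_SUFFIXES constant and the domainFromList() name are hypothetical, and the list is deliberately tiny; a real implementation would load a complete list such as Mozilla's Public Suffix List rather than hard-coding a few entries.

```php
<?php
// Hypothetical, deliberately tiny list of suffixes that span two labels.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'org.uk', 'com.au'];

function domainFromList(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component to work with
    }
    // Drop a trailing root dot ("google.co.uk.") and split into labels
    $labels = explode('.', rtrim(strtolower($host), '.'));
    $count = count($labels);
    if ($count < 2) {
        return null; // e.g. "localhost": no suffix at all
    }
    // Does the host end in one of the known two-label suffixes?
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLength = in_array($lastTwo, MULTI_LABEL_SUFFIXES, true) ? 2 : 1;
    // The label just before the suffix is the name we want
    $index = $count - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}

foreach (['http://www.google.com/tomato', 'http://int.google.com', 'http://google.co.uk'] as $url) {
    echo domainFromList($url), "\n"; // google for each of the three
}
```

Any suffix that is not in the list is treated as a single label, so the quality of the result depends entirely on how complete the list is.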
