If you want to strip the part of a domain name that is administered by domain name registrars (the public suffix), you will need a list of such suffixes, for example the Public Suffix List.
Walking through the whole list and testing every suffix against a domain name is inefficient, so use the list only to build an index like this:
    $tlds = array(
        // ac : http://en.wikipedia.org/wiki/.ac
        'ac',
        'com.ac',
        'edu.ac',
        'gov.ac',
        'net.ac',
        'mil.ac',
        'org.ac',
        // ad : http://en.wikipedia.org/wiki/.ad
        'ad',
        'nom.ad',
        // …
    );
    $tldIndex = array_flip($tlds);
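Typing the array by hand does not scale, so the index can also be generated from the list itself. The following is a minimal sketch, assuming PHP 7 or later and assuming the list text has already been loaded into a string (where it comes from is left open). `buildTldIndex()` is a hypothetical helper name, not part of any library. It relies on the list's format (comments start with `//`, a rule ends at the first whitespace) and simply skips wildcard (`*.`) and exception (`!`) rules, which a complete implementation would have to handle.

```php
<?php
// Hypothetical helper: build a suffix => true index from Public Suffix List text.
function buildTldIndex(string $listText): array
{
    $index = array();
    foreach (preg_split('/\R/', $listText) as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // A rule ends at the first whitespace character.
        $rule = strtolower(preg_split('/\s+/', $line)[0]);
        // Assumption of this sketch: wildcard and exception rules are ignored.
        if ($rule[0] === '*' || $rule[0] === '!') {
            continue;
        }
        $index[$rule] = true;
    }
    return $index;
}

$sample = "// ac : http://en.wikipedia.org/wiki/.ac\nac\ncom.ac\n\n// ad\nad\nnom.ad\n*.ck\n!www.ck\n";
$tldIndex = buildTldIndex($sample);
var_dump(isset($tldIndex['com.ac'])); // bool(true)
var_dump(isset($tldIndex['www.ck'])); // bool(false)
```

The values are `true` rather than the integers produced by `array_flip()`, but `isset()` lookups behave the same either way.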
The longest matching suffix can then be found as follows:
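    $levels = explode('.', $domain);
    $length = 0;
    for ($i = 1, $n = count($levels); $i <= $n; ++$i) {
        $candidate = implode('.', array_slice($levels, -$i));
        if (!isset($tldIndex[$candidate])) {
            break;
        }
        $length = $i; // longest known suffix so far
    }
    // array_slice($levels, -0) would return every level, so treat "no match" separately
    $suffix = $length > 0 ? implode('.', array_slice($levels, -$length)) : '';
    // The cast covers older PHP versions, where substr() returns false if the suffix is the whole domain
    $prefix = $suffix === '' ? $domain : (string) substr($domain, 0, -strlen($suffix) - 1);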
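To make the edge cases easier to see, the lookup can be wrapped in a function. This is only a sketch, assuming PHP 7 or later; `splitDomain()` is a hypothetical name, and lowercasing the input is an added assumption (domain names are case-insensitive, and the index above is lowercase).

```php
<?php
// Hypothetical wrapper: returns array($prefix, $suffix) using a flat suffix index.
function splitDomain(string $domain, array $tldIndex): array
{
    $domain = strtolower($domain);
    $levels = explode('.', $domain);
    $length = 0;
    for ($i = 1, $n = count($levels); $i <= $n; ++$i) {
        if (!isset($tldIndex[implode('.', array_slice($levels, -$i))])) {
            break;
        }
        $length = $i;
    }
    if ($length === 0) {
        return array($domain, ''); // no known suffix
    }
    return array(
        implode('.', array_slice($levels, 0, -$length)), // everything before the suffix
        implode('.', array_slice($levels, -$length)),    // the matched suffix
    );
}

$tldIndex = array_flip(array('ac', 'com.ac', 'ad', 'nom.ad'));
print_r(splitDomain('www.example.com.ac', $tldIndex)); // prefix "www.example", suffix "com.ac"
print_r(splitDomain('example.test', $tldIndex));       // prefix "example.test", suffix ""
print_r(splitDomain('com.ac', $tldIndex));             // prefix "", suffix "com.ac"
```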
Alternatively, create a tree that mirrors the hierarchy of domain name levels:
    $tldTree = array(
        // ac : http://en.wikipedia.org/wiki/.ac
        'ac' => array(
            'com' => true,
            'edu' => true,
            'gov' => true,
            'net' => true,
            'mil' => true,
            'org' => true,
        ),
        // ad : http://en.wikipedia.org/wiki/.ad
        'ad' => array(
            'nom' => true,
        ),
        // …
    );
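Such a tree does not have to be written by hand; it can be derived from a flat list of suffixes. The sketch below assumes PHP 7 or later, and `buildTldTree()` is a hypothetical helper name. One difference from the hand-written tree: leaves are empty arrays instead of `true`, which makes no difference to an `isset()`-based lookup.

```php
<?php
// Hypothetical helper: turn flat suffixes into a nested array keyed from the top level down.
function buildTldTree(array $tlds): array
{
    $tree = array();
    foreach ($tlds as $tld) {
        $node = &$tree;
        // Walk from the top-level label down, creating missing levels.
        foreach (array_reverse(explode('.', $tld)) as $level) {
            if (!isset($node[$level])) {
                $node[$level] = array();
            }
            $node = &$node[$level];
        }
        unset($node); // drop the reference before handling the next suffix
    }
    return $tree;
}

$tree = buildTldTree(array('ac', 'com.ac', 'edu.ac', 'ad', 'nom.ad'));
print_r($tree); // 'ac' holds 'com' and 'edu'; 'ad' holds 'nom'
```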
Then you can use the following to find a match:
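    $levels = explode('.', $domain);
    $r = $tldTree;
    $length = 0;
    foreach (array_reverse($levels) as $level) {
        // Stop at a leaf (true) or at a level the tree does not know
        if (!is_array($r) || !isset($r[$level])) {
            break;
        }
        $r = $r[$level];
        $length++;
    }
    // array_slice($levels, -0) would return every level, so treat "no match" separately
    $suffix = $length > 0 ? implode('.', array_slice($levels, -$length)) : '';
    // The cast covers older PHP versions, where substr() returns false if the suffix is the whole domain
    $prefix = $suffix === '' ? $domain : (string) substr($domain, 0, -strlen($suffix) - 1);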
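The tree walk can be packaged the same way. Again a sketch, assuming PHP 7 or later; `splitDomainWithTree()` is a hypothetical name, and the lowercasing is an added assumption. Instead of slicing with a negative offset, it computes the split position from the front, which avoids special-casing a match length of zero.

```php
<?php
// Hypothetical wrapper: returns array($prefix, $suffix) using a nested suffix tree.
function splitDomainWithTree(string $domain, array $tldTree): array
{
    $levels = explode('.', strtolower($domain));
    $node = $tldTree;
    $length = 0;
    foreach (array_reverse($levels) as $level) {
        // Stop at a leaf or at a label the tree does not contain.
        if (!is_array($node) || !isset($node[$level])) {
            break;
        }
        $node = $node[$level];
        $length++;
    }
    $split = count($levels) - $length; // index of the first suffix label
    return array(
        implode('.', array_slice($levels, 0, $split)),
        implode('.', array_slice($levels, $split)),
    );
}

$tldTree = array(
    'ac' => array('com' => true, 'edu' => true),
    'ad' => array('nom' => true),
);
print_r(splitDomainWithTree('www.example.com.ac', $tldTree)); // prefix "www.example", suffix "com.ac"
print_r(splitDomainWithTree('example.ad', $tldTree));         // prefix "example", suffix "ad"
```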