Extract top domain from a PHP string

I need to extract a domain name from a string, which can be anything. For example:

$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html"; 

or

 $sitelink="http://subdomain.somewebsite.com/blah/blah/whatever.php"; 

In any case, I want to extract the "somewebsite.com" part (which could be anything) and drop the rest.


With parse_url($url):

 <?php
 $url = 'http://username:password@hostname/path?arg=value#anchor';
 print_r(parse_url($url));
 ?>

The above example will output:

 Array
 (
     [scheme] => http
     [host] => hostname
     [user] => username
     [pass] => password
     [path] => /path
     [query] => arg=value
     [fragment] => anchor
 )

Using those values:

 echo parse_url($url, PHP_URL_HOST); // hostname

or

 $url_info = parse_url($url);
 echo $url_info['host']; // hostname
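
For the question's own URLs this is roughly what you would see (a small sketch; note that parse_url() alone does not strip subdomains or "www."):

 <?php
 // Sketch using the second URL from the question.
 $sitelink = "http://subdomain.somewebsite.com/blah/blah/whatever.php";
 echo parse_url($sitelink, PHP_URL_HOST); // subdomain.somewebsite.com
 // The full host is returned, so any "www." or subdomain prefix is still attached.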

Here is one way to do it:

 <?php
 $sitelink = "http://www.somewebsite.com/product/3749875/info/overview.html";
 $domain_pieces = explode(".", parse_url($sitelink, PHP_URL_HOST));
 $l = sizeof($domain_pieces);
 $secondleveldomain = $domain_pieces[$l - 2] . "." . $domain_pieces[$l - 1];
 echo $secondleveldomain;

Note that this is probably not the behavior you are looking for, because for hosts like

 stackoverflow.co.uk

it will echo "co.uk".
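
To make that caveat concrete, here is the same last-two-labels trick wrapped in a helper (the function name is just for illustration):

 <?php
 // Sketch: the explode() approach applied to a plain .com host and a co.uk host.
 function second_level_domain($url) {
     $pieces = explode(".", parse_url($url, PHP_URL_HOST));
     $l = count($pieces);
     return $pieces[$l - 2] . "." . $pieces[$l - 1];
 }

 echo second_level_domain("http://subdomain.somewebsite.com/blah/blah/whatever.php") . "\n"; // somewebsite.com
 echo second_level_domain("http://stackoverflow.co.uk/questions") . "\n";                    // co.uk -- probably not what you want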


See:

http://publicsuffix.org/learn/

http://www.dkim-reputation.org/regdom-libs/

http://www.dkim-reputation.org/regdom-lib-downloads/ <- downloads are here, including a PHP version


Take these two complex URLs:

 $url = "https://www.example.co.uk/page/section/younameit";

or

 $url = "https://example.co.uk/page/section/younameit";

To get "www.example.co.uk":

 $host = parse_url($url, PHP_URL_HOST);

To get only "example.co.uk":

 // Strip a leading "www." if present. (ltrim($host, 'www.') would be wrong here:
 // ltrim removes a character set, not the literal prefix.)
 $domain = (strpos($host, 'www.') === 0) ? substr($host, 4) : $host;

Whether or not your URL includes "www.", you get the same end result: "example.co.uk".

Voila!
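
A quick sketch to verify both forms of the URL (the helper name is only for illustration):

 <?php
 // Sketch: confirm that both URL variants reduce to the same host without "www.".
 function strip_www($url) {
     $host = parse_url($url, PHP_URL_HOST);
     return (strpos($host, 'www.') === 0) ? substr($host, 4) : $host;
 }

 echo strip_www("https://www.example.co.uk/page/section/younameit") . "\n"; // example.co.uk
 echo strip_www("https://example.co.uk/page/section/younameit") . "\n";     // example.co.uk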


You need a package that uses the Public Suffix List; only that way can you correctly handle suffixes that span two or more labels (co.uk, a.bg, b.bg, etc.) and multi-level subdomains. Regex, parse_url(), or string functions will never produce a completely correct result.

I recommend using TLD Extract. Here is some sample code:

 $extract = new LayerShifter\TLDExtract\Extract();
 $result = $extract->parse('http://www.somewebsite.com/product/3749875/info/overview.html');
 $result->getSubdomain();          // will return (string) 'www'
 $result->getHostname();           // will return (string) 'somewebsite'
 $result->getSuffix();             // will return (string) 'com'
 $result->getRegistrableDomain();  // will return (string) 'somewebsite.com'
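
Assuming the same API as the snippet above, it also covers the question's subdomain example and multi-label suffixes (a sketch; installation is typically via Composer, e.g. layershifter/tld-extract, but check the library's README for the exact package name):

 <?php
 // Sketch, reusing the TLD Extract API shown above; URLs come from the question and the co.uk caveat.
 $extract = new LayerShifter\TLDExtract\Extract();

 echo $extract->parse('http://subdomain.somewebsite.com/blah/blah/whatever.php')
              ->getRegistrableDomain() . "\n"; // somewebsite.com

 echo $extract->parse('http://stackoverflow.co.uk/')
              ->getRegistrableDomain() . "\n"; // stackoverflow.co.uk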

For a string that could be anything, here is another approach:

 function extract_plain_domain($text) {
     $text = trim($text, "/");
     $text = strtolower($text);
     $parts = explode("/", $text);
     // If the first piece is a scheme such as "http:", blank it out so the
     // next non-empty piece is the host.
     if (substr_count($parts[0], "http")) {
         $parts[0] = "";
     }
     foreach ($parts as $val) {   // each() is deprecated/removed in modern PHP
         if (!empty($val)) {
             $text = $val;
             break;
         }
     }
     $parts = explode(".", $text);
     if (empty($parts[2])) {
         return $parts[0] . "." . $parts[1];
     } else {
         $num_parts = count($parts);
         return $parts[$num_parts - 2] . "." . $parts[$num_parts - 1];
     }
 } // end function extract_plain_domain
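
A short usage sketch with the question's own URLs (keep in mind this keeps only the last two labels, so "co.uk"-style hosts have the same caveat as above):

 <?php
 echo extract_plain_domain("http://www.somewebsite.com/product/3749875/info/overview.html") . "\n"; // somewebsite.com
 echo extract_plain_domain("http://subdomain.somewebsite.com/blah/blah/whatever.php") . "\n";       // somewebsite.com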
