PHP: parsing and splitting a URL

  • www.example.com
  • foo.example.com
  • foo.example.co.uk
  • foo.bar.example.com
  • foo.bar.example.co.uk

I have these URLs and always want to end up with two variables:

    $domainName = "example";
    $domainNameSuffix = ".com"; // or ".co.uk"

If someone could show me how to get from $url (any one of the URLs above) to $newUrl, e.g. "example.co.uk", that would be a great help.

Please note that the URLs will be completely "random"; we might get "foo.bar.example2.com.au", so... you know... (am I asking for the impossible?)

Greetings


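What you call "domainNameSuffix" is usually referred to as the top-level domain (TLD for short), or more precisely the public suffix, since something like .co.uk spans two levels. There is no easy way to extract it.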

Each country has its own TLD, and some countries have chosen to subdivide theirs further (.co.uk, .com.au). And since the number of subdomains (my.own.subdomain.example.com) also varies, there is no simple "one regexp fits all" solution.
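To see why a fixed split is not enough, here is a deliberately naive sketch (naiveSplit() is a made-up name for illustration) that treats the last label as the suffix. It handles foo.example.com but cuts foo.example.co.uk in the wrong place.

```php
<?php
// Deliberately naive split, for illustration only: assumes the suffix is
// always the last label and the domain name is the label before it.
// Assumes the host has at least two labels.
function naiveSplit(string $host): array
{
    $labels = explode('.', $host);
    $n = count($labels);
    return [$labels[$n - 2], '.' . $labels[$n - 1]];
}

$ok  = naiveSplit('foo.example.com');   // ['example', '.com']
$bad = naiveSplit('foo.example.co.uk'); // ['co', '.uk'] instead of ['example', '.co.uk']
```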

As others have mentioned, you need a list. Fortunately, a public one is maintained: http://publicsuffix.org/
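A minimal sketch of the list-based approach, assuming a tiny hand-picked subset of suffixes in place of the real list (the SUFFIXES constant and splitHost() are my own names, not part of any library). It tries the longest candidate suffix first, so .co.uk wins over .uk. It ignores the list's wildcard (*.) and exception (!) rules, which a complete implementation would have to handle.

```php
<?php
// Tiny, hand-picked subset of public suffixes, for illustration only.
// A real implementation would load the full list from publicsuffix.org.
const SUFFIXES = ['com', 'uk', 'co.uk', 'au', 'com.au'];

/**
 * Splits a host into [domainName, domainNameSuffix] by finding the
 * longest suffix in $suffixes that the host ends with.
 * Returns null if no suffix matches or nothing is left before it.
 */
function splitHost(string $host, array $suffixes = SUFFIXES): ?array
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);
    // For a.b.c.d try "b.c.d", then "c.d", then "d" (longest first).
    // Starting at index 1 keeps at least one label for the domain name.
    for ($i = 1; $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return [$labels[$i - 1], '.' . $candidate];
        }
    }
    return null;
}
```

For example, splitHost('foo.bar.example2.com.au') gives ['example2', '.com.au'], and a host whose suffix is not in the list gives null rather than a wrong guess.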


There have been a few questions like this before, but I can't find a good one either. The bottom line is that this cannot be done reliably: you would need a long list of special TLDs (such as .uk and .au) that have their own .com/.net level.

But as a simple, general approximation, you can use:

    if (preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $domain, $m)) {
        list(, $domain, $suffix) = $m; // e.g. "example" and "co.uk" (no leading dot)
    }
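To see both what this one-liner returns and where it breaks, here is a small sketch that runs the same pattern over some of the question's hosts plus one extra host, foo.example.co.nz, which I added myself because .nz is not in the (au|uk) alternation.

```php
<?php
$hosts = [
    'www.example.com',
    'foo.example.co.uk',
    'foo.bar.example2.com.au',
    'foo.example.co.nz', // added to show the limitation: .nz is not listed
];

$results = [];
foreach ($hosts as $host) {
    if (preg_match('#([\w-]+)\.(\w+(\.(au|uk))?)\.?$#i', $host, $m)) {
        // Group 1 is the domain name, group 2 the suffix (without leading dot).
        $results[$host] = [$m[1], $m[2]];
    }
}
print_r($results);
```

The first three hosts come out as expected (e.g. ['example', 'co.uk']), but foo.example.co.nz is split as ['co', 'nz'], so every two-level suffix has to be added to the alternation by hand.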

For the most accurate results, I think you will need to maintain a list of extensions.

    $possibleExtensions = array('.com', '.co.uk', '.com.au');

    // $str holds one of the URLs, e.g. 'foo.example.co.uk'.
    // parse_url() needs a protocol to recognise the host.
    $str = 'http://' . $str;

    // Use parse_url() to drop any path, query string or fragment
    // that may be there.
    $host = parse_url($str, PHP_URL_HOST);

    foreach ($possibleExtensions as $ext) {
        if (preg_match('/' . preg_quote($ext, '/') . '\Z/', $host)) {
            $domainNameSuffix = $ext;
            // Strip the extension from the host (not from $str, which may contain a path).
            $rest = substr($host, 0, -strlen($ext));
            // Keep only the last label, dropping subdomains such as "foo.bar.".
            $labels = explode('.', $rest);
            $domainName = end($labels);
            var_dump($domainName, $domainNameSuffix);
            break;
        }
    }

If your input never contains paths or other extras, you can of course skip parse_url() and the http:// prefix.

It worked for all of your test cases.
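The reason for prepending http:// is that parse_url() only recognises a host after "//"; for input like "foo.example.com/path" it reports the whole string as the path, and the host comes back as null. A small sketch of a helper (hostFromInput() is a hypothetical name) that adds the scheme only when the input does not already have one:

```php
<?php
// Hypothetical helper: extract the host whether or not the input has a scheme.
function hostFromInput(string $input): ?string
{
    // Without "//", parse_url() treats "foo.example.com/path" as a path,
    // so prepend a scheme only when none is present.
    if (strpos($input, '://') === false) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    // parse_url() returns null (missing host) or false (malformed URL).
    return is_string($host) ? $host : null;
}
```

This way both "foo.example.co.uk/some/path" and "https://www.example.com/#top" yield a bare host that can then be matched against the extension list.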


There is no built-in function for this.

A quick Google search led me to http://www.wallpaperama.com/forums/php-function-remove-domain-name-get-tld-splitter-split-t5824.html

This suggests that you need to maintain a list of valid TLDs to split the host names.
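If that list is kept in a text file in the same format as the one published at publicsuffix.org (one rule per line, "//" comments, blank lines), loading it can be as simple as the sketch below. loadSuffixes() is a hypothetical name, the input is passed as a string so the function stays testable, and wildcard ("*.") and exception ("!") rules are simply skipped, which the real list's semantics do not allow in a complete implementation.

```php
<?php
/**
 * Hypothetical loader: turns text in the public suffix list format
 * into a set of suffixes (array keys, for constant-time isset() lookups).
 * Wildcard ("*.") and exception ("!") rules are skipped in this sketch.
 */
function loadSuffixes(string $text): array
{
    $suffixes = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || strncmp($line, '//', 2) === 0) {
            continue; // blank line or comment
        }
        // Only the part up to the first whitespace is the rule.
        $rule = preg_split('/\s/', $line)[0];
        if ($rule[0] === '*' || $rule[0] === '!') {
            continue; // not handled in this simplified version
        }
        $suffixes[$rule] = true;
    }
    return $suffixes;
}
```

With the result, checking a candidate suffix is just isset($suffixes['co.uk']).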


OK, this is how I solved it for now. Support for more domain extensions will be added at some point in the future; I don't know yet what technique I will use.

    # Setting options: single- and two-part domain extensions
    $v2_onePart = array("com");
    $v2_twoPart = array("co.uk", "com.au");

    $v2_url  = $_SERVER['SERVER_NAME']; # "example.com" OR "example.com.au"
    $v2_bits = explode(".", $v2_url);   # "example", "com" OR "example", "com", "au"
    $v2_bits = array_reverse($v2_bits); # "com", "example" OR "au", "com", "example"
    # (Reversing to eliminate foo.bar.example.com.au problems.)

    # Check two-part extensions first, then one-part ones.
    if (count($v2_bits) >= 3 && in_array($v2_bits[1] . "." . $v2_bits[0], $v2_twoPart)) {
        $v2_class = $v2_bits[2] . " " . $v2_bits[1] . "_" . $v2_bits[0]; # "example com_au"
    } elseif (count($v2_bits) >= 2 && in_array($v2_bits[0], $v2_onePart)) {
        $v2_class = $v2_bits[1] . " " . $v2_bits[0]; # "example com"
    }
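To try the same reversed-labels idea against the hosts from the question, it can be wrapped in a function that takes the host as a parameter instead of reading $_SERVER['SERVER_NAME'] and returns the two values the question asks for. A sketch under those assumptions (splitByParts() and its default lists are my own choices):

```php
<?php
/**
 * Hypothetical function version of the snippet above.
 * Returns [domainName, domainNameSuffix], or null if no extension matches.
 */
function splitByParts(
    string $host,
    array $onePart = ['com'],
    array $twoPart = ['co.uk', 'com.au']
): ?array {
    // Reverse so the extension labels come first, whatever the subdomains.
    $bits = array_reverse(explode('.', $host));
    // Two-part extensions ("co.uk") are checked before one-part ones ("com").
    if (count($bits) >= 3 && in_array($bits[1] . '.' . $bits[0], $twoPart, true)) {
        return [$bits[2], '.' . $bits[1] . '.' . $bits[0]];
    }
    if (count($bits) >= 2 && in_array($bits[0], $onePart, true)) {
        return [$bits[1], '.' . $bits[0]];
    }
    return null;
}
```

Hosts with an extension that is in neither list, such as example.co.nz, return null, so adding a domain extension later only means extending one of the two arrays.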