Perl Regex to Get Root Domain URLs

How can I extract just part of a URL?

For instance:

    http://www.facebook.com/xxxxxxxxxxx
    http://www.stackoverflow.com/yyyyyyyyyyyyyyyy

I need to take only this part:

    facebook.com
    stackoverflow.com
+3
6 answers

A simple regular expression substitution is enough:

    my $facebook = "www.facebook.com/xxxxxxxxxxx";
    # Keep everything from just after "www." through ".com"; drop the rest.
    $facebook =~ s/www\.(.*\.com).*/$1/;
    print $facebook;

This prints:

 facebook.com 

You can extend this to .net, .org, and so on, along these lines:

 s/www\.(.*\.(?:net|org|com)).*/$1/; 
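As a rough sketch of how that alternation could be wrapped for reuse, the snippet below uses a non-destructive match instead of an in-place substitution. The `domain_from` helper name and the sample inputs are illustrative assumptions, not part of the answer above. It also swaps the greedy `.*` for a non-greedy `.*?` with a lookahead, so a ".com" appearing later in the path is not swallowed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between "www." and a known TLD,
# or undef when the pattern does not match.
sub domain_from {
    my ($url) = @_;
    # The non-greedy (.*?) stops at the first listed TLD that is followed
    # by a slash, a colon, or the end of the string.
    return $url =~ m{www\.(.*?\.(?:net|org|com))(?=[/:]|\z)} ? $1 : undef;
}

for my $url (qw(
    www.facebook.com/xxxxxxxxxxx
    http://www.stackoverflow.com/yyyyyyyyyyyyyyyy
    http://www.example.org/
)) {
    my $domain = domain_from($url);
    print defined $domain ? "$domain\n" : "no match: $url\n";
}
```

Like the substitution it is based on, this only handles hosts that start with "www." and end in one of the listed TLDs.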
-1
    use feature qw( say state );

    use Domain::PublicSuffix qw( );
    use URI                  qw( );

    # Returns "domain.tld" for "subdomain.domain.tld".
    # Handles multi-level TLDs such as ".co.uk".
    sub root_domain {
        my ($domain) = @_;
        state $parser = Domain::PublicSuffix->new();
        return $parser->get_root_domain($domain);
    }

    # Accepts urls as strings and as URI objects.
    sub url_root_domain {
        my ($abs_url) = @_;
        my $domain = URI->new($abs_url)->host();
        return root_domain($domain);
    }

    say url_root_domain('http://www.facebook.com/');       # facebook.com
    say url_root_domain('https://www.facebook.com/');      # facebook.com
    say url_root_domain('http://mobile.google.com/');      # google.com
    say url_root_domain('http://www.theregister.co.uk/');  # theregister.co.uk
    say url_root_domain('http://www.com/');                # www.com
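To illustrate why a suffix list matters for cases like ".co.uk", here is a minimal core-Perl sketch of the underlying idea. The three-entry `%SUFFIX` table and the `naive_root_domain` name are hypothetical stand-ins. The real Public Suffix List is far larger and has wildcard and exception rules that this sketch ignores, so treat it as an explanation of the approach rather than a replacement for the module.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny stand-in for the Public Suffix List.
my %SUFFIX = map { $_ => 1 } qw( com uk co.uk );

# Returns the longest known suffix plus one more label, or undef if the
# host has no known suffix or nothing in front of it.
sub naive_root_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    # Dropping one leading label at a time tries the longest suffix first.
    for my $i (1 .. $#labels) {
        my $suffix = join '.', @labels[$i .. $#labels];
        return join '.', @labels[$i - 1 .. $#labels] if $SUFFIX{$suffix};
    }
    return;
}

for my $host (qw( www.facebook.com www.theregister.co.uk )) {
    print "$host -> ", naive_root_domain($host), "\n";
}
```

Because "co.uk" is tried before "uk", the second host yields theregister.co.uk rather than co.uk, which is the behavior a plain regex cannot provide without knowing the suffixes.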
+11

I like the URI-based answer. The OP asked for a regular expression, though, so in honor of the request and as a challenge, here is what I came up with. In fairness, installing CPAN modules is sometimes difficult or impossible. I have worked on projects that were locked down to a very specific version of Perl, where only certain modules were allowed.

Here is my attempt at a regular expression answer. Note that www. is not required, and subdomains such as mobile. are preserved. The host capture stops at the first : or /, so a URL with a port or trailing directories is parsed correctly. The pattern is protocol independent; the scheme can be, for example, http, https, file, or sftp. The result is captured in $1.

 ^.*://(?:[wW]{3}\.)?([^:/]*).*$ 

Input Example:

    http://WWW.facebook.com:80/
    http://facebook.com/xxxxxxxxxxx/aaaaa
    http://www.stackoverflow.com/yyyyyyyyyyyyyyyy/aaaaaaa
    https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
    http://www.theregister.co.uk/

Output Example:

    facebook.com
    facebook.com
    stackoverflow.com
    mobile.yahoo.com
    theregister.co.uk

EDIT: Thanks to @ikegami for the extra challenge. :) The pattern now supports WWW in any mix of upper and lower case, as well as a port number such as :80.
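For anyone who wants to try the pattern, a minimal sketch of using it from Perl might look like this. The `host_of` helper name is an assumption made for illustration; the pattern itself is unchanged.

```perl
use strict;
use warnings;

# The pattern from this answer, compiled once.
my $host_re = qr{^.*://(?:[wW]{3}\.)?([^:/]*).*$};

# Hypothetical helper: returns the captured host, or undef when the
# string has no "scheme://" part for the pattern to anchor on.
sub host_of {
    my ($url) = @_;
    return $url =~ $host_re ? $1 : undef;
}

my @urls = qw(
    http://WWW.facebook.com:80/
    http://facebook.com/xxxxxxxxxxx/aaaaa
    https://mobile.yahoo.com/yyyyyyyyyyyyyyyy/aaaaaaa
    http://www.theregister.co.uk/
);

for my $url (@urls) {
    my $host = host_of($url);
    print defined $host ? "$host\n" : "no match: $url\n";
}
```

Checking the match result before reading `$1` matters here, because `$1` keeps its value from an earlier successful match when the current one fails.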

+2

This might help:

^https?:\/\/www\.([\da-zA-Z\.-]+)

Input Example:

    http://www.banglanews24.com/detailsnews.php?nssl=763daee77dc90b1c1baf0a361be2ff3c&nttl=20130416072403189462
    http://www.prothom-alo.com/detail/date/2013-04-20/news/3463
    http://www.facebook.com/xxxxxxxxxxx
    http://www.stackoverflow.com/yyyyyyyyyyyyyyy

Output Example:

    banglanews24.com
    prothom-alo.com
    facebook.com
    stackoverflow.com
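A short sketch of applying this pattern in Perl follows. The `www_host` helper name and the third sample URL are assumptions added for illustration. Note that the pattern requires a literal "www." after the scheme, so a URL without it does not match.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern above; returns undef on no match.
sub www_host {
    my ($url) = @_;
    return $url =~ m{^https?://www\.([\da-zA-Z\.-]+)} ? $1 : undef;
}

for my $url (
    'http://www.prothom-alo.com/detail/date/2013-04-20/news/3463',
    'http://www.facebook.com/xxxxxxxxxxx',
    'http://facebook.com/xxxxxxxxxxx',    # no "www.", so this one does not match
) {
    my $host = www_host($url) // 'no match';
    print "$host\n";
}
```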
+2

I found a way:

    my @urls = qw(
        http://www.facebook.com
        http://www.sadas.com/
    );

    for my $url (@urls) {
        $url =~ s{^https?://(?:www\.)?}{}i;   # strip the scheme and an optional "www."
        $url =~ s{/.*}{};                     # strip the path, if any
        print "$url\n";
    }
0
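    my $url = "http://www.stackoverflow.com/yyyyyyyyyyyyyyyy";

    # Skip the first host label (assumed to be a subdomain such as "www")
    # and capture everything up to the next slash.
    if ($url =~ m{//\w+\.([^/]+)/}) {
        print $1;
    }
    else {
        print "false";
    }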
0