Changing hostname in a URL

I am trying to use Python to change the hostname in a URL, and have been playing with the urlparse module for a while without finding a satisfactory solution. As an example, consider the URL:

https://www.google.dk:80/barbaz

I would like to replace "www.google.dk" with, for example, "www.foo.dk", so that I get the following URL:

https://www.foo.dk:80/barbaz

So, the part I want to replace is what urlparse.urlsplit refers to as the hostname. I had hoped that the result of urlsplit would let me make changes, but the resulting ParseResult type does not allow that. If nothing else works, I can of course rebuild the new URL by joining all the parts with +, but that would leave me with rather ugly code and a lot of conditionals to get "://" and ":" in the right places.

+32
python url
Feb 07 '14 at 13:20
7 answers

You can use urlparse.urlparse and the ParseResult._replace method:

    >>> import urlparse
    >>> parsed = urlparse.urlparse("https://www.google.dk:80/barbaz")
    >>> replaced = parsed._replace(netloc="www.foo.dk:80")
    >>> print replaced
    ParseResult(scheme='https', netloc='www.foo.dk:80', path='/barbaz', params='', query='', fragment='')

ParseResult is a subclass of namedtuple, and _replace is a namedtuple method that:

returns a new instance of a named tuple that replaces the specified fields with new values
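
To make the "new instance" point concrete, here is a minimal sketch. It assumes Python 3, where the same functions live in urllib.parse rather than the urlparse module used in this answer.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")
replaced = parsed._replace(netloc="www.foo.dk:80")

# _replace builds a new tuple; the original is left untouched.
print(parsed.netloc)      # www.google.dk:80
print(replaced.netloc)    # www.foo.dk:80

# geturl() reassembles the components into a URL string.
print(replaced.geturl())  # https://www.foo.dk:80/barbaz
```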

UPDATE

As @2rs2ts pointed out in a comment, the netloc attribute includes the port number.

Good news: ParseResult has hostname and port attributes. Bad news: hostname and port are not fields of the namedtuple. They are dynamic properties, so you cannot do parsed._replace(hostname="www.foo.dk"). That raises an exception.
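
A short sketch of that difference, assuming Python 3 and urllib.parse. The exception class is caught broadly here because it can differ between Python versions.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")

# Only these six names are real tuple fields that _replace accepts.
print(parsed._fields)

# hostname and port are computed from netloc each time they are accessed.
print(parsed.hostname, parsed.port)

error = None
try:
    parsed._replace(hostname="www.foo.dk")
except (ValueError, TypeError) as exc:
    # hostname is not a field, so _replace rejects it.
    error = exc
    print(type(exc).__name__)
```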

If you do not want to split on ":", and your URL always has a port number and never has a username and password (that is, URLs like "https://username:password@www.google.dk:80/barbaz"), you can do:

    parsed._replace(netloc="{}:{}".format("www.foo.dk", parsed.port))
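
If the port or the username and password may or may not be present, the netloc can be rebuilt piece by piece. The following is only a sketch under some assumptions: it targets Python 3 (urllib.parse), replace_hostname is a hypothetical helper name, and the new hostname is assumed to be a plain name rather than an IPv6 literal.

```python
from urllib.parse import urlsplit


def replace_hostname(url, new_hostname):
    """Return url with its hostname swapped, keeping userinfo and port.

    Hypothetical helper; assumes new_hostname is not an IPv6 literal.
    """
    parts = urlsplit(url)
    netloc = new_hostname
    # Re-attach the port only if the original URL had one.
    if parts.port is not None:
        netloc = "{}:{}".format(netloc, parts.port)
    # Re-attach "user" or "user:password" only if present.
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = "{}:{}".format(userinfo, parts.password)
        netloc = "{}@{}".format(userinfo, netloc)
    return parts._replace(netloc=netloc).geturl()


print(replace_hostname("https://www.google.dk:80/barbaz", "www.foo.dk"))
```

This avoids any string surgery on netloc, at the cost of a few conditionals.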
+60
Feb 07 '14 at 13:34

You can use urlsplit and urlunsplit from Python's urlparse module:

    >>> from urlparse import urlsplit, urlunsplit
    >>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
    >>> url
    ['https', 'www.google.dk:80', '/barbaz', '', '']
    >>> url[1] = 'www.foo.dk:80'
    >>> new_url = urlunsplit(url)
    >>> new_url
    'https://www.foo.dk:80/barbaz'

As the docs state, the argument passed to urlunsplit() can be any five-item iterable, so the above code works as expected.

+15
Feb 07 '14 at 1:36 on

Using the urlparse and urlunparse functions of the urlparse module:

    import urlparse

    old_url = 'https://www.google.dk:80/barbaz'
    url_lst = list(urlparse.urlparse(old_url))
    # Now url_lst is ['https', 'www.google.dk:80', '/barbaz', '', '', '']
    url_lst[1] = 'www.foo.dk:80'
    # Now url_lst is ['https', 'www.foo.dk:80', '/barbaz', '', '', '']
    new_url = urlparse.urlunparse(url_lst)

    print(old_url)
    print(new_url)

Output:

    https://www.google.dk:80/barbaz
    https://www.foo.dk:80/barbaz
+5
Feb 07 '14 at

A simple host string replacement in netloc also works in most cases:

    >>> import urlparse
    >>> p = urlparse.urlparse('https://www.google.dk:80/barbaz')
    >>> p._replace(netloc=p.netloc.replace(p.hostname, 'www.foo.dk')).geturl()
    'https://www.foo.dk:80/barbaz'

This will not work if, by chance, the username or password matches the host name. You cannot restrict str.replace to replace only the last occurrence, so instead we can use split and join:

    >>> p = urlparse.urlparse('https://www.google.dk:www.google.dk@www.google.dk:80/barbaz')
    >>> new_netloc = 'www.foo.dk'.join(p.netloc.rsplit(p.hostname, 1))
    >>> p._replace(netloc=new_netloc).geturl()
    'https://www.google.dk:www.google.dk@www.foo.dk:80/barbaz'
+2
Jun 15 '15 at 13:37

I would also recommend using urlsplit and urlunsplit, as in the urlsplit/urlunsplit answer above, but for Python 3 this would be:

    >>> from urllib.parse import urlsplit, urlunsplit
    >>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
    >>> url
    ['https', 'www.google.dk:80', '/barbaz', '', '']
    >>> url[1] = 'www.foo.dk:80'
    >>> new_url = urlunsplit(url)
    >>> new_url
    'https://www.foo.dk:80/barbaz'
+1
Dec 18 '17 at 2:53 on

You can always do this trick:

    >>> from urllib import parse
    >>> p = parse.urlparse("https://stackoverflow.com/questions/21628852/changing-hostname-in-a-url")
    >>> parse.ParseResult(**dict(p._asdict(), netloc='perrito.com.ar')).geturl()
    'https://perrito.com.ar/questions/21628852/changing-hostname-in-a-url'
0
Dec 21 '18 at 23:10

To simply replace the host without touching the port used (if any), use this:

    import re, urlparse

    p = list(urlparse.urlsplit('https://www.google.dk:80/barbaz'))
    p[1] = re.sub('^[^:]*', 'www.foo.dk', p[1])
    print urlparse.urlunsplit(p)

prints

 https://www.foo.dk:80/barbaz 

This also works when the URL has no port.

If you prefer the _replace method that Nigel pointed out, you can use this instead:

    p = urlparse.urlsplit('https://www.google.dk:80/barbaz')
    p = p._replace(netloc=re.sub('^[^:]*', 'www.foo.dk', p.netloc))
    print urlparse.urlunsplit(p)
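
One caveat with the pattern ^[^:]* is that it stops at the first colon. On a netloc such as user:pw@www.google.dk:80 it would therefore replace the username instead of the host. A possible variant that skips an optional "userinfo@" prefix is sketched below. It assumes Python 3 (urllib.parse) and no IPv6 literal in the netloc; swap_host and _HOST_RE are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Optional "userinfo@" prefix (captured), then the host up to the first ":".
# Assumes the netloc contains no IPv6 literal such as [::1].
_HOST_RE = re.compile(r'^((?:[^@]*@)?)[^:]*')


def swap_host(url, new_host):
    """Hypothetical helper: replace the host, keep userinfo and port."""
    parts = urlsplit(url)
    # A function replacement avoids backslash handling in new_host.
    netloc = _HOST_RE.sub(lambda m: m.group(1) + new_host, parts.netloc, count=1)
    return parts._replace(netloc=netloc).geturl()


print(swap_host('https://user:pw@www.google.dk:80/barbaz', 'www.foo.dk'))
```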
-1
Feb 07