Combining URLs with urlunparse

I am writing something to "clean up" a URL. In this case, all I'm trying to do is add a default scheme, since urlopen does not work without one. However, if I test this with www.python.org, it returns http:///www.python.org. Does anyone know why there is an extra /, and is there a way to get the URL back without it?

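```python
def FixScheme(website):
    # Python 2 module; in Python 3 it lives on as urllib.parse
    from urlparse import urlparse, urlunparse
    scheme, netloc, path, params, query, fragment = urlparse(website)
    if scheme == '':
        # No scheme given: rebuild the URL with "http" in front
        return urlunparse(('http', netloc, path, params, query, fragment))
    else:
        return website
```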

The problem is that when parsing such an incomplete URL, the string www.python.org is accepted as the path component of the URL, with an empty netloc (network location) as well as an empty scheme. To default the scheme, you can pass it as the second argument (scheme) to urlparse, which simplifies your logic, but that does not help with the empty-netloc problem. So you need extra logic for that case, for example:

```python
if not netloc:
    netloc, path = path, ''
```
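Putting both suggestions together (the scheme argument to urlparse plus the empty-netloc fix), a minimal sketch could look like this. It assumes Python 3, where the urlparse module lives on as urllib.parse; the name fix_scheme and the default_scheme parameter are illustrative, not taken from the question.

```python
from urllib.parse import urlparse, urlunparse  # Python 3 home of the old urlparse module


def fix_scheme(website, default_scheme='http'):
    """Return website with a scheme, defaulting to default_scheme (a sketch)."""
    # The second argument of urlparse() is the scheme used when the URL has none.
    scheme, netloc, path, params, query, fragment = urlparse(website, default_scheme)
    if not netloc:
        # A bare "www.python.org" is parsed as a path with an empty netloc,
        # so move it into the netloc slot before rebuilding the URL.
        netloc, path = path, ''
    return urlunparse((scheme, netloc, path, params, query, fragment))


print(fix_scheme('www.python.org'))          # http://www.python.org
print(fix_scheme('https://www.python.org'))  # https://www.python.org
```

One side effect: for input such as www.python.org/doc/, the whole string ends up in netloc. The rebuilt URL still comes out as http://www.python.org/doc/, but if you need the individual components you would have to split the host from the path yourself.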

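This is because urlparse does not interpret "www.python.org" as a host name (netloc) but as a path, just as a browser would if it encountered that string in an href attribute. urlunparse then treats the http scheme specially: in Python 2's urlparse module, http is listed in uses_netloc, so urlunparse prefixes the path with "//", the (empty) netloc and a "/", which yields http:///www.python.org. If you use "x" as the scheme instead, you get "x:www.python.org".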
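To see this for yourself, the following sketch (Python 3, using urllib.parse) prints the parsed components, the "x" case, and whether each scheme is in uses_netloc. Note that uses_netloc is a module-level list that is not, to my knowledge, part of the documented API, so treat that last check as illustrative only.

```python
from urllib.parse import urlparse, urlunparse, uses_netloc

# Without a leading "//", the host name is filed under path, not netloc.
parts = urlparse('www.python.org')
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))  # '' '' 'www.python.org'

# "x" is not a scheme urlunparse associates with a network location,
# so the path is simply appended after "x:".
print(urlunparse(parts._replace(scheme='x')))  # x:www.python.org

# http is one of the schemes that get the "//" treatment.
print('http' in uses_netloc, 'x' in uses_netloc)  # True False
```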

I don't know what range of input you are dealing with, but it looks like you may not need urlparse and urlunparse at all.
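If the inputs are only ever bare host names or complete URLs, plain string handling may be enough. A minimal sketch, assuming any input without "://" is a host name with an optional path; add_default_scheme is a hypothetical name, and inputs such as mailto: addresses would need separate handling.

```python
def add_default_scheme(website, default_scheme='http'):
    """Prefix default_scheme:// unless website already contains '://'."""
    # Assumption: anything without "://" is a bare host, optionally with a path,
    # e.g. "www.python.org" or "www.python.org/doc/".
    if '://' in website:
        return website
    # Drop leading slashes so "//www.python.org" does not end up with extra ones.
    return default_scheme + '://' + website.lstrip('/')


print(add_default_scheme('www.python.org'))               # http://www.python.org
print(add_default_scheme('https://www.python.org/doc/'))  # https://www.python.org/doc/
```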
