Neither urljoin nor posixpath.normpath are executed properly . urljoin forces you to join something and does not cope with absolute paths or excessive .. correctly. posixpath.normpath collapses multiple slashes and removes trailing slashes, both of which should not make URLs.
The following function resolves URLs completely, handling as . and .. s, in accordance with RFC 3986 .
try: # Python 3 from urllib.parse import urlsplit, urlunsplit except ImportError: # Python 2 from urlparse import urlsplit, urlunsplit def resolve_url(url): parts = list(urlsplit(url)) segments = parts[2].split('/') segments = [segment + '/' for segment in segments[:-1]] + [segments[-1]] resolved = [] for segment in segments: if segment in ('../', '..'): if resolved[1:]: resolved.pop() elif segment not in ('./', '.'): resolved.append(segment) parts[2] = ''.join(resolved) return urlunsplit(parts)
You can then call it the full URL as follows.
>>> resolve_url("http://example.com/dir/../../thing/.") 'http://example.com/thing/'
For more information on the considerations you need to make when resolving URLs, see the similar answer I wrote earlier on this .
obskyr Nov 10 '16 at 21:19 2016-11-10 21:19
source share