How can I normalize / collapse paths or URLs in Python regardless of OS?

I tried using os.path.normpath to convert http://example.com/a/b/c/../ to http://example.com/a/b/ , but it doesn't work on Windows because it converts slashes to backslashes.

+5
python url path normalize
Jan 25 '10 at 9:30
3 answers

Here is how to do it:

    >>> import urlparse  # Python 2; on Python 3 use: import urllib.parse as urlparse
    >>> urlparse.urljoin("ftp://domain.com/a/b/c/d/", "../..")
    'ftp://domain.com/a/b/'
    >>> urlparse.urljoin("ftp://domain.com/a/b/c/d/e.txt", "../..")
    'ftp://domain.com/a/b/'

Remember that urljoin treats everything up to the last / as the directory; whatever comes after it is taken to be the file name, if any.

Also, do not add a leading / to the second argument: it makes the reference absolute, and you will not get the expected result.
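The two rules above can be seen in a small sketch. The example.com URLs and the variable names are made up for illustration, and the import assumes Python 3:

```python
from urllib.parse import urljoin  # Python 3; on Python 2: from urlparse import urljoin

base_dir = "http://example.com/a/b/c/"   # trailing slash: "c" is a directory
base_file = "http://example.com/a/b/c"   # no trailing slash: "c" is treated as a file name

# Relative reference resolved against a directory-style base.
joined_dir = urljoin(base_dir, "x")
# The last segment of the base is dropped when no slash follows it.
joined_file = urljoin(base_file, "x")
# A leading slash makes the reference absolute, so the base path is discarded.
joined_abs = urljoin(base_dir, "/x")

print(joined_dir)   # http://example.com/a/b/c/x
print(joined_file)  # http://example.com/a/b/x
print(joined_abs)   # http://example.com/x
```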

The os.path module is platform dependent, but for file paths that use only forward slashes (not URLs) you could use posixpath.normpath.

+9
Jan 25 '10 at 9:33

Adapted from the os module documentation: "os.path is one of the posixpath or ntpath modules". In your case, use posixpath explicitly.

    >>> import posixpath
    >>> posixpath.normpath("/a/b/../c")
    '/a/c'
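As a rough illustration of why naming the module explicitly helps: both posixpath and ntpath can be imported on any OS, so the difference the question ran into can be reproduced side by side. The variable names here are only for the example.

```python
import ntpath      # the implementation os.path uses on Windows
import posixpath   # the implementation os.path uses on POSIX systems

path = "/a/b/../c"

# Same input, different separators depending on which module does the work.
posix_result = posixpath.normpath(path)
nt_result = ntpath.normpath(path)

print(posix_result)  # /a/c
print(nt_result)     # \a\c
```

Calling posixpath.normpath directly therefore gives the forward-slash result on every platform.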
+2
Jan 25 '10 at 9:37

Neither urljoin nor posixpath.normpath does the job properly. urljoin forces you to join the URL with something, and it does not handle absolute paths or excessive ..s correctly. posixpath.normpath collapses multiple slashes and removes trailing slashes, neither of which should be done to URLs.
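A short sketch of the posixpath.normpath problems described above, using a made-up example.com URL and assuming Python 3 for the import:

```python
import posixpath
from urllib.parse import urlsplit  # Python 3; on Python 2: from urlparse import urlsplit

url = "http://example.com/a//b/c/../"

# On the whole URL: the "//" after the scheme is collapsed, the doubled
# slash in the path is collapsed, and the trailing slash is removed.
mangled = posixpath.normpath(url)

# On the path component alone the scheme survives, but the doubled slash
# and the trailing slash are still lost.
path_only = posixpath.normpath(urlsplit(url).path)

print(mangled)    # http:/example.com/a/b
print(path_only)  # /a/b
```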




The following function resolves URLs completely, handling both .s and ..s, in accordance with RFC 3986.

    try:
        # Python 3
        from urllib.parse import urlsplit, urlunsplit
    except ImportError:
        # Python 2
        from urlparse import urlsplit, urlunsplit

    def resolve_url(url):
        parts = list(urlsplit(url))
        # parts[2] is the path; keep each separator attached to its segment
        segments = parts[2].split('/')
        segments = [segment + '/' for segment in segments[:-1]] + [segments[-1]]
        resolved = []
        for segment in segments:
            if segment in ('../', '..'):
                # never pop the leading '/' (the root)
                if resolved[1:]:
                    resolved.pop()
            elif segment not in ('./', '.'):
                resolved.append(segment)
        parts[2] = ''.join(resolved)
        return urlunsplit(parts)

You can then call it with a full URL as follows.

    >>> resolve_url("http://example.com/dir/../../thing/.")
    'http://example.com/thing/'

For more information on the considerations you need to make when resolving URLs, see the similar answer I wrote earlier on this.

+2
Nov 10 '16 at 21:19