How to join path components when creating url in Python

Question


For example, I want to join a prefix path to a resource path such as /js/foo.js.

I want the resulting path to be relative to the server root. In the above example, if the prefix were "media", I would want the result to be /media/js/foo.js.

os.path.join does this really well, but how it joins paths depends on the OS. In this case, I know that I am targeting the web, not the local file system.

Is there a better alternative when you are working with paths that you know will be used in URLs? Will os.path.join work well enough? Should I just roll my own?

+83

python url

amjoconn Nov 24 '09 at 22:06


9 answers

Python 2

 >>> import urlparse
 >>> urlparse.urljoin('/media/path/', 'js/foo.js')
 '/media/path/js/foo.js'

But be careful:

 >>> import urlparse
 >>> urlparse.urljoin('/media/path', 'js/foo.js')
 '/media/js/foo.js'

and also:

 >>> import urlparse
 >>> urlparse.urljoin('/media/path', '/js/foo.js')
 '/js/foo.js'

Python 3

 >>> import urllib.parse
 >>> urllib.parse.urljoin('/media/path/', 'js/foo.js')
 '/media/path/js/foo.js'

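The reason you get different results for /js/foo.js and js/foo.js is that the former begins with a slash, which means it already starts at the website root. Likewise, a base without a trailing slash (/media/path) has its last segment treated as a file name, so that segment is replaced rather than extended.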
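The three cases above can be checked side by side in Python 3. This is only a sketch: join_under_prefix is a hypothetical helper (not part of urllib) that assumes you always want the resource placed below the prefix.

```python
from urllib.parse import urljoin

# The three behaviours described in this answer.
print(urljoin('/media/path/', 'js/foo.js'))  # /media/path/js/foo.js
print(urljoin('/media/path', 'js/foo.js'))   # /media/js/foo.js
print(urljoin('/media/path', '/js/foo.js'))  # /js/foo.js


def join_under_prefix(prefix, resource):
    """Hypothetical helper: always place `resource` below `prefix`."""
    # A trailing slash on the base keeps its last segment.
    base = prefix.rstrip('/') + '/'
    # Without a leading slash the resource cannot reset the path to the root.
    return urljoin(base, resource.lstrip('/'))


print(join_under_prefix('/media', '/js/foo.js'))  # /media/js/foo.js
```

Normalising both sides before calling urljoin trades away its "absolute path wins" behaviour, which is what the question asks for.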

+134

Ben James Nov 24 '09 at 10:10


As you say, os.path.join joins paths based on the current OS. posixpath is the underlying module that is used on POSIX systems under the os.path namespace:

 >>> import os, posixpath
 >>> os.path.join is posixpath.join  # True on POSIX systems
 True
 >>> posixpath.join('/media/', 'js/foo.js')
 '/media/js/foo.js'

So you can simply import and use posixpath.join for URLs instead; it is available and works on any platform.

Edit: @Pete's suggestion is a good one; you can alias the import for increased readability:

 from posixpath import join as urljoin
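A short sketch of how posixpath.join treats URL-style paths, including one caveat: a component that starts with a slash discards everything before it. The wrapper join_url_path is a hypothetical name used here for illustration only.

```python
import posixpath

print(posixpath.join('/media', 'js/foo.js'))   # /media/js/foo.js
print(posixpath.join('/media/', 'js/foo.js'))  # /media/js/foo.js
# Caveat: an absolute component throws away the prefix.
print(posixpath.join('/media', '/js/foo.js'))  # /js/foo.js


def join_url_path(prefix, resource):
    """Hypothetical wrapper: keep the prefix even if `resource` is absolute."""
    return posixpath.join(prefix, resource.lstrip('/'))


print(join_url_path('/media', '/js/foo.js'))   # /media/js/foo.js
```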

Edit: I think this becomes clearer, or at least it helped me understand, if you look at the source of os.py (the code here is from Python 2.7.11, and I have trimmed some bits). There are conditional imports in os.py that pick which path module to use in the os.path namespace. All the underlying modules (posixpath, ntpath, os2emxpath, riscospath) that may be imported in os.py, aliased as path, exist and can be used on all systems. os.py simply picks one of them to use in the os.path namespace at run time, based on the current OS.

 # os.py
 import sys, errno

 _names = sys.builtin_module_names

 if 'posix' in _names:
     # ...
     from posix import *
     # ...
     import posixpath as path
     # ...

 elif 'nt' in _names:
     # ...
     from nt import *
     # ...
     import ntpath as path
     # ...

 elif 'os2' in _names:
     # ...
     from os2 import *
     # ...
     if sys.version.find('EMX GCC') == -1:
         import ntpath as path
     else:
         import os2emxpath as path
         from _emx_link import link
     # ...

 elif 'ce' in _names:
     # ...
     from ce import *
     # ...
     # We can use the standard Windows path.
     import ntpath as path

 elif 'riscos' in _names:
     # ...
     from riscos import *
     # ...
     import riscospath as path
     # ...

 else:
     raise ImportError, 'no os specific module found'
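As a quick illustration of the point that the underlying path modules can be imported directly on any system, the sketch below (Python 3) calls both posixpath and ntpath regardless of the OS it runs on.

```python
import ntpath
import os
import posixpath

# Both modules are importable everywhere; os.path is just one of them.
print(os.path is posixpath or os.path is ntpath)

# posixpath always joins with forward slashes, which is what URLs need.
print(posixpath.join('media', 'js', 'foo.js'))  # media/js/foo.js

# ntpath always joins with backslashes, even on Linux or macOS.
print(ntpath.join('media', 'js', 'foo.js'))     # media\js\foo.js
```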

+44

GP89 Mar 07 '13 at 19:15


This does the job nicely:

 def urljoin(*args):
     """
     Joins given arguments into an url. Trailing but not leading slashes are
     stripped for each argument.
     """
     return "/".join(map(lambda x: str(x).rstrip('/'), args))
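A few sample calls may make the behaviour easier to judge. The function is restated so the snippet runs on its own; the example.com URL and the arguments are made up for illustration. Note that leading slashes are kept, so a component that starts with "/" produces a double slash.

```python
def urljoin(*args):
    """Join arguments with '/', stripping trailing slashes from each one."""
    return "/".join(map(lambda x: str(x).rstrip('/'), args))


print(urljoin('http://example.com/', 'api/', 'v2'))  # http://example.com/api/v2
print(urljoin('api', 2, 'items'))                    # api/2/items
# Leading slashes are not stripped, so this yields a double slash.
print(urljoin('/media/', '/js/foo.js'))              # /media//js/foo.js
```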

+26

Rune Kaagaard Jul 04 '12 at 9:28


The basejoin function in the urllib package might be what you are looking for.

 basejoin = urljoin(base, url, allow_fragments=True)
     Join a base URL and a possibly relative URL to form an absolute
     interpretation of the latter.

Edit: I did not notice before, but urllib.basejoin seems to map directly to urlparse.urljoin, which makes the latter preferable.

+9

mwcz Nov 24 '09 at 10:10


Using furl (pip install furl), it looks like this:

  furl.furl('/media/path/').add(path='js/foo.js')

+7

Vasili Pascal 04 Oct '17 at 13:39


I know this is a bit more than the OP asked for. However, I had the pieces of the following URL and was looking for a simple way to join them:

 >>> url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

Doing some looking around:

 >>> import urlparse
 >>> split = urlparse.urlsplit(url)
 >>> split
 SplitResult(scheme='https', netloc='api.foo.com', path='/orders/bartag', query='spamStatus=awaiting_spam&page=1&pageSize=250', fragment='')
 >>> type(split)
 <class 'urlparse.SplitResult'>
 >>> dir(split)
 ['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_asdict', '_fields', '_make', '_replace', 'count', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
 >>> split[0]
 'https'
 >>> split = (split[:])
 >>> type(split)
 <type 'tuple'>

So, in addition to the path joining already covered in the other answers, I did the following to get what I was looking for:

 >>> split
 ('https', 'api.foo.com', '/orders/bartag', 'spamStatus=awaiting_spam&page=1&pageSize=250', '')
 >>> unsplit = urlparse.urlunsplit(split)
 >>> unsplit
 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

According to the documentation, it takes EXACTLY a 5-part tuple.

The tuple has the following format:

Attribute  Index  Value                  Value if not present
scheme     0      URL scheme specifier   empty string
netloc     1      Network location part  empty string
path       2      Hierarchical path      empty string
query      3      Query component        empty string
fragment   4      Fragment identifier    empty string
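The same split, modify, unsplit round trip can be sketched in Python 3, where these functions live in urllib.parse. append_path is a hypothetical helper, and the shortened URL below is an assumption made for the example; the idea is to extend only the path component and leave the query string untouched.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def append_path(url, *segments):
    """Hypothetical helper: append path segments, keeping the query intact."""
    parts = urlsplit(url)
    # Strip slashes so a segment can never reset the path to the root.
    cleaned = [s.strip('/') for s in segments]
    new_path = posixpath.join(parts.path, *cleaned)
    # SplitResult is a named 5-tuple, so _replace swaps a single field.
    return urlunsplit(parts._replace(path=new_path))


url = 'https://api.foo.com/orders?spamStatus=awaiting_spam&page=1&pageSize=250'
print(append_path(url, 'bartag'))

# urlunsplit also accepts a plain 5-tuple in the order shown in the table.
print(urlunsplit(('https', 'api.foo.com', '/orders/bartag', 'page=1', '')))
```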

+5

jmunsch Mar 22 '15 at 17:19


To improve slightly on Alex Martelli's answer, the following not only cleans up extra slashes but also preserves a trailing (ending) slash, which can sometimes be useful:

 >>> items = ["http://www.website.com", "/api", "v2/"]
 >>> url = "/".join([(u.strip("/") if index + 1 < len(items) else u.lstrip("/")) for index, u in enumerate(items)])
 >>> print(url)
 http://www.website.com/api/v2/

It is not as easy to read, though, and it will not clean up every run of extra slashes (for example, repeated slashes inside a component).

+3

Florent Thiery Sep 22 '17 at 9:00


Rune Kaagaard provided a great, compact solution that worked for me; I expanded on it a little:

 def urljoin(*args):
     trailing_slash = '/' if args[-1].endswith('/') else ''
     return "/".join(map(lambda x: str(x).strip('/'), args)) + trailing_slash

This lets you join all arguments regardless of leading and trailing slashes, while preserving the final slash if there is one.
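A brief usage sketch follows. The function is restated so the snippet is self-contained, with one small tweak that is not in the answer: str() is also applied to the last argument so a non-string final piece does not raise. One thing to be aware of is that the leading slash of the first argument is stripped too, so a root-relative path comes back without its leading "/".

```python
def urljoin(*args):
    """Join pieces with '/', keeping a trailing slash if the last piece has one."""
    # str() on the last argument is an added safeguard, not in the original.
    trailing_slash = '/' if str(args[-1]).endswith('/') else ''
    return '/'.join(str(x).strip('/') for x in args) + trailing_slash


print(urljoin('http://www.website.com/', '/api/', 'v2/'))  # http://www.website.com/api/v2/
print(urljoin('http://www.website.com', 'api', 'v2'))      # http://www.website.com/api/v2
# The leading slash of a root-relative path is lost.
print(urljoin('/media/', '/js/foo.js'))                    # media/js/foo.js
```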

+2

futuere Apr 11 '19 at 20:51


Alex Martelli · Accepted Answer · 2009-11-25 04:05

Since, from the comments the OP posted, it seems they do not want to preserve "absolute URLs" in the join (which is one of the key jobs of urlparse.urljoin ;-), I would recommend avoiding it. os.path.join would also be bad, for exactly the same reason.

So, I would use something like '/'.join(s.strip('/') for s in pieces) (if the leading / must also be ignored; if the leading piece must be special-cased, that is also feasible, of course ;-)
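Here is a minimal sketch of that one-liner, plus one possible way to special-case the leading piece. The names join_pieces and join_rooted are made up for this example, and always prepending a single "/" is an assumption about what "relative to the server root" should mean.

```python
def join_pieces(pieces):
    """Join pieces with '/', ignoring leading and trailing slashes on each."""
    return '/'.join(s.strip('/') for s in pieces)


def join_rooted(pieces):
    """Same join, but always anchored at the server root (an assumption)."""
    return '/' + join_pieces(pieces)


print(join_pieces(['/media/', '/js/foo.js']))  # media/js/foo.js
print(join_rooted(['media', '/js/foo.js']))    # /media/js/foo.js
```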
