How to download a file using Python in a smarter way?

I need to download several files via HTTP in Python.

The most obvious way to do this is to simply use urllib2:

    import urllib2

    u = urllib2.urlopen('http://server.com/file.html')
    localFile = open('file.html', 'w')
    localFile.write(u.read())
    localFile.close()

But I have to deal with URLs that are awkward, for example: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When the file is downloaded via a browser, it gets a human-readable name, e.g. accounts.pdf.

Is there a way to handle this in Python, so that I don't need to know the file names and hardcode them into my script?

+66
python download
May 14 '09 at 8:21
5 answers

Download scripts like that usually send a header telling the user agent what to name the file:

 Content-Disposition: attachment; filename="the filename.ext" 

If you can capture this header, you can get the correct file name.

There's another thread that has some code for grabbing the Content-Disposition header:

    remotefile = urllib2.urlopen('http://example.com/somefile.zip')
    remotefile.info()['Content-Disposition']
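
The header value does not have to be split by hand. Here is a minimal sketch, assuming Python 3 (where urllib2 became urllib.request) and using the standard library's email.message.Message to parse the header parameters. The helper name filename_from_header is made up for this example and is not part of any library:

```python
from email.message import Message


def filename_from_header(content_disposition):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() looks up the "filename" parameter and removes
    # surrounding double quotes; it returns None when the parameter is missing.
    return msg.get_filename()


print(filename_from_header('attachment; filename="the filename.ext"'))
# prints: the filename.ext
```

With urllib.request the raw value would typically come from response.headers.get('Content-Disposition'), which is None when the server did not send the header.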
+40
May 14 '09 at 8:28 a.m.

Based on the comments and @Oli's answer, I came up with the following solution:

    import urllib2
    from os.path import basename
    from urlparse import urlsplit

    def url2name(url):
        return basename(urlsplit(url)[2])

    def download(url, localFileName = None):
        localName = url2name(url)
        req = urllib2.Request(url)
        r = urllib2.urlopen(req)
        if r.info().has_key('Content-Disposition'):
            # If the response has Content-Disposition, take the file name from it
            localName = r.info()['Content-Disposition'].split('filename=')[1]
            if localName[0] == '"' or localName[0] == "'":
                localName = localName[1:-1]
        elif r.url != url:
            # If we were redirected, take the real file name from the final URL
            localName = url2name(r.url)
        if localFileName:
            # The caller can force the file to be saved under a given name
            localName = localFileName
        f = open(localName, 'wb')
        f.write(r.read())
        f.close()

It takes the file name from Content-Disposition; if that header is absent, it uses the file name from the URL (if a redirect occurred, the final URL is used).
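
The priority order described here can be kept separate from the network code, which makes it easy to try out on its own. This is only a sketch, assuming Python 3 (urllib.parse instead of urlparse). pick_local_name and its parameters are hypothetical names for this example, and unlike the code above it falls back to the URL when the header is present but carries no usable filename:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url2name(url):
    """Last path segment of a URL, percent-decoded (may be an empty string)."""
    return posixpath.basename(unquote(urlsplit(url).path))


def pick_local_name(url, final_url, content_disposition=None, forced_name=None):
    """Choose a local file name without doing any I/O."""
    if forced_name:
        # An explicitly requested name always wins.
        return forced_name
    if content_disposition and "filename=" in content_disposition:
        # Take the text after filename= up to the next parameter, if any.
        name = content_disposition.split("filename=", 1)[1].split(";", 1)[0]
        name = name.strip().strip("\"'")
        if name:
            return name
    if final_url != url:
        # We were redirected, so the final URL is the better source.
        return url2name(final_url)
    return url2name(url)


odd_url = "http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf"
print(pick_local_name(odd_url, odd_url, 'attachment; filename="accounts.pdf"'))
# prints: accounts.pdf
```

With urllib.request, final_url would usually be response.url and content_disposition would be response.headers.get('Content-Disposition').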

+35
May 14 '09 at

Combining most of the above, here is a more Pythonic solution:

    import urllib2
    import shutil
    import urlparse
    import os

    def download(url, fileName=None):
        def getFileName(url,openUrl):
            if 'Content-Disposition' in openUrl.info():
                # If the response has Content-Disposition, try to get filename from it.
                # Split on the first '=' only, so a name containing '=' still parses.
                cd = dict(map(
                    lambda x: x.strip().split('=', 1) if '=' in x else (x.strip(),''),
                    openUrl.info()['Content-Disposition'].split(';')))
                if 'filename' in cd:
                    filename = cd['filename'].strip("\"'")
                    if filename: return filename
            # if no filename was found above, parse it out of the final URL.
            return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

        r = urllib2.urlopen(urllib2.Request(url))
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
+23
Jan 14 '10 at 19:54

To Kender:

    if localName[0] == '"' or localName[0] == "'":
        localName = localName[1:-1]

This is not safe: the web server can pass a wrongly formatted name such as ["file.ext] or [file.ext'], or the name can even be empty, in which case localName[0] raises an exception. Correct code could look like this:

    localName = localName.replace('"', '').replace("'", "")
    if localName == '':
        localName = SOME_DEFAULT_FILE_NAME
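
The same idea can be wrapped in a small helper. This is a sketch under a few assumptions: it targets Python 3, the names clean_file_name and DEFAULT_FILE_NAME are made up for the example, and it strips quotes only at the ends of the name rather than everywhere. Dropping directory components with os.path.basename is an extra precaution that the snippet above does not include:

```python
import os

DEFAULT_FILE_NAME = "download.bin"  # hypothetical fallback name


def clean_file_name(raw_name, default=DEFAULT_FILE_NAME):
    """Strip stray quotes and whitespace; never return an empty name."""
    name = (raw_name or "").strip().strip("\"'").strip()
    # Extra precaution (an assumption, not in the original snippet):
    # keep only the last path component so the server cannot choose a directory.
    name = os.path.basename(name.replace("\\", "/"))
    return name or default


print(clean_file_name('"file.ext'))   # prints: file.ext
print(clean_file_name(''))            # prints: download.bin
```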
+1
Mar 23 '10 at 21:12

Using the wget module:

    import wget

    custom_file_name = "/custom/path/custom_name.ext"
    wget.download(url, custom_file_name)

Using urlretrieve:

    import urllib

    urllib.urlretrieve(url, custom_file_name)

Note that urlretrieve does not create missing directories; the directory part of custom_file_name must already exist.
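
If the target directory may be missing, it can be created before calling urlretrieve. Here is a minimal sketch, assuming Python 3, where the function lives in urllib.request. retrieve_to is a made-up helper name for this example:

```python
import os
import urllib.request


def retrieve_to(url, target_path):
    """Download url to target_path, creating parent directories first."""
    parent = os.path.dirname(target_path)
    if parent:
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    # urlretrieve returns a (filename, headers) tuple.
    saved_path, _headers = urllib.request.urlretrieve(url, target_path)
    return saved_path
```

A call would look like retrieve_to(url, "/custom/path/custom_name.ext"), with url being whatever address is to be fetched.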

0
Sep 19 '16 at 12:37