How to perform a download with a limited response time using python requests?

When downloading a large file with Python, I want to set a time limit not only for the connection process, but also for the download itself.

I am trying to use the following python code:

    import requests

    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     timeout=0.5, prefetch=False)
    print r.headers['content-length']
    print len(r.raw.read())

This does not work (the download is not limited in time), as the documentation correctly notes: https://requests.readthedocs.org/en/latest/user/quickstart/#timeouts

It would be great if something like this were possible:

    r.raw.read(timeout=10)

The question is: how do I set a time limit on the download itself?
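(For reference: in Requests 1.x and later, prefetch=False was renamed stream=True, and timeout may be given as a (connect, read) tuple. Even then, the read timeout only bounds each individual socket read, never the transfer as a whole -- a sketch of the semantics:)

    import requests

    # stream=True is the later name for prefetch=False; the 0.5 s read
    # timeout caps any single socket read, but a server that delivers a
    # chunk every 0.4 s can keep streaming indefinitely.
    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     stream=True, timeout=(3.0, 0.5))
    for chunk in r.iter_content(chunk_size=4096):
        pass  # each read is bounded; the total download time is not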

+7
python python-requests urllib3
Nov 26 '12 at 21:05
3 answers

The answer: don't use requests here, since it blocks. Use non-blocking network I/O instead, such as eventlet:

    import eventlet
    from eventlet.green import urllib2
    from eventlet.timeout import Timeout

    url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
    url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'
    urls = [url5, url5, url10, url10, url10, url5, url5]

    def fetch(url):
        response = bytearray()
        # Timeout(60, False) silently aborts the block instead of raising,
        # leaving response empty so the caller sees length 0.
        with Timeout(60, False):
            response = urllib2.urlopen(url).read()
        return url, len(response)

    pool = eventlet.GreenPool()
    for url, length in pool.imap(fetch, urls):
        if not length:
            print "%s: timeout!" % url
        else:
            print "%s: %s" % (url, length)

This produces the expected results:

    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
    http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
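If you would rather keep the Requests API, the same Timeout trick should work once eventlet monkey-patches the standard socket module so that Requests' I/O becomes green -- a sketch, assuming the patch is applied before any sockets are created:

    import eventlet
    eventlet.monkey_patch()  # make socket I/O cooperative (green)

    import requests
    from eventlet.timeout import Timeout

    body = None
    with Timeout(10, False):  # as above, False swallows the timeout
        body = requests.get('http://ipv4.download.thinkbroadband.com/10MB.zip').content

    if body is None:
        print "timeout!"
    else:
        print len(body)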
+7
Nov 27

Using Requests' prefetch=False parameter, you get to pull down and process the response in arbitrarily sized pieces rather than all at once.

What you need to do is tell Requests not to preload the entire response, and then keep track of how much time you have spent reading while consuming it in small pieces. You can fetch a piece with r.raw.read(CHUNK_SIZE). Overall, the code will look something like this:

    import requests
    import time

    CHUNK_SIZE = 2**12             # Bytes
    TIME_EXPIRE = time.time() + 5  # Seconds

    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     prefetch=False)

    data = ''
    buffer = r.raw.read(CHUNK_SIZE)
    while buffer:
        data += buffer
        buffer = r.raw.read(CHUNK_SIZE)
        if TIME_EXPIRE < time.time():
            # Quit after 5 seconds.
            data += buffer
            break

    r.raw.release_conn()

    print "Read %s bytes out of %s expected." % (len(data), r.headers['content-length'])

Note that this can sometimes run a bit longer than 5 seconds, since the final r.raw.read(...) can block for an arbitrary amount of time. But at least it does not depend on threads or socket timeouts.
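In later Requests versions you can tighten that bound by combining the same deadline loop with a per-read socket timeout -- a sketch using stream=True (the later name for prefetch=False) and iter_content, so the final read can overshoot the deadline by at most the read timeout:

    import requests
    import time

    DEADLINE = time.time() + 5  # overall time budget, in seconds

    # timeout=(3.0, 0.5): 3 s to connect, and no single read may block
    # longer than 0.5 s -- so the loop overshoots the deadline by at
    # most ~0.5 s.
    r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                     stream=True, timeout=(3.0, 0.5))
    data = b''
    try:
        for chunk in r.iter_content(chunk_size=4096):
            data += chunk
            if time.time() > DEADLINE:
                break
    except requests.exceptions.RequestException:
        pass  # a single read stalled past the 0.5 s read timeout
    finally:
        r.close()

    print "Read %s bytes out of %s expected." % (len(data), r.headers['content-length'])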

+2
Nov 27

Run the download in a thread, which you can then abandon if it does not finish in time.

    import requests
    import threading

    URL = 'http://ipv4.download.thinkbroadband.com/1GB.zip'
    TIMEOUT = 0.5

    def download(return_value):
        return_value.append(requests.get(URL))

    return_value = []
    download_thread = threading.Thread(target=download, args=(return_value,))
    download_thread.start()
    download_thread.join(TIMEOUT)  # wait at most TIMEOUT seconds

    if download_thread.is_alive():
        print 'The download was not finished on time...'
    else:
        print return_value[0].headers['content-length']
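One caveat: join(TIMEOUT) only stops waiting; the thread itself keeps downloading in the background, since Python threads cannot be killed. If the transfer really has to stop, a separate process can be terminated -- a sketch using multiprocessing instead of threading (not part of the original answer):

    import multiprocessing
    import requests

    URL = 'http://ipv4.download.thinkbroadband.com/1GB.zip'
    TIMEOUT = 0.5

    def download(queue):
        # Runs in a child process, which (unlike a thread) can be killed.
        r = requests.get(URL)
        queue.put(len(r.content))

    if __name__ == '__main__':  # required by multiprocessing on Windows
        queue = multiprocessing.Queue()
        proc = multiprocessing.Process(target=download, args=(queue,))
        proc.start()
        proc.join(TIMEOUT)
        if proc.is_alive():
            proc.terminate()  # actually aborts the transfer
            proc.join()
            print 'The download was not finished on time...'
        else:
            print queue.get()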
-3
Nov 26 '12 at 21:14


