While `socket.setdefaulttimeout` sets the default timeout for new sockets, this setting is easily overridden if you do not use the sockets directly. In particular, if a library calls `socket.setblocking` on its socket, the timeout is reset.
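For illustration, here is a minimal sketch of that pitfall (no network access needed, the socket is never connected):

```python
import socket

socket.setdefaulttimeout(5.0)   # every new socket starts with a 5 s timeout

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print sock.gettimeout()         # 5.0 -- inherited from setdefaulttimeout

# A library that wants plain blocking I/O may do this internally:
sock.setblocking(True)          # equivalent to sock.settimeout(None)
print sock.gettimeout()         # None -- the default timeout is gone
```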
`urllib2.urlopen` has a `timeout` argument; however, `urllib2.Request` has none. If you use mechanize, you should refer to its documentation:
> As of Python 2.6, urllib2 uses a `.timeout` attribute on `Request` objects internally. However, `urllib2.Request` has no timeout constructor argument, and `urllib2.urlopen()` ignores this parameter. `mechanize.Request` has a timeout constructor argument which is used to set the attribute of the same name, and `mechanize.urlopen()` does not ignore the timeout attribute.

source: http://wwwsearch.sourceforge.net/mechanize/documentation.html
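In practice the difference looks like this (a minimal sketch; the URL is a placeholder for your own):

```python
import urllib2
import mechanize

url = "http://example.com"

# urllib2: the timeout must go to urlopen(); Request does not accept one
response = urllib2.urlopen(urllib2.Request(url), timeout=10)

# mechanize: the timeout goes on the Request, and mechanize.urlopen() honours it
response = mechanize.urlopen(mechanize.Request(url, timeout=10))
```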
---

**EDIT**
If either `socket.setdefaulttimeout` or the timeout passed to mechanize works with small values but not with higher ones, the source of the problem may be completely different. One thing is that your library may open several connections (credit here goes to @Cédric Julien), so the timeout applies to every single attempt to open a socket; if the library does not stop at the first failure, the whole thing can take up to `timeout * num_of_conn` seconds.
seconds. Another thing: socket.recv
: if the connection is really slow and you are out of luck, the whole request can take up to timeout * incoming_bytes
, as in every socket.recv
, we could get one byte, and each such call could take a timeout
second . Since you are unlikely to suffer from this particular dark scenerio (one byte at a time in seconds? You must be a very rude boy), he will most likely ask for age for very slow connections and very high timeouts.
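To make that arithmetic concrete, here is a sketch of a plain receive loop (the server address is a placeholder): the timeout bounds each individual `recv` call, not the request as a whole:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(10.0)      # bounds every single blocking call, not the request
sock.connect(("example.com", 80))
sock.sendall("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

chunks = []
while True:
    # Worst case: the peer trickles one byte just before each 10 s deadline,
    # so N bytes can take up to N * 10 seconds without a single timeout firing.
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)
```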
The only solution is to enforce a timeout on the whole request, and there is nothing sockets can do about that. If you are on Unix, you can use a simple solution with the `ALARM` signal. You set the signal to be raised in `timeout` seconds, and your request will be interrupted (do not forget to catch the exception). You can use the `with` statement to make it clean and easy to use, for example:
```python
import signal
import time

def request(arg):
    """Your http request"""
    time.sleep(2)
    return arg

class Timeout():
    """Timeout class using ALARM signal"""
    class Timeout(Exception):
        pass

    def __init__(self, sec):
        self.sec = sec

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.raise_timeout)
        signal.alarm(self.sec)

    def __exit__(self, *args):
        signal.alarm(0)    # disable the pending alarm on normal exit

    def raise_timeout(self, *args):
        raise Timeout.Timeout()
```
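Usage then looks like this; the first request fits in its budget, the second one is interrupted:

```python
try:
    with Timeout(3):
        print request("Request 1")   # sleeps 2 s, prints "Request 1"
    with Timeout(1):
        print request("Request 2")   # sleeps past the 1 s alarm
except Timeout.Timeout:
    print "Timeout"                  # printed instead of "Request 2"
```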
If you want to be more portable than that, you have to use bigger guns such as `multiprocessing`: you spawn a separate process to perform your request and terminate it if it exceeds the deadline. Since it runs in a separate process, you need something to pass the result back to your application; `multiprocessing.Pipe` will do. Here is an example:
```python
from multiprocessing import Process, Pipe
import time

def request(sleep, result):
    """Your http request example"""
    time.sleep(sleep)
    return result

class TimeoutWrapper():
    """Timeout wrapper using separate process"""
    def __init__(self, func, timeout):
        self.func = func
        self.timeout = timeout

    def __call__(self, *args, **kargs):
        """Run func with timeout"""
        def pmain(pipe, func, args, kargs):
            """Function to be called in separate process"""
            result = func(*args, **kargs)
            pipe.send(result)              # hand the result back to the parent

        parent_pipe, child_pipe = Pipe()
        p = Process(target=pmain, args=(child_pipe, self.func, args, kargs))
        p.start()
        p.join(self.timeout)               # wait at most `timeout` seconds
        if p.is_alive():
            p.terminate()                  # deadline exceeded -- kill the worker
            return None
        return parent_pipe.recv()
```
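A quick usage sketch, with values chosen so that the first call succeeds and the second hits the deadline:

```python
print TimeoutWrapper(request, 3)(1, "OK")   # finishes in 1 s, prints "OK"
print TimeoutWrapper(request, 1)(2, "OK")   # terminated after 1 s, prints "None"
```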
You really have no choice if you want to force the request to finish within a fixed number of seconds. `socket.timeout` only bounds a single socket operation (connect/recv/send); if you have many of them, you can still run into very long total runtimes.