What to do if the socket.setdefaulttimeout() function does not work?

I am writing a multithreaded script to retrieve content from a website. The site is not very stable, so every now and then an HTTP request hangs and cannot be timed out even with socket.setdefaulttimeout(). Since I have no control over the site, the only thing I can do is improve my code, but I am running out of ideas.

Code examples:

    socket.setdefaulttimeout(150)

    MechBrowser = mechanize.Browser()
    Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
    Url = "http://example.com"
    Data = "Justatest=whatever&letstry=doit"
    Request = urllib2.Request(Url, Data, Header)
    Response = MechBrowser.open(Request)
    Response.close()

What should I do to make the hanging request time out? More to the point, I would like to know why socket.setdefaulttimeout(150) does not work in the first place. Can anybody help?

Added (the problem has not been resolved yet):

OK, I followed tomasz's suggestion and changed the code to MechBrowser.open(Request, timeout=60), but the same thing happens: I still get hanging requests all the time, sometimes for several hours and sometimes for several days. What should I do now? Is there any way to make these hanging requests go away?

+7
4 answers

While socket.setdefaulttimeout sets the default timeout for new sockets, if you are not using the sockets directly, the setting can easily be overwritten. In particular, if a library calls socket.setblocking on its socket, it resets the timeout.
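A quick way to see this effect in isolation (a minimal sketch, not taken from the question's code):

    import socket

    socket.setdefaulttimeout(150)   # default timeout for newly created sockets
    s = socket.socket()
    print s.gettimeout()            # 150.0 - inherited from the default

    s.setblocking(1)                # something a library may do internally...
    print s.gettimeout()            # None - the timeout has been reset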

urllib2.urlopen has a timeout argument; however, there is no timeout in urllib2.Request. Since you are using mechanize, you should refer to its documentation:

Since Python 2.6, urllib2 uses a .timeout attribute on Request objects internally. However, urllib2.Request has no timeout constructor argument, and urllib2.urlopen() ignores this parameter. mechanize.Request has a timeout constructor argument which is used to set the attribute of the same name, and mechanize.urlopen() does not ignore the timeout attribute.

source: http://wwwsearch.sourceforge.net/mechanize/documentation.html

EDIT:

If either socket.setdefaulttimeout or the timeout passed to mechanize works with small values but not with higher ones, the source of the problem may be completely different. One thing is that your library may open more than one connection (credit to @Cédric Julien), so the timeout applies to every socket.open attempt, and if it does not stop at the first failure it can take up to timeout * num_of_conn seconds. The other thing is socket.recv: if the connection is really slow and you are unlucky enough, the whole request can take up to timeout * incoming_bytes, since every socket.recv call could fetch a single byte and each such call could take timeout seconds. Since you are unlikely to hit exactly this dark scenario (one byte per timeout? you would have to be very unlucky), it is far more likely that a request will simply take ages on very slow connections with very high timeouts.
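To put rough numbers on that reasoning (the figures below are hypothetical and only reuse the 150-second timeout from the question):

    timeout = 150          # per-socket timeout in seconds
    num_of_conn = 4        # hypothetical number of connection attempts
    incoming_bytes = 2000  # hypothetical response size in bytes

    print timeout * num_of_conn     # 600 s just to give up on connecting
    print timeout * incoming_bytes  # 300000 s (~3.5 days) in the worst-case recv() scenario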

The only solution is to force a timeout on the whole request, and that has nothing to do with sockets. If you are on Unix, you can use a simple solution based on the ALARM signal: you arrange for the signal to be raised after timeout seconds, and your request will be interrupted (do not forget to catch the resulting exception). You can wrap this in a with statement to make it clean and easy to use, for example:

    import signal, time

    def request(arg):
        """Your http request"""
        time.sleep(2)
        return arg

    class Timeout():
        """Timeout class using ALARM signal"""
        class Timeout(Exception):
            pass

        def __init__(self, sec):
            self.sec = sec

        def __enter__(self):
            signal.signal(signal.SIGALRM, self.raise_timeout)
            signal.alarm(self.sec)

        def __exit__(self, *args):
            signal.alarm(0)  # disable alarm

        def raise_timeout(self, *args):
            raise Timeout.Timeout()

    # Run blocks of code with timeouts
    try:
        with Timeout(3):
            print request("Request 1")
        with Timeout(1):
            print request("Request 2")
    except Timeout.Timeout:
        print "Timeout"

    # Prints "Request 1" and "Timeout"

If you want something more portable, you need to bring in bigger guns such as multiprocessing: spawn a separate process to perform your request and terminate it if it exceeds the timeout. Since it runs in a separate process, you need something to pass the result back to your application; multiprocessing.Pipe works well. Here is an example:

    from multiprocessing import Process, Pipe
    import time

    def request(sleep, result):
        """Your http request example"""
        time.sleep(sleep)
        return result

    class TimeoutWrapper():
        """Timeout wrapper using separate process"""
        def __init__(self, func, timeout):
            self.func = func
            self.timeout = timeout

        def __call__(self, *args, **kargs):
            """Run func with timeout"""
            def pmain(pipe, func, args, kargs):
                """Function to be called in separate process"""
                result = func(*args, **kargs)  # call func with passed arguments
                pipe.send(result)              # send result back through the pipe

            parent_pipe, child_pipe = Pipe()   # Pipe for retrieving the result of func
            p = Process(target=pmain, args=(child_pipe, self.func, args, kargs))
            p.start()
            p.join(self.timeout)               # wait for the process to end

            if p.is_alive():
                p.terminate()                  # timeout, kill the process
                return None                    # or raise an exception if None is an acceptable result
            else:
                return parent_pipe.recv()      # OK, get the result

    print TimeoutWrapper(request, 3)(1, "OK")        # prints OK
    print TimeoutWrapper(request, 1)(2, "Timeout")   # prints None

You really have no other choice if you want to force a request to finish after a fixed number of seconds. socket.timeout provides a timeout for a single socket operation (connect/recv/send), but with several of them you can still end up with very long total run times.

+18

From the mechanize documentation:

Since Python 2.6, urllib2 uses the .timeout attribute for Request objects internally. However, urllib2.Request does not have a timeout constructor argument, and urllib2.urlopen() ignores this parameter. mechanize.Request has a timeout constructor argument, which is used to set the attribute with the same name, and mechanize.urlopen() does not ignore the timeout attribute.

Maybe you should try replacing urllib2.Request with mechanize.Request.
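A minimal sketch of that replacement, adapted from the question's code (the 60-second timeout is just an example value):

    import mechanize

    MechBrowser = mechanize.Browser()
    Header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)'}
    Url = "http://example.com"
    Data = "Justatest=whatever&letstry=doit"

    # mechanize.Request accepts a timeout constructor argument, unlike urllib2.Request
    Request = mechanize.Request(Url, Data, Header, timeout=60)
    Response = MechBrowser.open(Request)
    Response.close()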

+2

You can try using mechanize together with eventlet. It does not solve your timeout problem, but greenlets are non-blocking, so it can help with the performance problem.
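A rough sketch of how that could look (the fetch helper and the URL list are hypothetical; eventlet.monkey_patch() is assumed so that mechanize's socket calls become cooperative):

    import eventlet
    eventlet.monkey_patch()  # make blocking socket calls cooperative

    import mechanize

    def fetch(url):
        """Hypothetical helper: fetch one page with mechanize."""
        br = mechanize.Browser()
        response = br.open(url, timeout=60)
        try:
            return response.read()
        finally:
            response.close()

    urls = ["http://example.com/page1", "http://example.com/page2"]
    pool = eventlet.GreenPool(20)        # up to 20 concurrent green requests
    for body in pool.imap(fetch, urls):
        print len(body)                  # one slow request no longer blocks the others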

0

I suggest a simple workaround: move the request to a separate process, and if it fails to finish in time, kill it from the calling process, like this:

    from multiprocessing import Process
    import os, signal, time

    checker = Process(target=yourFunction, args=(some_queue,))
    timeout = 150
    checker.start()

    counter = 0
    while checker.is_alive():
        time.sleep(1)
        counter += 1
        if counter > timeout:
            print "Child process consumed too much run-time. Going to kill it!"
            os.kill(checker.pid, signal.SIGKILL)
            break

simple, fast and efficient.

-1
source
