I have a large scraping task, and the script spends most of its time blocked on network latency. I am trying to multithread the script so that I can run several requests at the same time, but about 10% of my threads die with the following error:
URLError: <urlopen error [Errno -2] Name or service not known>
The other 90% complete successfully. I am requesting multiple pages from the same domain, so it looks like a DNS problem. I make 25 requests at a time (25 threads). Everything works fine if I limit myself to 5 requests at a time, but once I reach about 10 concurrent requests, I start seeing this error occasionally.
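To make the setup concrete, here is a minimal sketch of a bounded thread pool that retries only when the failure is a name-resolution error. It is written for Python 3 (`urllib.request`) rather than the `urllib2` I am actually using, and the names `fetch_with_retry`, `fetch_all`, and the `opener`, `attempts`, and `delay` parameters are hypothetical choices for illustration, not part of my real script.

```python
import socket
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retry(url, opener=urllib.request.urlopen, attempts=3, delay=0.5):
    """Fetch url, retrying only when the failure is a DNS resolution error.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError as exc:
            # A failed lookup surfaces as URLError wrapping socket.gaierror.
            is_dns_failure = isinstance(exc.reason, socket.gaierror)
            if not is_dns_failure or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff before the next try


def fetch_all(urls, max_workers=5):
    """Fetch every URL with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_with_retry, urls))
```

Retrying hides the symptom rather than explaining it, so this is only a workaround sketch. The default of 5 workers mirrors the concurrency level at which I see no errors.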
I read "Repeated attempts to find hosts in urllib2", which describes the same problem as mine, and followed its suggestions, but to no avail.
I also tried the multiprocessing module instead of threads and got the same behavior: about 10% of the processes die with the same error, which makes me think the problem is not in urllib2 but somewhere else.
Can someone explain what is happening and suggest how to fix it?
UPDATE
If I hard-code the site's IP address in my script, everything works fine, so the error occurs during the DNS lookup.
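A less brittle variant of hard-coding the address is to resolve the hostname once, cache the result, and have every thread reuse it. The following is a minimal sketch under several assumptions: it uses Python 3 (`urllib.request`) instead of `urllib2`, it targets plain HTTP only (with HTTPS, connecting by IP would normally fail certificate verification), and `resolve_once`, `build_request`, and the `resolver` parameter are hypothetical names introduced for illustration.

```python
import socket
import threading
import urllib.request
from urllib.parse import urlsplit, urlunsplit

_cache = {}                 # hostname -> IP address
_lock = threading.Lock()    # guards _cache and serializes lookups


def resolve_once(hostname, resolver=socket.gethostbyname):
    """Return the IP for hostname, performing the DNS lookup at most once."""
    with _lock:
        if hostname not in _cache:
            _cache[hostname] = resolver(hostname)
        return _cache[hostname]


def build_request(url, resolver=socket.gethostbyname):
    """Build a Request that targets the cached IP but keeps the Host header."""
    parts = urlsplit(url)
    ip = resolve_once(parts.hostname, resolver)
    netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
    ip_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    # The original host is sent explicitly so name-based virtual hosts work.
    return urllib.request.Request(ip_url, headers={"Host": parts.netloc})
```

Each worker would then call `urllib.request.urlopen(build_request(url))`, so only the first request for a given host touches the resolver. The cache never expires, which is acceptable for a single scraping run but not for a long-lived process.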