Multithreaded web requests in python - "Name or service unknown"

I have a large web-scraping task, and most of the script's time is spent blocking on network latency. I am trying to multithread the script so I can issue several requests at once, but about 10% of my threads die with the following error:

URLError: <urlopen error [Errno -2] Name or service not known>

The other 90% complete successfully. I am requesting multiple pages from the same domain, so it looks like there may be a problem with DNS. I make 25 requests at a time (25 threads). Everything works fine if I limit myself to 5 requests at a time, but once I get to about 10 concurrent requests I sometimes see this error.
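Since 5 concurrent requests work reliably, one workaround is to keep 25 worker threads but cap how many are inside the network call at once. Here is a minimal sketch using a semaphore; `MAX_CONCURRENT`, `fetch`, and the example URLs are placeholders, not names from the original script:

```python
import threading

# Cap the number of simultaneous requests; 5 is the level the
# question reports as reliable (an assumption, tune as needed).
MAX_CONCURRENT = 5
sem = threading.BoundedSemaphore(MAX_CONCURRENT)

lock = threading.Lock()
results = []

def fetch(url):
    with sem:  # at most MAX_CONCURRENT threads pass this point at once
        # ... the actual urllib2.urlopen(url) call would go here ...
        with lock:
            results.append(url)

threads = [threading.Thread(target=fetch, args=("http://example.com/page%d" % i,))
           for i in range(25)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This keeps the thread-per-page structure but throttles DNS lookups and connections to a rate the resolver can handle.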

I read Repeated attempts to find hosts in urllib2 which describes the same problem as mine, and followed its suggestions, but to no avail.

I also tried the multiprocessing module instead of multithreading and got the same behavior: about 10% of the processes die with the same error, which makes me think this is not a problem with urllib2 but something else.

Can someone explain what is happening and suggest how to fix it?

UPDATE

If I hard-code the site's IP address in my script, everything works fine, so the error is happening during the DNS lookup.

1 answer

This looks like a DNS problem: every thread issues its own lookup, and under that load the resolver or DNS server starts dropping requests. Running a local caching DNS daemon such as nscd should help, since repeated lookups for the same domain will then be answered from the cache.

You can also wrap your urllib2.urlopen calls in a retry loop, so that an occasional failed lookup is retried instead of killing the thread.

Finally, since hard-coding the IP address works, you could resolve the hostname once up front and reuse the address for every request.
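A minimal sketch of the retry idea; the function name, retry count, and delay are illustrative choices, not part of the original answer (shown with `urllib.request`, the Python 3 name for urllib2):

```python
import time
import urllib.request  # urllib2 in Python 2
from urllib.error import URLError

def urlopen_with_retry(url, retries=3, delay=1.0,
                       opener=urllib.request.urlopen):
    """Retry transient failures such as 'Name or service not known'.

    A sketch: retries/delay are arbitrary; `opener` is injectable
    so the logic can be exercised without hitting the network.
    """
    for attempt in range(retries):
        try:
            return opener(url)
        except URLError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
```

With this wrapper a thread only dies if the lookup fails several times in a row, which makes the sporadic 10% failures far less likely to surface.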

