I am trying to figure out how to use proxies and multithreading.
This code works:
    requester = urllib3.PoolManager(maxsize=10, headers=self.headers)
    thread_pool = workerpool.WorkerPool()
    thread_pool.map(grab_wrapper, [item['link'] for item in products])
    thread_pool.shutdown()
    thread_pool.wait()
Then, in grab_wrapper:
    requested_page = requester.request('GET', url, assert_same_host=False, headers=self.headers)
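For context, the rest of grab_wrapper is roughly this (a trimmed sketch; in my code it is a bound method, and process_page is a stand-in for my real parsing code):

    def grab_wrapper(self, url):
        # requester is the PoolManager created above, shared by all threads
        requested_page = requester.request('GET', url,
                                           assert_same_host=False,
                                           headers=self.headers)
        if requested_page.status == 200:
            # hand the body off to the parsing code
            self.process_page(requested_page.data)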
The headers consist of Accept, Accept-Charset, Accept-Encoding, Accept-Language, and User-Agent.
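Concretely, something like this (the values here are placeholders, not my exact strings):

    self.headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'gzip,deflate',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (compatible; my-scraper)',
    }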
But this does not work in production, because there every request has to go through a proxy server (no authorization required).
I tried different things (passing the proxy to the request, in the headers, etc.). The only thing that works is this:
    requester = urllib3.proxy_from_url(self._PROXY_URL, maxsize=7, headers=self.headers)
    thread_pool = workerpool.WorkerPool(size=10)
    thread_pool.map(grab_wrapper, [item['link'] for item in products])
    thread_pool.shutdown()
    thread_pool.wait()
Now when I launch the program, it makes 10 requests (10 threads) and then ... stops. No errors, no warnings. This is the only way I can get requests through the proxy server, but it seems that proxy_from_url and WorkerPool cannot be used together.
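Here is a minimal, self-contained version that reproduces the hang for me (the proxy address and URLs are placeholders):

    import urllib3
    import workerpool

    PROXY_URL = 'http://10.0.0.1:8080'  # placeholder; a plain HTTP proxy, no auth
    urls = ['http://example.com/page%d' % i for i in range(50)]

    # one ProxyManager shared by all worker threads
    requester = urllib3.proxy_from_url(PROXY_URL, maxsize=7)

    def grab(url):
        page = requester.request('GET', url, assert_same_host=False)
        print('%s -> %s' % (url, page.status))

    pool = workerpool.WorkerPool(size=10)
    pool.map(grab, urls)  # exactly 10 requests complete, then it hangs
    pool.shutdown()
    pool.wait()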
Any ideas on how to combine these two into working code? Due to time constraints, I would prefer not to rewrite it in patchwork, etc.