A grequests pool with multiple requests sessions?

I need to fetch a lot of URLs from a REST web service, usually between 75k and 90k of them. However, I need to limit the number of simultaneous connections to the web service.

I started experimenting with grequests as follows, but it quickly began to exhaust the available sockets.

    import grequests

    concurrent_limit = 30
    urls = buildUrls()
    hdrs = {'Host': 'hostserver'}
    g_requests = (grequests.get(url, headers=hdrs) for url in urls)
    g_responses = grequests.map(g_requests, size=concurrent_limit)

It works for a minute or so, and then I get "maximum number of sockets" errors. As far as I can tell, each requests.get call made by grequests uses its own session, which means a new socket is opened for each request.

I found a note on GitHub explaining how to make grequests use a single session. However, that appears to funnel all requests through one shared pool, which seems to defeat the purpose of asynchronous HTTP requests.

    import grequests  # import before requests so gevent's monkey-patching applies
    import requests

    s = requests.session()
    rs = [grequests.get(url, session=s) for url in urls]
    grequests.map(rs)
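A shared session does not by itself force the requests to run one at a time: a requests Session keeps a urllib3 connection pool per host, sized by the mounted HTTPAdapter (10 connections by default). The sketch below mounts an adapter with a larger pool on a single session; make_shared_session and the limit of 30 are illustrative choices, not something from the GitHub note. With pool_block=True, urllib3 should hand out at most pool_maxsize connections per host at a time, which caps the number of open sockets. Sharing one Session across many greenlets is common practice but not an officially documented thread-safety guarantee, so treat that part as an assumption.

```python
import requests
from requests.adapters import HTTPAdapter


def make_shared_session(max_connections=30):
    """Hypothetical helper: one Session allowing at most `max_connections` sockets per host."""
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_maxsize=max_connections,  # connections kept open per host
        pool_block=True,               # wait for a free connection instead of opening extra sockets
    )
    # Route both schemes through the same adapter (and therefore the same pools).
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_shared_session(30)

# Usage with grequests (not executed here):
#   rs = (grequests.get(url, session=session) for url in urls)
#   responses = grequests.map(rs, size=30)
```

Keeping the grequests size equal to pool_maxsize means that no greenlet should have to wait long for a connection, while the socket count stays bounded.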

Is it possible to use grequests or gevent.Pool in such a way as to create multiple sessions?

Put another way: how can I make many simultaneous HTTP requests using either a queue or a connection pool?
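One way to get both a hard concurrency limit and connection reuse, sketched here with the standard library's concurrent.futures.ThreadPoolExecutor rather than grequests: each worker thread lazily creates its own session in thread-local storage, so there are at most max_workers sessions and at most that many requests in flight. fetch_all and session_factory are hypothetical names; the factory parameter exists only so the sketch can be exercised without the network, and the default factory assumes requests is installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One attribute slot per thread: each worker thread stores its own session here.
_local = threading.local()


def _default_session_factory():
    # Imported lazily so the sketch can be tried with a fake session factory.
    import requests
    return requests.Session()


def _get_session(factory):
    # Create this thread's session on first use, then reuse it (and its sockets).
    if not hasattr(_local, "session"):
        _local.session = factory()
    return _local.session


def fetch_all(urls, max_workers=30, headers=None, session_factory=_default_session_factory):
    """Fetch every URL with at most `max_workers` requests in flight at once."""
    def fetch(url):
        return _get_session(session_factory).get(url, headers=headers)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as `urls`;
        # an exception raised by one fetch is re-raised when its result is reached.
        return list(pool.map(fetch, urls))
```

In real use this would be called as fetch_all(urls, max_workers=30, headers={'Host': 'hostserver'}). The sessions are not closed explicitly here, which is acceptable for a one-off batch but worth handling in a long-running process.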

python sockets gevent grequests
3 answers

In the end, I did not use grequests to solve my problem, although I still hope it is possible.

I used threading instead:

    import requests
    from queue import Queue
    from threading import Thread

    class MyAwesomeThread(Thread):
        """ Threading wrapper to handle counting and processing of tasks """
        def __init__(self, session, q):
            Thread.__init__(self)
            self.q = q
            self.count = 0
            self.session = session
            self.response = None

        def run(self):
            """TASK RUN BY THREADING"""
            while True:
                url, host = self.q.get()
                httpHeaders = {'Host': host}
                try:
                    # use this thread's own session, not the last one created below
                    self.response = self.session.get(url, headers=httpHeaders)
                    # handle response here
                    self.count += 1
                except requests.RequestException as exc:
                    print('request failed: %s (%s)' % (url, exc))
                finally:
                    self.q.task_done()  # mark the item done even if the request fails

    q = Queue()
    threads = []
    for i in range(CONCURRENT):
        session = requests.session()
        t = MyAwesomeThread(session, q)
        t.daemon = True  # allows us to send an interrupt
        threads.append(t)

    ## build urls and add them to the Queue
    for url in buildurls():
        q.put_nowait((url, host))

    ## start the threads
    for t in threads:
        t.start()

    ## wait until every queued URL has been processed
    q.join()
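A variant of the same queue-and-worker pattern, as a standard-library-only sketch: instead of daemon threads that never exit, each worker stops when it reads a sentinel, and responses are collected per URL rather than overwriting a single response attribute. run_workers and session_factory are hypothetical names (pass requests.Session as the factory in real use); failures are stored as exception objects so that one bad URL does not kill a worker.

```python
import threading
from queue import Queue

_STOP = object()  # sentinel telling a worker to exit


def run_workers(jobs, num_workers, session_factory):
    """Process (url, host) jobs with `num_workers` threads, one session per thread."""
    q = Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        session = session_factory()  # each thread owns its session and its sockets
        while True:
            job = q.get()
            if job is _STOP:
                return
            url, host = job
            try:
                outcome = session.get(url, headers={"Host": host})
            except Exception as exc:  # record the failure and keep going
                outcome = exc
            with results_lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(_STOP)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()      # each worker exits once it reaches its sentinel
    return results
```

Because the sentinels are queued after every real job, the FIFO queue guarantees all jobs are taken before any worker stops.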

Something like this:

    import grequests
    import requests

    NUM_SESSIONS = 50
    sessions = [requests.Session() for i in range(NUM_SESSIONS)]
    reqs = []
    i = 0
    for url in urls:
        # assign the requests to the sessions in round-robin order
        reqs.append(grequests.get(url, session=sessions[i % NUM_SESSIONS]))
        i += 1
    responses = grequests.map(reqs, size=NUM_SESSIONS*5)

This should spread the requests across 50 different sessions.
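The i % NUM_SESSIONS bookkeeping can also be expressed with itertools.cycle. In this sketch (the helper name assign_round_robin is hypothetical) request k is paired with session k mod N, so with 50 sessions each session receives either floor(n/50) or ceil(n/50) of the n URLs.

```python
from itertools import cycle


def assign_round_robin(urls, sessions):
    """Pair each URL with a session, cycling through `sessions` in order."""
    # zip() stops when `urls` is exhausted; cycle() repeats the sessions forever.
    return list(zip(urls, cycle(sessions)))
```

With grequests this would become reqs = [grequests.get(u, session=s) for u, s in assign_round_robin(urls, sessions)]. Note that size=NUM_SESSIONS*5 in the answer allows about 5 concurrent requests per session.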


Here rs is a list of AsyncRequest objects. Each AsyncRequest creates its own session unless you pass one in explicitly:

    rs = [grequests.get(url) for url in urls]
    grequests.map(rs)
    for ar in rs:
        print(ar.session.cookies)
