Understanding Asynchronous / Multiprocessing in Python

Let's say that I have a function:

    from time import sleep

    def doSomethingThatTakesALongTime(number):
        print(number)
        sleep(10)

and then I call it in a for loop:

    for number in range(10):
        doSomethingThatTakesALongTime(number)

How can I set it up so that it only takes 10 seconds TOTAL to print:

 $ 0123456789 

instead of taking 100 seconds? If it helps, I am going to use the information you provide for asynchronous web scraping. That is, I have a list of sites that I want to visit, and I want to visit them all at the same time rather than waiting for each one to complete.

4 answers

Take a look at Scrapy. It is designed specifically for web scraping and is very good. It is asynchronous and built on top of Twisted.

http://scrapy.org/


Try using Eventlet. The first example in its documentation shows how to fetch several URLs concurrently:

    import eventlet
    # On Python 3, eventlet.green.urllib.request replaces the Python 2 urllib2 module.
    from eventlet.green.urllib.request import urlopen

    urls = [
        "http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif",
    ]

    def fetch(url):
        # The green urlopen yields to other green threads while waiting on the network.
        return urlopen(url).read()

    pool = eventlet.GreenPool()
    for body in pool.imap(fetch, urls):
        print("got body", len(body))

I can also recommend looking at Celery for a more flexible solution.
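
The concurrent-fetch pattern from the Eventlet example can also be sketched with only the standard library. This swaps green threads for ordinary worker threads via `concurrent.futures`; the names `fetch_all`, `fetcher`, and `max_workers`, and the 30-second timeout, are illustrative assumptions and not part of Eventlet.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one URL and return its body as bytes (a blocking call)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Run `fetcher` on every URL in worker threads.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits all calls up front and preserves input order.
        return list(pool.map(fetcher, urls))
```

Calling `fetch_all(urls)` with the `urls` list from the example above would return the bodies in list order. Passing a different `fetcher` makes it easy to try the pattern without touching the network.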


asyncoro supports asynchronous, concurrent programming. It includes an asynchronous (non-blocking) socket implementation. If your implementation does not need urllib, httplib, and similar modules (which have no asynchronous support), it may suit your purpose, and it is easy to use because it is very similar to programming with threads. Here is your problem above, solved with asyncoro:

    import asyncoro

    def do_something(number, coro=None):
        print(number)
        # Suspend this coroutine for 10 seconds without blocking the others.
        yield coro.sleep(10)

    for number in range(10):
        asyncoro.Coro(do_something, number)
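
A similar coroutine style is available in the standard library's `asyncio` module. The following is a minimal sketch, assuming Python 3.7 or later; it is not asyncoro's API, and the helper names `run_all` and `timed_run` as well as the shortened 0.1-second delay are assumptions made for illustration.

```python
import asyncio
import time


async def do_something(number, delay):
    """Print the number, then wait without blocking the other coroutines."""
    print(number)
    await asyncio.sleep(delay)
    return number


async def run_all(count, delay):
    """Start `count` coroutines together and wait for all of them."""
    return await asyncio.gather(*(do_something(n, delay) for n in range(count)))


def timed_run(count=10, delay=0.1):
    """Return (results, elapsed seconds) for one concurrent run."""
    start = time.perf_counter()
    results = asyncio.run(run_all(count, delay))
    return results, time.perf_counter() - start
```

Because every coroutine sleeps at the same time, the elapsed time should be close to a single `delay` rather than `count * delay`. With `delay=10` this corresponds to the roughly 10 seconds asked for in the question.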

For completeness, here is exactly how to apply green threads to your example snippet:

    from eventlet.green.time import sleep
    from eventlet.greenpool import GreenPool

    def doSomethingThatTakesALongTime(number):
        print(number)
        sleep(10)  # green sleep: yields to the other green threads

    pool = GreenPool()
    for number in range(100):
        pool.spawn_n(doSomethingThatTakesALongTime, number)

    import timeit
    print(timeit.timeit("pool.waitall()", "from __main__ import pool"))
    # yields : 10.9335260363
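
For comparison with green threads, the same measurement can be sketched with ordinary operating-system threads from the standard library. This is a swapped-in technique, not Eventlet; the function names, the `seen` list, and the shortened 0.1-second delay are illustrative assumptions.

```python
import threading
import time


def do_something_that_takes_a_long_time(number, delay, seen):
    """Record the number, then block only the current thread."""
    seen.append(number)  # list.append is thread-safe in CPython
    time.sleep(delay)


def run_in_threads(count=10, delay=0.1):
    """Start one thread per call; return (recorded numbers, elapsed seconds)."""
    seen = []
    threads = [
        threading.Thread(
            target=do_something_that_takes_a_long_time,
            args=(number, delay, seen),
        )
        for number in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every thread has finished
    return seen, time.perf_counter() - start
```

Since `time.sleep` releases the interpreter lock, the threads wait in parallel and the elapsed time should stay close to one `delay`. Threads are heavier than green threads, so for thousands of simultaneous tasks a pool or an event loop is usually the better fit.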
