Non-blocking, non-concurrent tasks in Python

I am working on a very small Python library whose calls should be non-blocking.

At some point in production code, this library will be called and should do its job: in the simplest form, it is invoked and passes some information to a service.

This "passing information to the service" is a non-intensive task, perhaps sending some data to an HTTP service or something like that. It does not have to run in parallel or exchange information with the caller; however, it must finish at some point, possibly via a timeout.

I have used the threading module before, and it seems the most suitable option, but the application in which this library will be used is so large that I am worried about hitting thread limits.

In local testing, I hit this limit at around ~2,500 spawned threads.

There is a good possibility (given the size of the application) that I could easily exceed this limit. I am also wary of using a Queue, given the implications of queuing tasks at a high rate.

I also looked at gevent, but I could not find an example of running something that does some work and then ends without joining. The examples I found all call .join() on a spawned Greenlet or on an array of Greenlets.

I do not need to know the result of the work! It just needs to fire off, try to talk to the HTTP service, and die with a reasonable timeout if it cannot.

Am I misreading the gevent guides/tutorials? Is there another way to spawn a completely non-blocking call that cannot run into the ~2,500-thread limit?

This is a simple Threading example that works as I expected:

    from threading import Thread
    import time

    class Synchronizer(Thread):
        def __init__(self, number):
            self.number = number
            Thread.__init__(self)

        def run(self):
            # Simulate some work
            time.sleep(5)
            print(self.number)

    for i in range(4000):  # totally doesn't get past ~2,500
        sync = Synchronizer(i)
        sync.daemon = True
        sync.start()
        print("spawned a thread, number %s" % i)

And here is what I tried with gevent; without a blocking call at the end, there is no way to see what the workers did:

    import gevent

    def task(pid):
        """Some non-deterministic task."""
        gevent.sleep(1)
        print('Task', pid, 'done')

    for i in range(100):
        gevent.spawn(task, i)

EDIT: My problem arose from my own ignorance of gevent. While the threading code did indeed spawn threads, it also prevented the script from terminating while the work was in progress.

gevent does not actually do this in the code above unless you add .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make the script a long-running process. This definitely fixes my problem, since the code that spawns the greenlets runs inside a framework that is itself a long-running process.
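For completeness, one way to sidestep the thread limit from the original question entirely is a small bounded pool: fire-and-forget submissions are queued onto a fixed number of worker threads instead of spawning one thread per call. Below is a minimal stdlib sketch using concurrent.futures; the `send_to_service` function and its payload are placeholders standing in for the real HTTP call, not part of any library mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

# A small, fixed pool: thousands of submissions reuse these 8 threads,
# so the per-process thread limit is never approached.
pool = ThreadPoolExecutor(max_workers=8)

def send_to_service(payload):
    # Placeholder for the real work, e.g. an HTTP POST with a timeout:
    #   requests.post(URL, json=payload, timeout=5)
    return len(payload)

def fire_and_forget(payload):
    # submit() returns immediately; the Future is deliberately ignored,
    # since the caller does not need the result.
    pool.submit(send_to_service, payload)

for i in range(4000):
    fire_and_forget({"number": i})
```

The trade-off is that submissions queue up inside the executor rather than each getting its own thread, which is exactly the behavior you want when calls can arrive faster than they complete.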

3 answers

Nothing requires you to call join in gevent if you expect your main thread to live longer than any of your workers.

The only reason to call join is to make sure the main thread lasts at least as long as all the workers (so that the program does not exit early).
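The same principle is easy to demonstrate with plain stdlib threads (a sketch, not gevent-specific code): as long as the main thread outlives the workers, no join is needed anywhere.

```python
import threading
import time

done = []

def worker(n):
    time.sleep(0.1)        # simulated work
    done.append(n)         # list.append is atomic in CPython

for i in range(5):
    threading.Thread(target=worker, args=(i,)).start()

# No join() anywhere: the main thread simply lives longer than the
# workers, so they all finish before the program exits.
time.sleep(1.0)
print(sorted(done))
```

If the main thread returned immediately instead of sleeping, non-daemon threads would still be waited on at interpreter shutdown, but gevent greenlets (like daemon threads) would simply be killed, which is exactly the behavior described in the question's EDIT.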


Why not fork a subprocess with a connected pipe or similar? Instead of making the call directly, just drop the data onto the pipe and let the subprocess handle it completely out of band.


As explained in Understanding Asynchronous / Multiprocessing in Python, the asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200 MB. If you want, you can mix threads in the rest of the system with asyncoro coroutines (provided the threads and coroutines do not share variables, but use the coroutine interface functions to exchange messages, etc.).
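asyncoro itself is a third-party package whose API is not shown here; as a rough stdlib analogue, asyncio coroutines scale the same way, since each coroutine is just a small object rather than an OS thread. A sketch spawning 10,000 concurrent tasks in one thread (the memory figure above is asyncoro's claim, not measured for this snippet):

```python
import asyncio

async def task(pid, results):
    # Each coroutine is a small object; tens of thousands fit
    # comfortably in a single thread.
    await asyncio.sleep(0)           # simulated non-blocking work
    results.append(pid)

async def main(n):
    results = []
    await asyncio.gather(*(task(i, results) for i in range(n)))
    return results

done = asyncio.run(main(10_000))
print(len(done))
```

Spawning 10,000 OS threads this way would hit exactly the ~2,500 limit from the question; 10,000 coroutines run without issue.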

