Python: deferToThread XMLRPC Server - Twisted - Cherrypy?

This question is related to others I have asked here, mainly regarding sorting huge datasets in memory.

This is basically what I want / have:

There is a running XMLRPC server. It stores several (32) instances of the Foo class in memory. Each Foo instance contains a list bar (which will hold several million entries). A separate service retrieves data from the database and sends it to the XMLRPC server. The data is basically a dictionary whose keys correspond to the Foo instances and whose values are lists of dictionaries, for example:

data = {'foo1':[{'k1':'v1', 'k2':'v2'}, {'k1':'v1', 'k2':'v2'}], 'foo2':...} 

Each Foo instance is then passed the value corresponding to its key, and its Foo.bar list of dictionaries is updated and sorted.

    from twisted.web import xmlrpc
    from twisted.internet import threads

    class XMLRPCController(xmlrpc.XMLRPC):
        def __init__(self):
            ...
            self.foos = {'foo1': Foo(), 'foo2': Foo(), 'foo3': Foo()}
            ...

        def update(self, data):
            for k, v in data.items():
                threads.deferToThread(self.foos[k].processData, v)

        def getData(self, fookey):
            # return the first 10 records of the specified Foo.bar
            return self.foos[fookey].bar[0:10]

    class Foo():
        def __init__(self):
            self.bar = []

        def processData(self, new_bar_data):
            for record in new_bar_data:
                # do processing, and add record, then sort
                # BUNCH OF PROCESSING CODE
                self.bar.sort(reverse=True)

The problem is that when the update function on XMLRPCController is called with a large number of records (for example, 100K+), it stops answering my getData calls until all 32 Foo instances have finished their processData method. I thought deferToThread would take care of this, so clearly I don't understand where the problem is.

Any suggestions? I am open to using something else, such as CherryPy, if it supports the required behavior.


EDIT

@Troy: this is how I run the reactor:

    reactor.listenTCP(port_no, server.Site(XMLRPCController()))
    reactor.run()

As for the GIL, would it be a viable option to change the sys.setcheckinterval() value to something lower, so the lock on the data is released and it can be read?

2 answers

The easiest way to keep the application responsive is to break the CPU-intensive processing into small pieces and let the Twisted reactor run between them, for example by calling reactor.callLater(0, process_next_chunk) to advance to the next chunk. In effect you implement cooperative multitasking yourself.
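
A minimal sketch of that idea, assuming the Foo class from the question; the chunk size of 1000 and the _processChunk helper are made up for illustration:

    from twisted.internet import reactor

    class Foo(object):
        CHUNK_SIZE = 1000  # arbitrary; tune against your record size

        def __init__(self):
            self.bar = []

        def processData(self, new_bar_data):
            # kick off cooperative processing; returns immediately
            self._processChunk(new_bar_data, 0)

        def _processChunk(self, new_bar_data, start):
            for record in new_bar_data[start:start + self.CHUNK_SIZE]:
                # BUNCH OF PROCESSING CODE
                self.bar.append(record)
            next_start = start + self.CHUNK_SIZE
            if next_start < len(new_bar_data):
                # yield to the reactor so getData calls can be served,
                # then continue with the next chunk
                reactor.callLater(0, self._processChunk, new_bar_data, next_start)
            else:
                self.bar.sort(reverse=True)

Unlike the code in the question, this sketch sorts once per batch rather than after every record, which keeps each chunk cheap; getData calls are answered between chunks.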

Another way would be to use separate processes to do the work; then you would also benefit from multiple cores. Take a look at Ampoule: https://launchpad.net/ampoule. It provides an API similar to deferToThread.
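
Ampoule's own API is not shown here; as a rough stand-in for the same idea, the sketch below pushes the heavy sorting into a standard-library multiprocessing pool and bridges the blocking result through deferToThread, so the work escapes the GIL while the reactor stays responsive. The pool size and the helper names are assumptions, not anything from the question:

    from multiprocessing import Pool
    from twisted.internet import threads

    _pool = Pool(processes=4)  # assumed worker count; create once at startup

    def _heavy_sort(records):
        # runs in a child process, so it does not hold the parent's GIL
        records.sort(reverse=True)
        return records

    def processInSubprocess(records):
        # returns a Deferred that fires with the sorted records;
        # result.get() blocks, so hand it to the reactor's thread pool
        result = _pool.apply_async(_heavy_sort, (records,))
        return threads.deferToThread(result.get)

Keep in mind that the records are pickled across the process boundary, which is not free for lists with millions of entries.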


I don't know how long the processData method runs, or how you have set up your Twisted reactor. By default, the Twisted reactor has a thread pool of between 0 and 10 threads. You may be trying to defer 32 long-running computations onto at most 10 threads, which is not optimal.
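
If the pool size is part of the problem, Twisted lets you ask for a bigger one; a minimal sketch (32 simply matches the number of Foo instances here, and the GIL question below still applies):

    from twisted.internet import reactor

    # call this before reactor.run(); it raises the pool's maximum size,
    # but CPU-bound threads still take turns under the GIL
    reactor.suggestThreadPoolSize(32)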

You also need to ask what role the GIL plays in updating all of these collections, since CPU-bound work in threads still contends for the interpreter lock.

Edit: Before making any major changes to your program (for example, calling sys.setcheckinterval()), you should run it under a profiler or the Python trace module. That should tell you which methods are eating all your time. Without the right information, you cannot make the right changes.
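
A minimal profiling sketch with cProfile, where foo stands for one of the Foo instances and sample_records for a representative batch (both are placeholders, not names from the question):

    import cProfile
    import pstats

    # profile one processData call and dump the stats to a file
    cProfile.runctx('foo.processData(sample_records)', globals(), locals(),
                    'processData.prof')

    # show the 20 most expensive calls by cumulative time
    pstats.Stats('processData.prof').sort_stats('cumulative').print_stats(20)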

