There are several ways to support multiprocessing in a Twisted application. One important question to answer at the start, though, is what you expect your concurrency model to be, and how your application deals with shared state.
In a single Twisted application process, all concurrency is cooperative (with help from Twisted's asynchronous I/O APIs), and shared state can be kept anywhere a Python object fits. Your application code runs knowing that, until it gives up control, nothing else will run. Additionally, any part of your application that wants to access some piece of shared state can probably do so quite easily, since that state is likely kept in a boring old Python object that is easy to get at.
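The safety property described above can be sketched outside of Twisted as well. The following uses the standard library's asyncio rather than Twisted (an assumption made purely for a self-contained illustration), but the property is the same in both frameworks: control only switches at explicit yield points, so a read-modify-write on a plain dict needs no lock.

```python
import asyncio

# Shared state is an ordinary Python object; no locking is used anywhere.
state = {"count": 0}

async def worker(steps):
    for _ in range(steps):
        # This read-modify-write is safe: under cooperative multitasking,
        # nothing else runs between these two lines.
        current = state["count"]
        state["count"] = current + 1
        await asyncio.sleep(0)  # the only point where control can switch

async def main():
    # Two concurrent tasks hammer the same counter.
    await asyncio.gather(worker(100), worker(100))

asyncio.run(main())
print(state["count"])  # 200: no updates lost, without any locks
```

With preemptive threads, the same code would be a textbook race condition; cooperative scheduling is what makes the naive version correct.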
If you have multiple processes, even if they are all running Twisted-based applications, then you have two forms of concurrency. One is the same as the previous case - within a particular process, concurrency is still cooperative. But you have a new kind as well, where multiple processes are running. Your platform's process scheduler may switch execution between those processes at any time, and you have very little control over this (as well as very little visibility into when it happens). It might even schedule two of your processes to run simultaneously on different cores (this is perhaps even what you are hoping for). This means you lose some guarantees of consistency, because one process does not know when a second process might come along and try to operate on some shared state. This leads to the other important area of consideration: how you will actually share state between processes.
Unlike the single-process model, you no longer have a convenient, easily accessible place to keep your state where all of your code can reach it. If you put it in one process, all of the code in that process can access it easily as normal Python objects, but code running in any of your other processes no longer has easy access to it. You may need to find an RPC system to let your processes talk to each other, or you might architect your application so that each process only receives requests that need the state kept in that process. An example of this would be a web site with sessions, where all of a user's state is kept in their session and sessions are identified by cookies. A front-end process can receive web requests, inspect the cookie, look up which back-end process is responsible for that session, and then forward the request to that back-end. This scheme means the back-ends typically do not need to communicate with each other at all (as long as your web application is sufficiently simple - that is, as long as users do not interact with each other or operate on shared data).
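The routing step described above can be sketched in a few lines. The names here (route_request, BACKENDS) are hypothetical, not part of Twisted: hashing the session cookie gives a stable mapping from each session to one back-end process, so all of a session's requests land where its state lives.

```python
from hashlib import sha256

# Hypothetical pool of back-end process identifiers.
BACKENDS = ["backend-0", "backend-1", "backend-2"]

def backend_for_session(session_id: str) -> str:
    # A stable hash means the same session always maps to the same process.
    digest = sha256(session_id.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

def route_request(cookies: dict) -> str:
    # The front-end inspects the session cookie and picks the back-end.
    session_id = cookies.get("session", "anonymous")
    return backend_for_session(session_id)

# Determinism is the whole point: repeated requests from one session
# always reach the same back-end, where that session's state is kept.
assert route_request({"session": "alice"}) == route_request({"session": "alice"})
```

In a real deployment the lookup might instead consult a shared table so sessions can be migrated, but the hash approach avoids any coordination between back-ends.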
Note that the prefork model is not suitable for this example. The front-end process must hold the listening port exclusively so that it can inspect every incoming request before it is handled by a back-end process.
Of course, there are many kinds of applications with many other models of state management. Choosing the right model for multiprocessing requires first understanding what kind of concurrency makes sense for your application, and how you can manage your application's state.
That said, with very recent versions of Twisted (unreleased as of this writing), it is quite easy to share a listening TCP port among multiple processes. Here is a code snippet that demonstrates one way you might use some new APIs to accomplish this:
from os import environ
from sys import argv, executable
from socket import AF_INET

from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File

def main(fd=None):
    root = File("/var/www")
    factory = Site(root)

    if fd is None:
        # Create a new listening port and several other processes to help out.
        port = reactor.listenTCP(8080, factory)
        for i in range(3):
            reactor.spawnProcess(
                None, executable, [executable, __file__, str(port.fileno())],
                childFDs={0: 0, 1: 1, 2: 2, port.fileno(): port.fileno()},
                env=environ)
    else:
        # Another process created the port, just start listening on it.
        port = reactor.adoptStreamPort(fd, AF_INET, factory)

    reactor.run()

if __name__ == '__main__':
    if len(argv) == 1:
        main()
    else:
        main(int(argv[1]))
With older versions, you can sometimes get away with using fork to share the port. However, that approach is fairly error-prone, fails on some platforms, and is not a supported way to use Twisted:
from os import fork

from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File

def main():
    root = File("/var/www")
    factory = Site(root)

    # Create the listening port before forking, so every process inherits it.
    port = reactor.listenTCP(8080, factory)

    for i in range(3):
        if fork() == 0:
            # Child process: stop forking and fall through to reactor.run().
            break

    reactor.run()

if __name__ == '__main__':
    main()
This works because of the normal behavior of fork, where the newly created (child) process inherits all of the memory and file descriptors of the original (parent) process. Since processes are otherwise isolated, the two processes do not interfere with each other, at least as far as the Python code they are executing is concerned. Since the file descriptors are inherited, either the parent or any of the children can accept connections on the port.
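This descriptor inheritance can be demonstrated without Twisted at all. A minimal sketch with a plain socket (the message and the use of an ephemeral port are arbitrary choices for the illustration): the listening socket is created before fork(), so the child can accept a connection on it even though the parent opened it.

```python
import os
import socket

# Create the listening socket in the parent, before forking.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))   # ephemeral port chosen by the OS
listener.listen(5)
port = listener.getsockname()[1]

pid = os.fork()
if pid == 0:
    # Child: the listener's file descriptor was inherited across fork(),
    # so the child can accept connections on a port it never opened.
    conn, _ = listener.accept()
    conn.sendall(b"handled by child")
    conn.close()
    os._exit(0)

# Parent: connect to the shared port; the child's accept() picks it up.
client = socket.create_connection(("127.0.0.1", port))
reply = client.recv(1024)
client.close()
os.waitpid(pid, 0)
print(reply.decode())
```

With several children forked in a loop, whichever process calls accept() first wins the connection, which is exactly the mechanism the fork-based Twisted snippet relies on.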
Since dispatching HTTP requests is such an easy task, I doubt you will notice much of a performance improvement with either of these techniques. The former is a bit nicer than proxying, because it simplifies your deployment and works more easily for non-HTTP applications. The latter is probably more of a liability than it is worth.