What is the most efficient way to handle a large number of file descriptors?

There seem to be several options available for programs that handle a large number of socket connections (such as web services, p2p systems, etc.).

  • Create a separate thread to handle I/O for each socket.
  • Use the select system call to multiplex I/O in a single thread.
  • Use the poll system call to multiplex I/O (replacing select).
  • Use the epoll system calls to avoid repeatedly sending socket fds across the user/kernel boundary.
  • Create multiple I/O threads, each of which multiplexes a relatively small subset of the total connections using the poll API.
  • As in #5, except using the epoll API to create a separate epoll object for each individual I/O thread.
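For concreteness, approach #2 (a single thread multiplexing all sockets with select) might look like the sketch below. This is an illustrative Python sketch, not code from the question; it uses a socketpair so it is self-contained, where a real server would register accepted connections instead.

```python
# Sketch of approach #2: one thread multiplexing several sockets with
# select(). socketpair() stands in for real client connections.
import select
import socket

def run_select_loop(readers, stop_after):
    """Service readable sockets until stop_after messages have arrived."""
    received = []
    while len(received) < stop_after:
        # select() blocks until at least one fd is ready, then the single
        # thread services the ready sockets in turn.
        ready, _, _ = select.select(readers, [], [], 1.0)
        for sock in ready:
            received.append(sock.recv(1024))
    return received

a1, b1 = socket.socketpair()
a2, b2 = socket.socketpair()
a1.sendall(b"ping")
a2.sendall(b"pong")
msgs = run_select_loop([b1, b2], stop_after=2)
print(sorted(msgs))  # [b'ping', b'pong']
```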

On a multi-core CPU, I would expect #5 or #6 to have the best performance, but I don't have any hard data to support this. Searching the web turned up this page describing the author's experience testing approaches #2, #3, and #4 above. Unfortunately, that page appears to be about 7 years old, with no obvious recent updates.

So my question is: which of these approaches have people found to be most effective, and/or is there another approach that works better than any of the above? Links to real benchmarks, white papers, and/or write-ups would be appreciated.

+7
performance optimization linux sockets
4 answers

Speaking from my experience running large IRC servers, we used select() and poll() (because epoll()/kqueue() weren't available). At around 700 simultaneous clients, the server would use 100% of a CPU (the IRC server was not multithreaded). Interestingly, though, the server would still perform well; it was at around 4,000 clients that it would start to lag.

The reason is that at around 700 clients, when we got back to select(), typically only one client would need processing, and scanning the fd sets to figure out which client it was consumed most of the CPU. As we got more clients, more and more clients would need processing in each select() call, so we actually became more efficient.

Moving to epoll()/kqueue(), similarly specced machines would trivially handle 10,000 clients, and some (more powerful machines, but still machines that would be considered tiny by today's standards) handled 30,000 clients without breaking a sweat.

The experiments I have seen with SIGIO seem to show that it works well for applications where latency is extremely important and there are only a few active clients doing very little individual work.

I would recommend using epoll()/kqueue() over select()/poll() in almost any situation. I haven't experimented with splitting clients between threads; to be honest, I've never found a service that needed enough optimization work on its client front-end handling to justify experimenting with threads.
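The advantage this answer describes can be illustrated with Python's selectors module (a thin wrapper that picks epoll on Linux and kqueue on BSD). The sketch below is illustrative, not from the answer: the kernel remembers the interest set, and each wait returns only the ready fds, so the cost no longer grows with the total number of registered sockets.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD

# Register many sockets once; unlike select()/poll(), we do not re-send
# the whole fd list across the user/kernel boundary on every wait.
pairs = [socket.socketpair() for _ in range(100)]
for send_side, recv_side in pairs:
    sel.register(recv_side, selectors.EVENT_READ)

# Only one of the 100 sockets becomes ready...
pairs[42][0].sendall(b"hello")

# ...and the wait hands back just that one: work proportional to the
# number of ready fds, not the number of registered ones.
events = sel.select(timeout=1.0)
print(len(events))  # 1
ready_sock = events[0][0].fileobj
print(ready_sock.recv(1024))  # b'hello'
```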

+3

In my experience, you will get the best performance with #6.

I also recommend that you take a look at libevent to abstract away some of these details. At the very least, you can look at their benchmark results.

Also, how many sockets are you talking about? Your choice of approach probably doesn't matter much until you get to at least a few hundred sockets.

+2

I have spent the last two years working on this specific issue (for the G-WAN web server, which comes with MANY benchmarks and charts exposing all of this).

The model that works best on Linux is epoll with a single event queue (and, for heavy processing, several worker threads).
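A minimal sketch of that model (a single event-loop thread owning the one event queue, handing CPU-heavy work to a small worker pool). This is an illustrative sketch under those assumptions, not G-WAN code, and all names are made up:

```python
import queue
import selectors
import socket
import threading

work = queue.Queue()     # event loop -> workers
results = queue.Queue()  # workers -> wherever replies are written

def worker():
    while True:
        data = work.get()
        if data is None:          # shutdown sentinel
            break
        results.put(data.upper()) # stand-in for "heavy processing"

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

# The single event-loop thread owns the one epoll/kqueue queue.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ)
a.sendall(b"request")

# The loop only waits and dispatches; it never blocks on heavy work.
for key, _ in sel.select(timeout=1.0):
    work.put(key.fileobj.recv(1024))

result = results.get(timeout=1.0)
print(result)  # b'REQUEST'

for _ in workers:
    work.put(None)
for w in workers:
    w.join()
```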

If you have little processing per request (i.e., low processing latency), then a single thread will be faster than multiple threads.

The reason for this is that epoll does not scale on multi-core CPUs (using several concurrent epoll queues for connection I/O in the same user-mode application will just slow your server down).

I have not looked seriously at the epoll code in the kernel (I have focused on user mode so far), but my guess is that the epoll implementation in the kernel is crippled by locks.

This is why using multiple threads quickly hits the wall.

It goes without saying that such a sorry state of affairs should not persist if Linux intends to keep its position as one of the best-performing kernels.

+2

I use epoll() extensively, and it performs well. I routinely have thousands of sockets connected, and have tested with up to 131,072 sockets. And epoll() can always handle it.

I use several threads, each of which epoll()s on a subset of the sockets. This complicates the code, but takes full advantage of multi-core CPUs.
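This answer's design (option #6 in the question) can be sketched as follows: each thread owns its own event queue over a disjoint subset of the sockets. An illustrative sketch, not the answerer's code; all names are made up:

```python
import selectors
import socket
import threading

def serve_subset(recv_socks, out, lock):
    # Each thread creates its own epoll/kqueue instance, so the kernel
    # queues are never shared between threads.
    sel = selectors.DefaultSelector()
    for s in recv_socks:
        sel.register(s, selectors.EVENT_READ)
    for key, _ in sel.select(timeout=1.0):
        with lock:
            out.append(key.fileobj.recv(1024))

pairs = [socket.socketpair() for _ in range(8)]
for send_side, _ in pairs:
    send_side.sendall(b"x")

out, lock = [], threading.Lock()
subsets = [pairs[:4], pairs[4:]]  # two threads, four sockets each
threads = [
    threading.Thread(target=serve_subset,
                     args=([p[1] for p in sub], out, lock))
    for sub in subsets
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(out))  # 8
```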

0
