TcpListener based application that does not scale well

I have an echo server application based on TcpListener. It accepts clients, reads the data and returns the same data. I developed it using the async/await approach, using the XXXAsync methods provided by the framework.

I set up performance counters to measure the number of messages and bytes, as well as the number of connected sockets.

I created a test application that starts 1400 asynchronous TcpClient instances, each sending a 1 KB message every 100-500 ms. The clients wait a random 10-1000 ms before starting, so they do not all try to connect at the same time. It works well: I can see the 1400 connections in PerfMon, sending messages at a good rate. I run the client application from another computer. The server's CPU and memory usage is very low; it is an Intel Core i7 with 8 GB of RAM. The client machine looks busier (an i5 with 4 GB of RAM), but still not even at 25%.

The problem starts when I run a second client application. Connections begin to fail on the clients. Messages per second increase only by about 20%, and the number of connected clients hovers around 1900-2100 rather than the expected 2800. Throughput drops slightly, and the graph shows bigger swings between the maximum and minimum messages per second than before.

However, CPU usage is not even at 40%, and memory usage is still low. I tried increasing the thread pool size on both the client and the server:

    ThreadPool.SetMaxThreads(5000, 5000);
    ThreadPool.SetMinThreads(2000, 2000);

On the server, connections are accepted in a loop:

    while (true)
    {
        var client = await _server.AcceptTcpClientAsync();
        HandleClientAsync(client);
    }

The HandleClientAsync method returns a Task, but as you can see the loop does not await it; it just goes on to accept the next client. The handler looks something like this:

    public async Task HandleClientAsync(TcpClient client)
    {
        while (client.Connected && !_cancellation.IsCancellationRequested)
        {
            var msg = await ReadMessageAsync(client);
            await WriteMessageAsync(client, msg);
        }
    }

These two functions only read and write to the stream asynchronously.
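For reference, they follow roughly this shape; this is a simplified sketch (the real methods differ), assuming a plain 4-byte length prefix followed by the payload:

    // Requires: using System; using System.IO; using System.Net.Sockets; using System.Threading.Tasks;
    private static async Task<byte[]> ReadMessageAsync(TcpClient client)
    {
        var stream = client.GetStream();

        // Read the 4-byte length prefix, then the payload itself.
        var lengthBuffer = new byte[4];
        await ReadExactAsync(stream, lengthBuffer, 4);
        var payload = new byte[BitConverter.ToInt32(lengthBuffer, 0)];
        await ReadExactAsync(stream, payload, payload.Length);
        return payload;
    }

    private static async Task WriteMessageAsync(TcpClient client, byte[] message)
    {
        var stream = client.GetStream();
        var lengthBuffer = BitConverter.GetBytes(message.Length);
        await stream.WriteAsync(lengthBuffer, 0, lengthBuffer.Length);
        await stream.WriteAsync(message, 0, message.Length);
    }

    // ReadAsync can return fewer bytes than requested, so loop until the buffer is full.
    private static async Task ReadExactAsync(NetworkStream stream, byte[] buffer, int count)
    {
        var offset = 0;
        while (offset < count)
        {
            var read = await stream.ReadAsync(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Connection closed by the remote host");
            offset += read;
        }
    }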

I saw that I can start the TcpListener with a backlog parameter, but what is its default value?
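For reference, the overload I am referring to (the port and the value 1000 here are just placeholders):

    // Requires: using System.Net; using System.Net.Sockets;
    var listener = new TcpListener(IPAddress.Any, 8080);
    // Start(int) sets the maximum length of the pending-connections queue;
    // the parameterless Start() uses whatever default the framework/OS picks.
    listener.Start(1000);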

What could be the reason the application does not scale until it maxes out the CPU?

What approach and tools could I use to find out what the actual problem is?

UPDATE

I tried the Task.Yield and Task.Run approaches and they did not help.

This also happens with the server and client running locally on the same computer. Increasing the number of clients or messages per second actually reduces the service's throughput: 600 clients sending a message every 100 ms generate more throughput than 1000 clients sending a message every 100 ms.

I see two exceptions on the client side when connecting more than 2000 clients. From about 1500 clients I see exceptions at the beginning, but the clients eventually connect. Beyond 1500, I see a lot of connects/disconnects:

"Existing connection was forcibly closed by the remote host" (System.Net.Sockets.SocketException) A Fixed System.Net.Sockets.SocketException: "existing connection was forcibly closed by the remote host"

"Cannot write data to transport connection: existing connection was forcibly closed by the remote host." (System.IO.IOException) System.IO.IOException exception: "Failed to write data to the transport connection: the existing connection was forcibly closed by the remote host."

UPDATE 2

I created a simple test project with a server and a client using async/await, and it scales as expected.

The project where I have the scalability problem is a WebSocket server, and even though it uses the same approach, something is clearly causing contention. There is a console application hosting the component, and a console application that generates the load (it requires at least Windows 8, though).

Please note that I am not asking for a direct fix, but for methods or approaches to find out what is causing this behavior.

2 answers

I eventually managed to scale to 6,000 concurrent connections without problems, processing around 24,000 messages per second, connecting from a separate machine (no localhost testing) and using only about 80 physical threads.

There are some lessons that I learned:

Increasing the thread pool size made things significantly worse

Don't do it unless you know what you are doing.

Call Task.Run or yield with Task.Yield

To free the calling thread from the rest of the method.
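For example, applied to the handler from the question (a minimal sketch):

    public async Task HandleClientAsync(TcpClient client)
    {
        // Give control back to the accept loop right away; the rest of the method
        // continues on a thread-pool thread (there is no synchronization context
        // in a console application, so the continuation is free to run anywhere).
        await Task.Yield();

        while (client.Connected && !_cancellation.IsCancellationRequested)
        {
            var msg = await ReadMessageAsync(client);
            await WriteMessageAsync(client, msg);
        }
    }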

ConfigureAwait(false)

In application code, if you are sure you do not depend on a particular synchronization context, this allows any thread to pick up the continuation rather than waiting specifically for the one that started the operation to become free.
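Applied to the awaits in the handler, that is simply:

    var msg = await ReadMessageAsync(client).ConfigureAwait(false);
    await WriteMessageAsync(client, msg).ConfigureAwait(false);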

Byte[]

A memory profiler showed that the application was spending too much memory and time creating Byte[] instances, so I came up with several strategies to reuse existing buffers or to work "in place" rather than creating new ones and copying. The GC performance counters (in particular "% time in GC", which was around 55%) raised the alarm that something was wrong. I was also using BitArray instances to check bits in bytes, which added memory overhead, so I replaced them with bitwise operations and things improved. Later I found out that WCF uses a Byte[] pool to deal with this same problem.
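A minimal sketch of the reuse idea (not the actual code from the project): a fixed-size pool that hands out buffers and takes them back, so the server stops allocating a fresh Byte[] per message:

    // Requires: using System.Collections.Concurrent;
    public sealed class BufferPool
    {
        private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
        private readonly int _bufferSize;

        public BufferPool(int bufferSize)
        {
            _bufferSize = bufferSize;
        }

        // Reuse a previously returned buffer when possible, otherwise allocate.
        public byte[] Rent()
        {
            byte[] buffer;
            return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
        }

        // Hand the buffer back so the next Rent() avoids an allocation (and GC pressure).
        public void Return(byte[] buffer)
        {
            if (buffer != null && buffer.Length == _bufferSize)
                _buffers.Add(buffer);
        }
    }

The point is simply to keep allocation volume (and therefore "% time in GC") flat while the server is under load.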

Asynchronous doesn't mean fast

Asynchronous code lets you scale nicely, but it has a cost. Just because an asynchronous version of an operation is available does not mean you should use it. Use asynchronous programming when you expect to wait some time for the actual answer; if you are sure the data is already there or the answer will come quickly, continue synchronously.
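For example, on a NetworkStream you can check DataAvailable and only go asynchronous when a wait is actually likely (ReadMessage here is a hypothetical synchronous counterpart of ReadMessageAsync, and this shortcut is only safe when a complete message can be assumed to be buffered already):

    var msg = client.GetStream().DataAvailable
        ? ReadMessage(client)             // data is already buffered: read it synchronously
        : await ReadMessageAsync(client); // otherwise wait without blocking a thread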

Supporting both sync and async operation is tedious

You have to implement these methods twice; there is no bulletproof way of just wrapping one on top of the other.


Well, for starters, you are running everything on the same thread, so changing the ThreadPool settings does not make any difference.

EDIT: As Noseration pointed out, this is actually not the case. Although IOCP and asynchronous sockets themselves do not require extra threads for the I/O requests, the default implementation in .NET does: the completion event is handled on a ThreadPool thread, and you are responsible for supplying your own TaskScheduler, or for queuing the event and handling it manually on a consumer thread, if you want it elsewhere. I will leave the rest of the answer, because it is still relevant (and the thread switching here is not the performance problem, as described further down). Also note that the default TaskScheduler in a UI application usually uses the synchronization context, so in e.g. WinForms the completion event is processed on the UI thread. In any case, throwing more threads at the problem than there are CPU cores will not help.

However, this is not necessarily a bad thing. I/O-bound operations do not benefit from running on a separate thread; in fact that is very inefficient. This is exactly what async and IOCP are for, so keep using them.

If you start seeing significant CPU utilization, that is when you want to make something parallel, not just asynchronous. Receiving the messages on a single thread with await should be fine, though. Multithreaded processing is always complex, and there are many approaches for different situations. In practice you usually do not want more threads than you have CPU cores: if they compete for I/O, use async; if they compete for the CPU, things only get worse with more threads than the CPU can run in parallel.

Note that since you are running everything on the same thread, one of your CPU cores may be at 100% while the others do nothing. This is easy to check in Task Manager.

Also note that the number of TCP connections you can have open at the same time is quite limited. Each connection needs its own port, both on the client and on the server. The default ephemeral port range on client Windows is somewhere in the region of 1000-4000 ports. That is not much for a server, nor for clients generating the test load.

If you are opening and closing connections, it gets even worse, because TCP ports are kept reserved for a while after a disconnect (up to four minutes, the TIME_WAIT state). This is because otherwise a new TCP connection on the same port could receive data that was meant for the old connection, which would be very, very bad.

Please add more information. What do ReadMessageAsync and WriteMessageAsync do? Could the performance hit be caused by the GC? Have you tried profiling CPU and memory? Are you sure you are not simply running out of network bandwidth with all of these TCP messages? Have you checked for TCP port exhaustion or packet loss?
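For the last point, one quick way to check for port exhaustion (the same information netstat -an shows) is to count connections stuck in TIME_WAIT; a sketch:

    // Requires: using System; using System.Linq; using System.Net.NetworkInformation;
    var connections = IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();
    Console.WriteLine("Active TCP connections: " + connections.Length);
    Console.WriteLine("Stuck in TIME_WAIT:     " + connections.Count(c => c.State == TcpState.TimeWait));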

UPDATE: I wrote a test server and client, and they can run out of available TCP ports within a second, including all the initialization, when using asynchronous sockets. I ran this on localhost, so each client connection actually consumes two ports (one for the server side, one for the client side), which makes it somewhat faster than when the client is on another machine. In any case, it is clear that in my case the problem is TCP port exhaustion.

