Unix sockets are slower than TCP when connecting to Redis

I am developing a high-performance web server that should handle ~2k simultaneous connections and 40k QPS, with response times under 7 ms.

What it does is query a Redis server (running on the same host) and return the response to the client. During testing I noticed that the implementation using TCP stream sockets performs better than the one using Unix domain sockets: with ~1500 concurrent TCP connections latency stays around 8 ms, while with Unix sockets it climbs up to 50 ms.

The server is written in C and built on a fixed-size POSIX thread pool; I use blocking connections to Redis. My OS is CentOS 6, and the tests were performed with JMeter, wrk and ab. To communicate with Redis I use the hiredis library, which provides both ways of connecting to Redis.
As far as I know, a Unix socket should be at least as fast as TCP.
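
In case it helps, here is a stripped-down sketch of the two connection paths I am comparing (the port and socket path below are placeholders, not my exact configuration):

    #include <stdio.h>
    #include <hiredis/hiredis.h>

    int main(void) {
        /* Variant 1: TCP to Redis on the same host (placeholder address/port) */
        redisContext *ctx = redisConnect("127.0.0.1", 6379);
        /* Variant 2: Unix domain socket (placeholder path) */
        /* redisContext *ctx = redisConnectUnix("/tmp/redis.sock"); */

        if (ctx == NULL || ctx->err) {
            fprintf(stderr, "connect failed: %s\n", ctx ? ctx->errstr : "out of memory");
            return 1;
        }

        /* One blocking round trip, like each worker thread does per client request */
        redisReply *reply = redisCommand(ctx, "GET some:key");
        if (reply) {
            printf("reply type: %d\n", reply->type);
            freeReplyObject(reply);
        }
        redisFree(ctx);
        return 0;
    }

Each worker thread keeps its own blocking redisContext; only the connect call differs between the two setups.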

Does anyone know what might cause this behavior? I'm stuck. Thanks.

Tags: c, unix, sockets, server, redis
2 answers

Unix domain sockets are usually faster than loopback TCP sockets. Typically, Unix domain sockets have an average latency of around 2 microseconds, while loopback TCP sockets average around 6 microseconds.

If I run redis-benchmark with default settings (no pipelining), I see about 160k requests per second, mainly because the single-threaded Redis server is limited by the TCP socket: 160k requests per second corresponds to an average response time of about 6 microseconds.

Using Unix domain sockets, Redis reaches about 320k SET/GET requests per second.
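
For reference, figures like these come from invocations roughly along the lines of redis-benchmark -q -t set,get for loopback TCP and redis-benchmark -q -t set,get -s /path/to/redis.sock for the Unix domain socket (the socket path depends on your redis.conf); the pipelined runs discussed below simply add -P 32.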

There is a lower bound, though, which we at Torusware measured with our Speedus product, a high-performance TCP socket implementation with an average latency of 200 nanoseconds (ping us at info@torusware.com to ask about the Extreme Performance version). With almost zero socket latency, redis-benchmark reaches about 500k requests per second, so the Redis server itself accounts for an average of about 2 microseconds per request.

If you want to respond as fast as possible and your load is below the maximum throughput of the Redis server, it is better to avoid pipelining. If you need higher throughput, however, you can pipeline requests: each individual response may take a bit longer, but you can handle more requests on the same hardware.

So, in the previous scenario, with a pipeline of 32 requests (buffering 32 requests before sending them through the socket in one batch), you can process up to 1 million requests per second over the loopback interface. In this case the advantage of Unix domain sockets is not as big, because the pipeline processing itself becomes the bottleneck. In fact, 1M requests per second with a pipeline of 32 is only about 31k (1,000,000 / 32 ≈ 31,250) "actual" socket round trips per second, and we saw that the Redis server can handle 160k round trips per second.
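
In hiredis terms (which the question uses), such a batch roughly amounts to queuing commands and then reading all the replies at once; a minimal sketch, with the socket path, keys and batch size as placeholders:

    #include <stdio.h>
    #include <hiredis/hiredis.h>

    #define PIPELINE 32  /* commands buffered before flushing the socket */

    int main(void) {
        redisContext *c = redisConnectUnix("/tmp/redis.sock");  /* or redisConnect() for TCP */
        if (c == NULL || c->err) {
            fprintf(stderr, "connect failed: %s\n", c ? c->errstr : "out of memory");
            return 1;
        }

        /* Queue PIPELINE commands in hiredis's output buffer... */
        for (int i = 0; i < PIPELINE; i++)
            redisAppendCommand(c, "SET key:%d %d", i, i);

        /* ...then flush them and read all the replies back to back */
        for (int i = 0; i < PIPELINE; i++) {
            redisReply *reply;
            if (redisGetReply(c, (void **)&reply) != REDIS_OK) {
                fprintf(stderr, "reply error: %s\n", c->errstr);
                break;
            }
            freeReplyObject(reply);
        }

        redisFree(c);
        return 0;
    }

The 32 replies come back in one burst, which is why per-request latency goes up even though overall throughput improves.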

With pipelining, Unix domain sockets handle around 1.1M and 1.7M SET/GET requests per second respectively, while loopback TCP handles about 1M and 1.5M.

With pipelining, the bottleneck moves from the transport protocol to the pipeline processing itself.

This is consistent with the figures mentioned in the redis-benchmark documentation.

However, pipelining significantly increases response time. Without pipelining, 100% of operations typically complete in under 1 millisecond; with a pipeline of 32 requests, the maximum response time is around 4 milliseconds on a high-performance server, and tens of milliseconds if the Redis server runs on another machine or in a virtual machine.

So you have to weigh response time against maximum throughput.


Although this is an old question, I would like to add something. The other answer talks about 500k or even 1.7M responses per second. That may be achievable with Redis itself, but the question's setup was:

Client - #Network# → Web Server - #Something# → Redis

The web server functions as an HTTP proxy in front of Redis, I assume.

This means that the throughput is also limited by how many requests the web server itself can get over the network. There is an often-forgotten limitation: on a 100 Mbit connection you have 100,000,000 bits per second at your disposal, but at the author's figure of roughly 1518 bits per packet on the wire (including the required inter-packet gap) that works out to about 65k network packets per second. That assumes all of your responses fit into such a packet and nothing has to be retransmitted because of CRC errors or lost packets.

In addition, if persistent connections are not used, you need a TCP/IP handshake for every request, which adds about 3 packets per request (2 received, 1 sent). In this unoptimized situation you are left with roughly 21k requests per second reaching your web server (or 210k on a "perfect" gigabit connection), assuming each response fits into a single packet of about 175 bytes.
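
For what it is worth, the rough arithmetic behind those numbers, using the per-packet figure above:

    100,000,000 bits/s ÷ ~1,518 bits per packet ≈ 65,000 packets/s
    65,000 packets/s ÷ ~3 packets per request ≈ 21,000 requests/s
    (multiply by 10 for a "perfect" gigabit link ≈ 210,000 requests/s)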

So:

  • Persistent (keep-alive) connections cost only a little memory, so turn them on; this alone can roughly quadruple your throughput. (the best option)
  • Reduce response size with gzip/deflate so each response needs as few packets as possible. (every packet saved is a potential extra response)
  • Reduce response size by stripping unnecessary cruft such as debug data or long XML tags.
  • Keep in mind that HTTPS connections add a large amount of overhead compared to the numbers discussed here.
  • Add network cards and bond (trunk) them together.
  • If responses are always smaller than 175 bytes, use a dedicated network card for this service and reduce the network frame size to increase the number of packets that can be sent per second.
  • Do not let this server do other things (e.g. serve normal web pages).
  • ...
