Low Latency Networking and Silver Bullets

After some basic research into low-latency networking, I came up with the following list of things programmers and system designers should consider when starting to build low-latency networks:

  • Hardware, systems, and protocol design should be considered together

  • Develop protocols over UDP instead of TCP, implementing simple ack/nak, retransmission logic at the application level

  • Reduce the number of context switches (ideally to zero) for any process or thread that consumes and packetizes data off the wire

  • Use the best selector for the OS (select, kqueue, epoll, etc.)

  • Use high-quality network cards and switches with plenty of built-in buffer (fifo)

  • Use multiple network adapters, especially to separate inbound and outbound data streams

  • Reduce the number of IRQs generated by other devices or software (disable them entirely if they are not needed)

  • Reduce the use of mutexes and condition variables. Where possible, use lock-free programming techniques instead, taking advantage of your architecture's CAS (compare-and-swap) operations (e.g. lock-free containers)

  • Consider single-threaded over multi-threaded designs - context switches are very expensive.

  • Understand and properly use your architecture's cache hierarchy (L1/L2, RAM, etc.)

  • Prefer full control over memory management rather than delegation to garbage collectors

  • Use good quality cables, keep cables as short as possible, and minimize bends and coils

My question is: what other things do SOers consider important when building low-latency networks?

Feel free to criticize any of the points above.

3 answers

Cable quality is usually a red herring. I would think more about connecting a network analyzer to see whether you are getting enough retransmissions to care about. If you get a lot, try to isolate where they occur and replace the cable(s) causing the problem. If you are not getting errors that result in retransmissions, then the cable has (practically) no effect on latency.

Large buffers on network adapters and (especially) switches do not, by themselves, reduce latency. In fact, to truly minimize latency you usually want the smallest buffers you can get away with, not larger ones. Data sitting in a buffer instead of being processed immediately increases latency. Honestly, this is rarely worth worrying about, but still: if you really want to minimize latency (and care much less about bandwidth), you would be better off with a hub than a switch (hubs are hard to find these days, but they are certainly low latency as long as network contention is low).

Several network adapters can significantly increase bandwidth, but their effect on latency is generally quite minimal.

Edit: My main advice, however, would be to get a sense of scale. Shortening a network cable by a foot saves you about a nanosecond, which is in the same general ballpark as speeding up packet processing by a couple of assembly-language instructions.

Bottom line: like any other optimization, to get very far you need to measure where your latency comes from before you can do much to reduce it. In most cases, reducing the length of a wire (to use one example) won't make enough difference to notice, simply because it is fast to start with. If something starts out taking 10 microseconds, nothing you do can speed it up by more than 10 microseconds, so unless your operations are so fast that 10 us is a significant percentage of your total time, it is not worth attacking.


Others:

1: use userland network stacks

2: service NIC interrupts on the same CPU socket as the transport code (shared cache)

3: prefer fixed-length protocols, even if they are slightly larger in bytes (faster parsing)

4: ignore the network byte-order convention and just use host byte ordering (when you control both ends)

5: never allocate objects in hot routines; pool them instead (especially in garbage-collected languages)

6: avoid as many byte copies as possible (hard when sending over TCP)

7: use cut-through switching mode (rather than store-and-forward)

8: hack the network stack to get around TCP slow start

9: advertise a huge TCP window (even if you don't use it) so the other side can have many packets in flight at a time

10: disable NIC interrupt coalescing, especially for sends (batch at the application layer if you need to)

11: opt for copper

I can keep going, but that should make people think.

One I disagree with:

1: network cables are rarely a problem unless they are faulty (with an exception regarding the type of cable)


This may be a little obvious, but here is a technique I'm happy with, and it works with both UDP and TCP, so I'll write it up:

1) Never queue up a significant amount of outgoing data: specifically, try to avoid marshalling your in-memory data structures into serialized byte buffers until the last possible moment. Instead, when your sending socket select()s as ready-for-write, flatten the current state of the relevant/dirty data structures at that moment and send them immediately. That way data never "builds up" on the sending side. (Also remember to set your socket's SO_SNDBUF as small as possible, to minimize data queueing inside the kernel.)

2) You can do something similar on the receiving side, assuming your data is keyed somehow: instead of running a (read data, process data, repeat) loop, you can read all available messages and just place them into a keyed data structure (e.g. a hash table) until the socket has no more data available for reading, and then (and only then) iterate over the data structure and process the data. The advantage is that if your receiving client has to do non-trivial processing on the received data, stale incoming messages are automatically/implicitly dropped (their replacements overwrite them in the keyed data structure), so incoming packets won't back up in the kernel's incoming-message queue. (You could just let the kernel's queue fill up and drop packets, of course, but then your program ends up reading the "old" packets and dropping the "new" ones, which is usually not what you want.) As a further optimization, you could have an I/O thread hand the keyed data structure over to a separate processing thread, so that the I/O is not held up by the processing.

