Throttling TCP Sends with a Send Queue, and Other Design Issues

This question follows from two other questions I asked in the last few days.
I am creating a new question because I think it concerns the "next step" in my understanding of how to control the flow of my sends/receives, an answer I have not received yet. The related questions:
IOCP Documentation Interpretation Question - Buffer Owner Ambiguity
Non-Blocking TCP Buffer Issues

To summarize: I use Windows I/O completion ports.
I have several threads that handle notifications from the completion port.
I believe the question is platform independent and would have the same answer when doing the same thing on *nix, *BSD, or Solaris.

So, I need my own flow control system. Fine.
So I send and send and send, a lot. How do I know when to start queuing my sends, given that the receiver side can only handle so much (say, X)?

Take an example (the closest to my question): the FTP protocol.
I have two servers; one is on a 100 Mbit link, the other is on a 10 Mbit link.
I tell the 100 Mbit one to send a 1 GB file to the other (on the 10 Mbit link). It finishes with an average transfer rate of 1.25 MB/s.
How did the sender (on the 100 Mbit link) know when to hold back its sends, so that the slower side would not be flooded? (In this case the "to-be-sent" queue is the actual file on the hard disk.)

Another way to ask the question:
Can I get a "hold your messages" notification from the far side? Is that built into TCP (or any so-called "reliable network protocol"), or do I need to implement it myself?

I could, of course, limit myself to a fixed number of outstanding bytes, but that just doesn't feel right to me.

Again: I have a loop with many sends to a remote server, and at some point inside that loop I have to decide whether to queue this send or whether I can pass it straight to the transport layer (TCP).
How do I make that decision? What would you do? Of course, when I receive a completion notification from the IOCP that a send has finished, I will issue the next pending sends (a rough sketch of that completion-handling side follows).
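For concreteness, here is a minimal sketch (with invented names, not code from my project) of the IOCP worker side that this loop relies on: a worker thread dequeues a completion, recovers the per-send context from the OVERLAPPED pointer, and that is the point where a queued "to-be-sent" buffer could be released to WSASend.

```cpp
// Sketch of an IOCP worker thread; all names and structures are illustrative.
#include <winsock2.h>
#include <windows.h>
#include <vector>

struct SendOp {
    WSAOVERLAPPED ov{};        // passed to WSASend; handed back by the completion
    std::vector<char> data;    // the buffer that was being sent
};

void WorkerLoop(HANDLE iocp) {
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;     // per-connection context supplied when the socket was associated
        LPOVERLAPPED ovl = nullptr;
        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ovl, INFINITE);
        if (!ok && ovl == nullptr) {
            break;             // the port was closed or the wait itself failed
        }
        SendOp* op = CONTAINING_RECORD(ovl, SendOp, ov);
        // 'bytes' is how much of op->data the stack accepted for this write.
        // This is the point where the next pending ("to-be-sent") buffer
        // would be handed to WSASend.
        delete op;
    }
}
```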

Another design issue related to this:
Since I have to use custom buffers with a send queue, and those buffers are released for reuse (rather than freed with delete) when the send-completed notification arrives, I will have to protect this buffer pool with a mutex.
Using a mutex slows things down, so I thought: why shouldn't each thread have its own buffer pool? Then getting the buffers needed for a send operation, at least, would not require a mutex, because the pool belongs to that thread alone.
The buffer pool would live in thread-local storage (TLS).
No shared pool means no locking, which means faster operation, but it also means more memory used by the application, because even if one thread has already allocated 1000 buffers, another thread that is sending right now will have to allocate 1000 buffers of its own. A rough sketch of the idea follows.
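A minimal sketch of that idea, assuming hypothetical names (none of this is my actual code): each thread keeps its own free list in a thread_local object, so taking a buffer never touches a mutex, at the cost of every sending thread growing a pool of its own.

```cpp
// Sketch of a per-thread buffer pool held in thread-local storage (TLS).
#include <cstddef>
#include <vector>

struct Buffer {
    std::vector<char> bytes;
};

class ThreadLocalPool {
public:
    // Take a buffer from this thread's free list, or allocate a new one.
    Buffer* Acquire(std::size_t size) {
        Buffer* b;
        if (!free_.empty()) {
            b = free_.back();
            free_.pop_back();
        } else {
            b = new Buffer;        // grows this thread's pool only
        }
        b->bytes.resize(size);
        return b;
    }

    // Return a buffer for reuse. Note: with async sends the completion may be
    // handled on a *different* thread, so returning the buffer here is only
    // safe if completions are routed back to the owning thread.
    void Release(Buffer* b) { free_.push_back(b); }

private:
    std::vector<Buffer*> free_;
};

// One pool per thread; Acquire()/Release() never need a mutex.
thread_local ThreadLocalPool g_pool;
```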

One more problem:
Say I have buffers A, B, and C in the "to-be-sent" queue.
Then I get a completion notification telling me that only 10 of the 15 bytes were sent. Should I re-issue the send from the corresponding offset into the buffer, or will TCP take care of it for me, i.e. finish the send? And if it is up to me, can I be sure that this buffer is the next one to be sent from the queue, or could it, for example, be buffer B instead?

This is a long question, and I hope nobody got hurt (:

I would really appreciate someone taking the time to answer here. I promise I will upvote it twice! (:
Thanks, everyone!

+4
3 answers

First: I would ask these as separate questions. You are more likely to get answers that way.

I've talked about all of this on my blog: http://www.lenholgate.com, but since you've already emailed me to say that you read my blog, you know that...

The problem with TCP flow control is that since you are issuing asynchronous writes, each of them uses resources until it completes (see here). While a write is pending there are various resource-usage issues to be aware of, and the use of your data buffer is the least important of them: you also use some non-paged pool, which is a finite resource (though much more plentiful on Vista and later operating systems), and you also have pages locked in memory for the duration of the write, and there is a limit on the total number of pages the OS can lock. Note that neither the non-paged pool usage nor the page-locking limit is well documented anywhere, but you will see writes start failing with ENOBUFS once you hit them.

Because of these issues it is unwise to have an uncontrolled number of outstanding writes. If you are sending a large amount of data and have no application-level flow control, then you need to be aware that if you send data faster than it can be processed by the other end of the connection, or faster than the link speed, you will start to use up lots and lots of the above resources, because your writes take longer to complete thanks to TCP flow control and windowing. You don't get this problem with blocking socket code, because the write calls simply block when the TCP stack can't write any more due to flow control; with async writes, the write is issued and then sits there until it eventually completes. With blocking code, the blocking handles your flow control for you; with async writes you can keep looping and queue up more and more data, all of it waiting for the TCP stack to send it...

Anyway, because of this, with async I/O on Windows you should ALWAYS have some form of explicit flow control. So you either add application-level flow control to your protocol, perhaps using an ACK, so that you know when the data has reached the other side and allow only a certain amount to be outstanding at any one time, OR, if you can't add to the application-level protocol, you can drive things off of your write completions. The trick is to allow a certain number of outstanding writes per connection and to queue the data (or simply not generate it) once you reach that limit. Then, as each write completes, you can issue a new one...
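A minimal sketch of that scheme, with invented names (this is not the answerer's library code): each connection counts its outstanding writes and keeps a queue of waiting buffers; sends over the limit are queued, and each completion releases the next queued buffer. Synchronisation between the producer and the IOCP worker thread is omitted for brevity.

```cpp
// Sketch: explicit flow control driven off write completions.
#include <winsock2.h>
#include <cstddef>
#include <deque>
#include <vector>

struct SendOp {
    WSAOVERLAPPED ov{};
    std::vector<char> data;
};

class Sender {
public:
    explicit Sender(SOCKET s) : sock_(s) {}

    // Called by the producer: queue the buffer if too many writes are in flight.
    void Send(std::vector<char> buf) {
        if (outstanding_ >= kMaxOutstanding) {
            pending_.push_back(std::move(buf));
        } else {
            Post(std::move(buf));
        }
    }

    // Called from the IOCP worker when a WSASend completes.
    void OnCompletion(SendOp* op) {
        delete op;
        --outstanding_;
        if (!pending_.empty()) {
            std::vector<char> next = std::move(pending_.front());
            pending_.pop_front();
            Post(std::move(next));             // keep the pipe full, but bounded
        }
    }

private:
    void Post(std::vector<char> buf) {
        SendOp* op = new SendOp;
        op->data = std::move(buf);
        WSABUF b{ static_cast<ULONG>(op->data.size()), op->data.data() };
        DWORD sent = 0;
        if (WSASend(sock_, &b, 1, &sent, 0, &op->ov, nullptr) == SOCKET_ERROR &&
            WSAGetLastError() != WSA_IO_PENDING) {
            delete op;                         // real code would abort the connection
            return;
        }
        ++outstanding_;
    }

    static constexpr std::size_t kMaxOutstanding = 8;  // tune this limit by profiling
    SOCKET sock_;
    std::size_t outstanding_ = 0;
    std::deque<std::vector<char>> pending_;
};
```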

Your question about pooling the data buffers is, IMHO, premature optimisation on your part right now. Get to the point where your system works correctly and you have profiled it and found that contention on your buffer pool is the most important hot spot, and THEN address it. I've found that per-thread buffer pools don't work as well as you might expect, because the distribution of allocations and frees across threads tends not to be as balanced as you would need for that to work. I've talked about this on my blog as well: http://www.lenholgate.com/blog/2010/05/performance-comparisons-for-recent-code-changes.html

Your question about partial write completions (you issue a write of 100 bytes and the completion comes back saying only 95 were sent) is not really an issue in practice, IMHO. If you get into this situation and you have more than one write outstanding, then there is nothing you can do: the subsequent writes may well succeed, and there will be bytes missing from what you expected to send; BUT a) I've never seen this happen unless you have already hit the resource problems I detailed above, and b) there is nothing you can do about it if you have already issued more writes on that connection, apart from aborting the connection. Note that this is why I always profile my networking systems on the hardware they will run on, and I tend to place limits in MY code to stop the OS's resource limits ever being reached (badly written drivers on pre-Vista operating systems would often blue-screen the box if they couldn't get non-paged pool, so you could bring a box down if you didn't pay close attention to these details).
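As an illustration of the check involved (hypothetical names, not the answerer's code), the completion handler can compare the byte count reported by the IOCP with the length it asked WSASend to send, and if it ever comes up short while other writes are already queued behind it, abort the connection as described above:

```cpp
// Sketch: detecting a short (partial) write completion.
#include <winsock2.h>
#include <vector>

struct SendOp {
    WSAOVERLAPPED ov{};
    std::vector<char> data;    // data.size() is what we asked WSASend to send
};

// Returns true if the connection is still usable after this completion.
bool HandleSendCompletion(SOCKET s, SendOp* op, DWORD bytesTransferred) {
    bool complete = (bytesTransferred == op->data.size());
    delete op;
    if (!complete) {
        // Part of the stream is missing and later writes may already be queued
        // behind it, so the stream can no longer be trusted.
        closesocket(s);        // abort; real code would also clean up other state
        return false;
    }
    return true;
}
```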

Separate the questions next time, please.

+2

Q1. Most APIs will give you a "write possible" event some time after your last write, once you are able to write again (it may fire immediately if you were unable to fill most of the send buffer with your last send).

With a completion port, it arrives the same way the "new data" event does. Think of "new data" as a "read OK" event; there is also a "write OK" event. The names differ between APIs.
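A rough sketch of what that "write possible" pattern looks like outside of IOCP, using a non-blocking socket and select() (names invented here; WSAEventSelect with FD_WRITE is the closer Windows analogue): write until the stack refuses, then wait for the socket to be reported writable again.

```cpp
// Sketch: readiness-based "write OK" handling with select().
// Assumes 's' has already been put into non-blocking mode (ioctlsocket FIONBIO).
#include <winsock2.h>

bool SendAll(SOCKET s, const char* data, int len) {
    int sent = 0;
    while (sent < len) {
        int n = send(s, data + sent, len - sent, 0);
        if (n > 0) {
            sent += n;
            continue;
        }
        if (n == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK) {
            fd_set writable;
            FD_ZERO(&writable);
            FD_SET(s, &writable);
            // Blocks until the stack can accept more data: the "write OK" event.
            if (select(0, nullptr, &writable, nullptr, nullptr) == SOCKET_ERROR) {
                return false;
            }
            continue;
        }
        return false;          // any other error: give up
    }
    return true;
}
```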

Q2. If a kernel-mode switch to acquire a mutex around a piece of data hurts you that much, I recommend rethinking what you are doing. Acquiring it takes at most about 3 microseconds, while a thread-scheduler time slice on Windows can be as long as 60 milliseconds.

It can hurt in extreme cases. If you think you are programming an extreme communications case, ask again, and I promise to tell you all about it.

+1

To address your question about how the sender knows when to slow down: you seem to be missing an understanding of TCP's congestion-control mechanisms. "Slow start" is related to what you are describing, but it is not quite the way you put it. Slow start means exactly that: the connection starts slow and gets faster, up to as fast as the other end is willing to go, the wire speed, whatever.

As for the rest of your question, Paul's answer should be sufficient.

+1

Source: https://habr.com/ru/post/1312703/

