Should I manually embed data size information in a TCP transmission?

Imagine we are sending a rather long string (say, 1,024,000 bytes) over TCP.

If you send me a string of 1,024,000 bytes, you are actually using a NetworkStream to write those bytes.

When I receive it, do I need to know in advance the size of the message you sent?

If not, how do I know when to stop calling stream.Read?

If so, should the program insert the size of the data at the head of the data, so that I first read 4 bytes to find out how much more I should read?

Is there anything that automatically wraps the data with its size?

+7
c# tcp data-transfer
9 answers

Neither .NET nor the TCP protocol has anything built in to define the size of a message in advance. TCP only specifies that all data will be delivered to the receiving endpoint (or at least that a best effort will be made to do so).

You are solely responsible for defining how the recipient knows how much data to read. The details, as others have pointed out, depend on the nature of what you are transmitting: you could send the length first, as you mentioned; you could encode special sequences called terminators; you could use fixed-size blocks so that all messages are the same size; and so on.
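As an illustration of the "send the length first" option, here is a minimal sketch. The class and method names (LengthPrefixFraming, WriteMessage, ReadMessage) and the 4-byte big-endian header are assumptions made for this example, not anything defined by .NET or TCP. It works on any Stream, so a NetworkStream can be passed in.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper: frames each message as [4-byte big-endian length][payload].
public static class LengthPrefixFraming
{
    public static void WriteMessage(Stream stream, byte[] payload)
    {
        byte[] header = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    public static byte[] ReadMessage(Stream stream)
    {
        int length = BinaryPrimitives.ReadInt32BigEndian(ReadExactly(stream, 4));
        if (length < 0) throw new InvalidDataException("Negative length in header.");
        return ReadExactly(stream, length);
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the requested count has arrived or the stream ends.
    private static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Stream ended mid-message.");
            offset += read;
        }
        return buffer;
    }
}
```

The receiver calls ReadMessage once per message; the loop in ReadExactly is what answers "when do I stop calling stream.Read".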

EDIT

This started as a comment, but it is longer than the comment limit allows.

Adding a NULL to a stream simply means appending a character whose binary value is 0 (not to be confused with the character '0'). Depending on the encoding you use for the transfer (ASCII, UTF-8, UTF-16, etc.), that may translate into sending one or more 0 bytes, but if you use the appropriate encoding, all you need to do is put something like \0 in your string. Here is an example:

    // requires: using System.Text;
    string textToSend = "This is a NULL-terminated text\0";
    byte[] bufferToSend = Encoding.UTF8.GetBytes(textToSend);

Of course, all of the above assumes that the rest of the data you send contains no other NULLs. In other words, it must be text, not arbitrary binary data (for example, the contents of a file). This is very important! Otherwise you cannot use NULL as a message terminator, and you will have to come up with a different scheme.
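For the receiving side, a minimal sketch might look like the following. The names NulTerminatedReader and TryReadString are hypothetical, and the sketch assumes the sender used UTF-8, where a 0 byte never appears inside a multi-byte character, so scanning byte by byte is safe.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reads UTF-8 text up to the next 0 byte.
public static class NulTerminatedReader
{
    // Returns false if the stream ended cleanly before any byte of a new message.
    public static bool TryReadString(Stream stream, out string text)
    {
        using var collected = new MemoryStream();
        while (true)
        {
            int b = stream.ReadByte();          // -1 means end of stream
            if (b == -1)
            {
                text = "";
                if (collected.Length == 0) return false;
                throw new EndOfStreamException("Stream ended before the terminator.");
            }
            if (b == 0) break;                  // terminator reached
            collected.WriteByte((byte)b);
        }
        text = Encoding.UTF8.GetString(collected.ToArray());
        return true;
    }
}
```

Reading one byte at a time from a NetworkStream is slow; wrapping it in a BufferedStream is one way to reduce that cost.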

+4

Generally speaking, a header with the data size is better than a terminating character. The terminator approach is vulnerable to denial-of-service attacks: I can just keep sending data to your service, and as long as I never include the terminator, you have to keep processing (and possibly allocating memory) until you fall over.

With a header that contains the total size, if the transfer is too large for you, you can ignore it or send back an error. If an attacker tries to send more data than the header declared, you will see a corrupt header at the start of the next message and can ignore it.
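A minimal sketch of that size check, assuming a 4-byte big-endian length header. The class name BoundedHeader and the limit MaxMessageBytes are made up for this example; pick a limit that suits your service.

```csharp
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper: reads the length header and rejects oversized messages
// before any payload memory is allocated.
public static class BoundedHeader
{
    public const int MaxMessageBytes = 1_024_000; // assumed limit for this sketch

    public static int ReadDeclaredLength(Stream stream)
    {
        byte[] header = new byte[4];
        int offset = 0;
        while (offset < header.Length)
        {
            int read = stream.Read(header, offset, header.Length - offset);
            if (read == 0) throw new EndOfStreamException("Stream ended inside the header.");
            offset += read;
        }

        int length = BinaryPrimitives.ReadInt32BigEndian(header);
        if (length < 0 || length > MaxMessageBytes)
            throw new InvalidDataException($"Declared length {length} is not acceptable.");
        return length;
    }
}
```

On an InvalidDataException the caller can send an error back or simply close the connection.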

+2

When I receive it, do I need to know in advance the size of the message you sent?

This may be useful (for things like displaying progress bars), but it is not necessary.

If not, how do I know when to stop calling stream.Read?

The content of your stream determines this. For example, many message formats encode something that tells you the message is complete (for example, a zero byte to mark the end of a string, or </html> to mark the end of an HTML document).

+1

There are two ways to do this: one is what you described, putting the size of the message in a header; the other is putting some kind of end marker in the stream. For example, if your message has no embedded NUL, you can terminate it with a NUL.

+1

If you know or can easily find out the total length of the message, I would suggest sending it up front. If that is impossible or very expensive to determine, you can use something similar to chunked transfer encoding in HTTP.
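A rough sketch of the chunked idea: each chunk carries its own size, and a zero-length chunk marks the end, so the sender never needs the total length. This borrows only the idea from HTTP; the binary 4-byte size field and the names ChunkedFraming, WriteChunk, WriteEnd and ReadMessage are assumptions for this example, not HTTP's actual wire format.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper: a message is a series of [4-byte size][data] chunks,
// finished by a chunk whose size is 0.
public static class ChunkedFraming
{
    public static void WriteChunk(Stream stream, byte[] data)
    {
        byte[] size = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(size, data.Length);
        stream.Write(size, 0, size.Length);
        stream.Write(data, 0, data.Length);
    }

    // The zero-length chunk tells the receiver the message is complete.
    public static void WriteEnd(Stream stream) => WriteChunk(stream, Array.Empty<byte>());

    public static byte[] ReadMessage(Stream stream)
    {
        using var message = new MemoryStream();
        while (true)
        {
            int size = BinaryPrimitives.ReadInt32BigEndian(ReadExactly(stream, 4));
            if (size < 0) throw new InvalidDataException("Negative chunk size.");
            if (size == 0) return message.ToArray();
            byte[] chunk = ReadExactly(stream, size);
            message.Write(chunk, 0, chunk.Length);
        }
    }

    private static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Stream ended mid-chunk.");
            offset += read;
        }
        return buffer;
    }
}
```

The sender can call WriteChunk as soon as each piece of data is ready and WriteEnd when it is done.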

+1

The main point is that with TCP there is no correspondence between the number and size of socket writes on the sending side and the number and size of socket reads on the receiving side.

If the data stream has some kind of structure, you will have to add some metadata/wrapper data around the payload.

Whenever I have had to solve this problem, I have used some combination of the following:

a) Use a magic number to mark the start or the end of your messages (or both).

b) Use a checksum at the end of the message to verify that the contents are correct. (I know TCP does error checking and retransmission, but the checksum helps when the receiver mistakes a random occurrence of the magic number/sequence in the stream for a real start/end marker.)

c) Use a length field after the initial magic number (if the sending side knows the data length before transmission starts).
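Options a) to c) can be combined into one frame layout, sketched below. Everything specific here is an assumption for illustration: the name CheckedFrame, the magic value 0xCAFEBABE, the little-endian fields written by BinaryWriter, and the toy multiply-and-add checksum, which stands in for a real CRC.

```csharp
using System.IO;
using System.Text;

// Hypothetical frame layout: [magic][length][payload][checksum].
public static class CheckedFrame
{
    private const uint Magic = 0xCAFEBABE; // assumed marker value

    // Toy checksum for the sketch; a real protocol would likely use a CRC.
    private static uint Checksum(byte[] data)
    {
        uint sum = 0;
        foreach (byte b in data) sum = unchecked(sum * 31 + b);
        return sum;
    }

    public static void Write(Stream stream, byte[] payload)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(Magic);
        writer.Write(payload.Length);
        writer.Write(payload);
        writer.Write(Checksum(payload));
    }

    public static byte[] Read(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        if (reader.ReadUInt32() != Magic)
            throw new InvalidDataException("Bad magic number.");
        int length = reader.ReadInt32();
        if (length < 0)
            throw new InvalidDataException("Negative length.");
        byte[] payload = reader.ReadBytes(length); // shorter only at end of stream
        if (payload.Length != length)
            throw new EndOfStreamException("Stream ended mid-payload.");
        if (reader.ReadUInt32() != Checksum(payload))
            throw new InvalidDataException("Checksum mismatch.");
        return payload;
    }
}
```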

However, before you go down that road, check which higher-level protocol libraries are available for your language/platform. (NetworkStream? Is that Windows API/MFC or something like that?)

For example, I recently had to set up a client/server system. The client and server functionality was already written in Python, so simply using Python's xmlrpclib/server made it easy to join the two programs together: I literally copied the example and was done in 30 minutes. If I had hand-coded some made-up protocol directly on top of TCP, it would have taken 5 days!

+1

Since TCP is a reliable protocol, you can either structure your protocol to state the number of bytes to come, or use some kind of terminator to mark the end of the transfer. If you were using UDP, which is not guaranteed to be reliable, it would be much more important to build a protocol that tolerates dropped data or states how many bytes to expect (and has a retransmission mechanism), since the packet containing the terminator may be lost. Maximum transfer times and timeouts can also be useful, but only if you can determine a reasonable maximum.

0

My answer would be no, especially for large data sets. The reason is that sending the size first adds latency to your system.

If you want to send the size first, you need to compute the whole response before you can send any of it.

If you use an end-of-message token instead, you can start sending the first pieces of data as soon as they are ready, while you are still computing the rest.

0

You can also look into the BinaryReader/BinaryWriter classes, which can be wrapped around any stream, TCP or otherwise.

Among other things, they read and write strings (in the encoding of your choice) and take care of including the length of the string.
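A short sketch of that approach. BinaryWriter.Write(string) emits a compact length prefix (the byte count, 7 bits per byte) followed by the encoded text, and BinaryReader.ReadString consumes both. The wrapper class BinaryStringDemo and its Send/Receive methods are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Text;

// Hypothetical wrapper around BinaryWriter/BinaryReader string framing.
public static class BinaryStringDemo
{
    public static void Send(Stream stream, string text)
    {
        // leaveOpen: true keeps the underlying stream usable after disposal.
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(text); // length prefix + UTF-8 bytes
    }

    public static string Receive(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        return reader.ReadString(); // reads the prefix, then exactly that many bytes
    }
}
```

Both ends must use the same encoding, and this prefix format is specific to these classes, so it mainly suits the case where both sides are .NET programs.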

0
