Problem with BinaryReader.ReadChars()

I have come across what I think is a problem with the BinaryReader.ReadChars() method. When I wrap a BinaryReader around a raw socket NetworkStream, I occasionally get stream corruption where the stream being read gets out of sync. The stream in question contains messages in a binary serialization protocol.

I have tracked this down to the following:

  • It only happens when reading a Unicode string (encoded using Encoding.BigEndianUnicode)
  • It only happens when the string in question is split across two TCP packets (confirmed with Wireshark)

I think the following happens (in the context of the example below):

  • BinaryReader.ReadChars() is called asking for 3 characters (string lengths are encoded before the string itself)
  • First loop: internally requests 6 bytes (3 remaining characters * 2 bytes/char) from the network stream
  • The network stream has only 3 bytes available
  • 3 bytes are read into the local buffer
  • The buffer is passed to the decoder
  • The decoder decodes 1 char and keeps the other byte in its own internal buffer
  • Second loop: internally requests 4 bytes (2 remaining characters * 2 bytes/char)
  • The network stream has all 4 bytes available
  • 4 bytes are read into the local buffer
  • The buffer is passed to the decoder
  • The decoder decodes 2 chars and keeps the remaining 4th byte internally (a byte that belongs to the next item)
  • String decoding is complete
  • The serialization code tries to unmarshal the next item and fails because of the stream corruption

    char[] buffer = new char[3];
    int charIndex = 0;

    Decoder decoder = Encoding.BigEndianUnicode.GetDecoder();

    // pretend 3 of the 6 bytes arrive in one packet
    byte[] b1 = new byte[] { 0, 83, 0 };
    int charsRead = decoder.GetChars(b1, 0, 3, buffer, charIndex);
    charIndex += charsRead;

    // pretend the remaining 3 bytes plus a final byte, for something unrelated,
    // arrive next
    byte[] b2 = new byte[] { 71, 0, 114, 3 };
    charsRead = decoder.GetChars(b2, 0, 4, buffer, charIndex);
    charIndex += charsRead;

I think the root cause is a bug in the .NET code, which uses charsRemaining * bytes/char on every loop iteration to calculate the remaining bytes required. Because of the extra byte hidden in the decoder, this calculation can be off by one, causing an extra byte to be consumed from the input stream.

This is the .NET framework code.

    while (charsRemaining>0) {
        // We really want to know what the minimum number of bytes per char
        // is for our encoding.  Otherwise for UnicodeEncoding we'd have to
        // do ~1+log(n) reads to read n characters.
        numBytes = charsRemaining;
        if (m_2BytesPerChar)
            numBytes <<= 1;

        numBytes = m_stream.Read(m_charBytes, 0, numBytes);
        if (numBytes==0) {
            return (count - charsRemaining);
        }
        charsRead = m_decoder.GetChars(m_charBytes, 0, numBytes, buffer, index);

        charsRemaining -= charsRead;
        index+=charsRead;
    }

I'm not quite sure whether this is a bug or just a misuse of the API. To work around the issue, I calculate the number of bytes required myself, read them, and then run the byte[] through the relevant Encoding.GetString(). However, this would not work for a variable-width encoding such as UTF-8.
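A minimal sketch of that workaround, assuming the string is big-endian UTF-16 and its length prefix counts 2-byte code units. The names `FixedWidthStrings` and `ReadUtf16BeString` are hypothetical and not part of the original code:

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedWidthStrings
{
    // Hypothetical helper: reads charCount UTF-16 (big-endian) code units by
    // byte count, so no decoder state can swallow a byte of the next field.
    public static string ReadUtf16BeString(BinaryReader reader, int charCount)
    {
        int byteCount = checked(charCount * 2); // 2 bytes per UTF-16 code unit

        // BinaryReader.ReadBytes keeps reading until byteCount bytes have
        // arrived or the stream ends, so a string split across packets is fine.
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("Stream ended inside a string.");

        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

With the bytes from the example above (0, 83, 0, 71, 0, 114, 3), this returns "SGr" and leaves the trailing 3 in the stream for the next field.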

I'd be interested to hear people's thoughts on this and whether or not I am doing something wrong. Perhaps it will also save the next person a few hours or days of tedious debugging.

EDIT: reported on Connect (Connect tracking item)

+7
c# binaryreader
4 answers

I have reproduced the problem you mentioned with BinaryReader.ReadChars.

Although a developer should always account for lookahead when composing things like streams and decoders, this seems like a fairly significant bug in BinaryReader, because the class is intended for reading data structures made up of various types of data. In this case, I agree that ReadChars should have been more conservative in what it read, so as not to lose that byte.

There is nothing wrong with your workaround of decoding the bytes yourself, because decoding bytes through a Decoder is what ReadChars does behind the scenes.

Unicode is a simple case. If you think about an arbitrary encoding, there really is no general way to ensure that the correct number of bytes is consumed when you pass a character count rather than a byte count (think of variable-length characters and cases involving malformed input). For this reason, avoiding BinaryReader.ReadChars in favor of reading a specific number of bytes provides a more robust, general solution.
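To illustrate why a character count does not determine a byte count in a variable-width encoding, here is a small sketch using UTF-8. The class and method names (`ByteCounts`, `Utf8Lengths`) are made up for this example:

```csharp
using System;
using System.Text;

public static class ByteCounts
{
    // Returns how many bytes each sample string occupies when encoded as UTF-8.
    public static int[] Utf8Lengths(params string[] samples)
    {
        var lengths = new int[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            lengths[i] = Encoding.UTF8.GetByteCount(samples[i]);
        return lengths;
    }
}
```

Calling `ByteCounts.Utf8Lengths("a", "\u00E9", "\u20AC")` yields 1, 2 and 3: three single-character strings, three different byte lengths, so a reader given only "1 character" cannot know in advance how many bytes to request.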

I would suggest bringing this to Microsoft's attention via http://connect.microsoft.com/visualstudio .

+3

Interesting; you could report this on "connect". As a stop-gap, you could also try wrapping the stream with a BufferedStream, but I expect this is only papering over a crack (it may still happen, just less often).

Another approach, of course, is to pre-buffer an entire message (but not the entire stream) and then read from something like a MemoryStream, assuming your network protocol has logical (ideally length-prefixed and not too large) messages. Then, by the time it is decoding, all the data is available.
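A rough sketch of the pre-buffering idea, under an assumed framing that is not taken from the question: each message is preceded by a 4-byte little-endian length, as written by BinaryWriter.Write(int). `MessageBuffering`, `ReadMessage` and `maxLength` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

public static class MessageBuffering
{
    // Reads one length-prefixed message from the network reader and returns
    // a reader over an in-memory copy of just that message.
    public static BinaryReader ReadMessage(BinaryReader network, int maxLength, Encoding encoding)
    {
        int length = network.ReadInt32(); // assumed 4-byte length prefix
        if (length < 0 || length > maxLength)
            throw new InvalidDataException("Unexpected message length: " + length);

        // ReadBytes loops until the whole payload has arrived (or the stream ends).
        byte[] payload = network.ReadBytes(length);
        if (payload.Length != length)
            throw new EndOfStreamException("Stream ended inside a message.");

        // Every byte is already in memory, so decoding never sees a short read.
        return new BinaryReader(new MemoryStream(payload, false), encoding);
    }
}
```

Any over-read by ReadChars is then confined to the buffered message and cannot eat bytes belonging to the next message on the socket. The `maxLength` check is there so a corrupt prefix cannot trigger a huge allocation.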

+1

This reminds me of one of my own questions ( Reading from HttpResponseStream fails ), where I had a problem reading from an HTTP response stream: the StreamReader would think it had hit the end of the stream prematurely, so my parsers would fail unexpectedly.

As Marc suggested for your problem, I first tried pre-buffering into a MemoryStream, which works well but means you may have to wait a long time if you have a large file to read (especially from the network/web) before you can do anything useful with it. In the end, I decided to create my own TextReader extension, which overrides the Read methods and defines them in terms of the ReadBlock method (which does a blocking read, that is, it waits until it can get exactly the number of characters you ask for).

Your problem is probably due to the fact that the Read methods are not guaranteed to return the number of characters you ask for. For example, if you look at the documentation for BinaryReader.Read ( http://msdn.microsoft.com/en-us/library/ms143295.aspx ), you will see that it states:

Return value
Type: System.Int32
The number of characters read into the buffer. This may be less than the number of bytes requested if the number of bytes is not available or it may be zero if the end of the stream is reached.

Since BinaryReader has no ReadBlock method like TextReader does, all you can do is either take your own approach of keeping track of the position yourself, or use Marc's pre-buffering.
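For raw streams, a blocking read in the spirit of ReadBlock can be sketched as below. This is an illustration rather than an existing BinaryReader API, and `BlockingReads.ReadExactBytes` is a hypothetical name:

```csharp
using System;
using System.IO;

public static class BlockingReads
{
    // Keeps calling Stream.Read until exactly count bytes have been received,
    // and throws if the stream ends first.
    public static byte[] ReadExactBytes(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            // Read may legitimately return fewer bytes than requested.
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException(
                    "Expected " + count + " bytes but only got " + offset + ".");
            offset += read;
        }
        return buffer;
    }
}
```

The returned array can then be handed to Encoding.GetString or wrapped in a MemoryStream. Newer .NET versions also provide Stream.ReadExactly, which serves the same purpose.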

+1

I am working with Unity3D/Mono at the moment, and the ReadChars method may contain even more bugs. I built a string like this:

 mat.name = new string(binaryReader.ReadChars(64)); 

mat.name did contain the correct string, but I could only prepend strings to it. Everything appended after the string just disappeared, even with String.Format. My solution so far is to avoid the ReadChars method, read the data as a byte array, and convert it to a string:

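 byte[] str = binaryReader.ReadBytes(64);
 // Length up to the first NUL terminator, e.g. 4 for "clip\0"
 int lengthOfStr = Array.IndexOf(str, (byte)0);
 if (lengthOfStr < 0) lengthOfStr = str.Length; // no terminator found: use all bytes read
 // Encoding.ASCII is used explicitly; ASCIIEncoding.Default resolves to Encoding.Default
 mat.name = System.Text.Encoding.ASCII.GetString(str, 0, lengthOfStr);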
0