In our API, we use byte[] to send data over the network. Everything worked fine until the day our "foreign" clients started sending and receiving Unicode characters.
As far as I know, a Unicode character occupies 2 bytes, but we only allocate 1 byte per character in the byte array.
This is how we read a character from the byte[] buffer:

// buffer is a byte[6553] and m_index is the current location in the buffer
char c = System.BitConverter.ToChar(buffer, m_index);
m_index += SIZEOF_BYTE;
return c;
So, the current problem is that the API receives a weird character: when I look at its Unicode hex code, the least significant byte is correct, but the most significant byte has a value when it should be 0. A quick workaround so far has been to mask off the most significant byte with c & 0x00FF.
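For reference, here is a minimal sketch of that workaround, assuming a little-endian host; the helper method and its signature are made up for illustration, while buffer, m_index and SIZEOF_BYTE follow the snippet above:

// Reads one character that the client encoded as a single byte.
// BitConverter.ToChar pulls in two bytes, so the high byte is masked off.
char ReadCharWorkaround(byte[] buffer, ref int m_index)
{
    const int SIZEOF_BYTE = 1;
    char c = System.BitConverter.ToChar(buffer, m_index);
    m_index += SIZEOF_BYTE;
    return (char)(c & 0x00FF); // keep only the least significant byte
}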
What is the correct approach to handling Unicode characters coming from a socket?
Thanks.
Solution:
Kudos to John:
char c = (char) buffer[m_index];
And, as he mentioned, the reason it works is that the API client sends each character as a single byte, while BitConverter.ToChar reads two bytes, so the conversion was the problem. I still wonder why this worked for a certain set of characters and not for others, since it should have failed in every case.
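A plausible explanation (my assumption, not something stated in the answers): BitConverter.ToChar combines the byte at m_index with the byte that follows it, so the result happened to be correct whenever that following byte was zero. A small demonstration with made-up buffer contents:

using System;

class BitConverterDemo
{
    static void Main()
    {
        // Hypothetical buffer: 'A' (0x41) followed by a non-zero byte,
        // then 'B' (0x42) followed by a zero byte.
        byte[] buffer = { 0x41, 0x7F, 0x42, 0x00 };

        // BitConverter.ToChar reads TWO bytes (low byte first on little-endian
        // platforms, see BitConverter.IsLittleEndian):
        // 0x41, 0x7F -> 0x7F41, a "weird" character, because the neighbouring
        // byte leaks into the most significant byte.
        char wrong = BitConverter.ToChar(buffer, 0);

        // When the following byte happens to be zero, the two-byte read is
        // accidentally correct: 0x42, 0x00 -> 0x0042 -> 'B'.
        char accidentallyRight = BitConverter.ToChar(buffer, 2);

        // The single-byte cast from John's answer always uses just one byte.
        char right = (char)buffer[0]; // 0x0041 -> 'A'

        Console.WriteLine($"{(int)wrong:X4} {(int)accidentallyRight:X4} {(int)right:X4}");
        // Prints: 7F41 0042 0041
    }
}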
Thanks guys, great answers!
Sasha