After a very hectic and confusing morning, we found the answer to this problem.
The key point we were missing, and the source of all the confusion, was that .NET strings are always stored as UTF-16 Unicode, i.e. as sequences of 16-bit (2-byte) code units. This means that when we call GetString() on the bytes, they are automatically decoded back into Unicode behind the scenes, and we are no better off than we were in the first place.
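A minimal sketch of this behaviour, assuming C# 9+ top-level statements and an illustrative sample string "Grüße" (not from the original problem): the same string produces different byte counts under UTF-8 and UTF-16, but GetString() always hands back an ordinary UTF-16 string, so the UTF-8 form does not survive the round trip.

```csharp
using System;
using System.Text;

// Illustrative sample text "Grüße", written with escapes so the source file encoding does not matter.
string text = "Gr\u00FC\u00DFe";

// The string itself is 5 UTF-16 code units, regardless of where the text came from.
// Encoding it yields different bytes depending on the target encoding.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);     // 'ü' and 'ß' take 2 bytes each in UTF-8
byte[] utf16Bytes = Encoding.Unicode.GetBytes(text); // every char here takes 2 bytes in UTF-16

// Prints: chars: 5, UTF-8 bytes: 7, UTF-16 bytes: 10
Console.WriteLine($"chars: {text.Length}, UTF-8 bytes: {utf8Bytes.Length}, UTF-16 bytes: {utf16Bytes.Length}");

// GetString decodes the UTF-8 bytes into a new UTF-16 string:
// the UTF-8 encoding is gone again and we are back where we started.
string decoded = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(decoded == text); // True
```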
When we started getting errors and seeing double-byte data on the other end, we knew that something was wrong, but at first glance we saw nothing wrong with our code. Once we understood the point above, we realized that we had to send a byte array if we wanted to preserve the encoding. Fortunately, MicrosoftFunc() had an overload that accepted a byte array instead of a string. This meant that we could convert the Unicode string into the encoding of our choice and then send it exactly as intended. The code became:
    // Convert from a Unicode string to an array of bytes (encoded as UTF-8).
    byte[] source = Encoding.UTF8.GetBytes(unicode);
    // Send the encoded byte array directly! Do not send it as a Unicode string.
    MicrosoftFunc(source);
Summary:
From the above, we can conclude that:
- GetBytes() in effect performs an Encoding.Convert() from Unicode (since strings are always UTF-16) to the encoding of the Encoding instance it is called on, and returns the array of encoded bytes.
- GetString() in effect performs an Encoding.Convert() from the encoding of the Encoding instance it is called on to Unicode (since strings are always UTF-16), and returns the result as a string object.
- Convert() actually converts a byte array in one encoding into a byte array in another encoding. Strings cannot be used here, since strings are always UTF-16.
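The relationship described in the summary can be checked with a small sketch, again assuming C# 9+ top-level statements and the same illustrative sample string: encoding a string with Encoding.UTF8.GetBytes() gives the same bytes as taking its UTF-16 bytes and running them through the static Encoding.Convert(), and converting back restores the original UTF-16 bytes.

```csharp
using System;
using System.Linq;
using System.Text;

// Illustrative sample text "Grüße".
string text = "Gr\u00FC\u00DFe";

// GetBytes: the UTF-16 string is encoded straight into UTF-8 bytes.
byte[] viaGetBytes = Encoding.UTF8.GetBytes(text);

// Convert: UTF-16 bytes -> UTF-8 bytes, byte array to byte array, no string involved.
byte[] utf16 = Encoding.Unicode.GetBytes(text);
byte[] viaConvert = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);

// Converting in the opposite direction restores the original UTF-16 bytes.
byte[] backToUtf16 = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, viaConvert);

Console.WriteLine(viaGetBytes.SequenceEqual(viaConvert)); // True
Console.WriteLine(backToUtf16.SequenceEqual(utf16));      // True
```

Neither Convert() call produces a byte order mark, which is why the results compare equal to the plain GetBytes() output.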
Ryall