C# MemoryStream Encoding vs. Encoding.GetChars()

Question

I am trying to read a byte stream from a database, decode it into a string, and finally display it on a web page. However, I notice that two approaches decode the same content in different ways (note: I use the "Western European" encoding, which covers the Latin character set and does not support Chinese characters):

    var encoding = Encoding.GetEncoding(1252 /*Western European*/);
    using (var fileStream = new StreamReader(new MemoryStream(content), encoding))
    {
        var str = fileStream.ReadToEnd();
    }

Vs.

    var encoding = Encoding.GetEncoding(1252 /*Western European*/);
    var str = new string(encoding.GetChars(content));

If the content contains Chinese characters, the first block of code produces a string like "D $ 教学而设计的", which is wrong, since the encoding should not support these characters, whereas the second block produces "D $ æ • ™ å | è € Œè®¾è®¡çš "", which is correct because those characters are all in the Western European character set.

What is the explanation for this difference in behavior?

c# character-encoding streamreader

Sidawy Nov 02 '12 at 13:54

1 answer

SLaks · Accepted Answer · 2012-11-02T13:59:55+0000

The StreamReader constructor looks for a byte order mark (BOM) in the stream and sets its encoding from it, even if you pass a different encoding.

It sees the UTF-8 BOM in your data and correctly uses UTF-8.

To prevent this behavior, pass false as the third parameter:

    var fileStream = new StreamReader(new MemoryStream(content), encoding, false);
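The difference can be reproduced with a small test buffer. The following is a minimal sketch, not the asker's actual data: the sample bytes (a UTF-8 BOM followed by the UTF-8 encoding of "教学"), the class name `BomDemo`, and the helpers `StartsWithUtf8Bom` and `ReadWithStreamReader` are assumptions made for illustration. The `Encoding.RegisterProvider` call is assumed to be needed only on .NET Core / .NET 5+, where code page 1252 is not registered by default.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: true if the buffer starts with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    // Hypothetical helper: decodes the buffer through a StreamReader.
    // detectBom is passed as detectEncodingFromByteOrderMarks.
    public static string ReadWithStreamReader(byte[] data, Encoding encoding, bool detectBom)
    {
        using (var reader = new StreamReader(new MemoryStream(data), encoding, detectBom))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: makes code page 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var cp1252 = Encoding.GetEncoding(1252 /*Western European*/);

        // Sample data (an assumption): UTF-8 BOM followed by UTF-8 bytes of "教学".
        byte[] content = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("教学"))
            .ToArray();

        Console.WriteLine("Has UTF-8 BOM: " + StartsWithUtf8Bom(content));

        // BOM detection on: the reader switches to UTF-8 and ignores cp1252.
        Console.WriteLine("detect = true:  " + ReadWithStreamReader(content, cp1252, true));

        // BOM detection off: every byte, including the BOM, is decoded as cp1252.
        Console.WriteLine("detect = false: " + ReadWithStreamReader(content, cp1252, false));

        // Encoding.GetChars never inspects a BOM; it always uses the given encoding.
        Console.WriteLine("GetChars:       " + new string(cp1252.GetChars(content)));
    }
}
```

With detection turned off, the StreamReader result should match the `GetChars` result, including the three BOM bytes decoded as Western European characters at the start of the string. If those leading characters are unwanted, one option is to check for the BOM first and skip it before decoding.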
