How to read a Chinese text file with C#?

How can I read a Chinese text file using C#? My current code does not display the correct characters:

    try
    {
        using (StreamReader sr = new StreamReader(path, System.Text.Encoding.UTF8))
        {
            // This is an arbitrary size for this example.
            string c = null;
            while (sr.Peek() >= 0)
            {
                c = null;
                c = sr.ReadLine();
                Console.WriteLine(c);
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("The process failed: {0}", e.ToString());
    }
+4
4 answers

You need to use the correct encoding for the file. Do you know what that encoding is? It could be UTF-16 (that is, Encoding.Unicode), or maybe something like Big5. You should really try to find out for sure rather than guessing.
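
One way to narrow the guess down, as a rough sketch: look for a byte-order mark first, then check whether the bytes are valid UTF-8. The names `EncodingProbe`, `DetectBom` and `IsValidUtf8` are invented for this illustration. A file that fails both checks is probably in a legacy encoding such as GB2312 or Big5, and this sketch cannot tell those apart.

```csharp
using System;
using System.Text;

static class EncodingProbe
{
    // Returns the encoding indicated by a byte-order mark, or null if none is found.
    // Note: a UTF-32 LE BOM also starts with FF FE; this sketch ignores UTF-32.
    public static Encoding DetectBom(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return Encoding.UTF8;
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little-endian
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian
        return null;
    }

    // True if the bytes decode as UTF-8 without any invalid sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes makes the decoder throw instead of substituting U+FFFD.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Typical use would be `EncodingProbe.DetectBom(File.ReadAllBytes(path))`, falling back to `IsValidUtf8` when that returns null.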

As mentioned in leppie's answer, the problem may also be the capabilities of the console. To find out for sure, print the characters of the string as Unicode numbers. See the article on debugging Unicode problems for more information and a useful method for dumping the contents of a string.
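
A minimal sketch of such a dump, not the method from that article: it prints each UTF-16 code unit of the string in hexadecimal, so you can check what was decoded without relying on the console font. `StringDump` and `ToCodeUnits` are hypothetical names.

```csharp
using System;
using System.Linq;

static class StringDump
{
    // Formats every UTF-16 code unit of the string as four hex digits,
    // separated by spaces. Characters outside the BMP show up as two
    // surrogate code units.
    public static string ToCodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => ((int)c).ToString("X4")));
    }
}
```

If the file was decoded correctly, a line containing 中文 dumps as `4E2D 6587`. Values of `FFFD` (the Unicode replacement character) mean the decoder could not interpret the bytes, which points at the wrong encoding rather than at the console.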

I would also avoid the way your current code reads the file line by line. Instead, use something like:

    using (StreamReader sr = new StreamReader(path, appropriateEncoding))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // ...
        }
    }

Calling Peek() requires that the stream is capable of seeking, which may be true for files but not for all streams. Also look at File.ReadAllText and File.ReadAllLines if that is what you want to do; they are very convenient utility methods.
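
To illustrate the utility methods, here is a small round-trip sketch that writes lines to a temporary file and reads them back with the same encoding. `ReadAllDemo` and `WriteThenReadLines` are names made up for this example; in real code you would only need the `File.ReadAllLines(path, encoding)` call.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadAllDemo
{
    // Writes the lines to a temporary file in the given encoding,
    // then reads the whole file back with File.ReadAllLines.
    public static string[] WriteThenReadLines(string[] lines, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, lines, encoding);
            return File.ReadAllLines(path, encoding);
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }
}
```

Both methods load the entire file into memory, so for very large files the ReadLine loop (or File.ReadLines) is probably the better fit.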

+9

If it's Simplified Chinese, it's usually GB2312, and for Traditional Chinese it's usually Big5:

    // gb2312 (codepage 936):
    System.Text.Encoding.GetEncoding(936)
    // Big5 (codepage 950):
    System.Text.Encoding.GetEncoding(950)
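
A caveat, with a sketch: on .NET Core and .NET 5+ these code pages are not available until the code pages provider is registered, so GetEncoding(936) fails out of the box. The example assumes a target where `CodePagesEncodingProvider` is available (part of .NET Core 3.0 and later; older targets may need the System.Text.Encoding.CodePages package). On the classic .NET Framework the registration should not be needed. `LegacyChineseEncodings` is a hypothetical helper name.

```csharp
using System;
using System.Text;

static class LegacyChineseEncodings
{
    // Returns the encoding for a Windows code page such as 936 (GB2312/GBK)
    // or 950 (Big5), registering the code pages provider first.
    public static Encoding Get(int codePage)
    {
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }
}
```

The result can be passed straight to the reader, for example `new StreamReader(path, LegacyChineseEncodings.Get(936))`.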
+4

Use Encoding.Unicode instead.

I think you need to change the OutputEncoding of the console to display it correctly.
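
A minimal sketch of that change, assuming UTF-8 is an acceptable output encoding for your console; `ConsoleSetup` and `UseUtf8Output` are names invented here. Whether the characters then render also depends on the console font having Chinese glyphs, which code cannot fix.

```csharp
using System;
using System.Text;

static class ConsoleSetup
{
    // Switches the console's output encoding to UTF-8 so that
    // Console.WriteLine can emit characters outside the default code page.
    public static void UseUtf8Output()
    {
        Console.OutputEncoding = Encoding.UTF8;
    }
}
```

Call `ConsoleSetup.UseUtf8Output()` once at startup, before the first Console.WriteLine.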

+1

I just ran into the same problem and solved it. I think the main problem is in the text editor. When you save text as .txt with Notepad, you can select the encoding in the Save As dialog. The default encoding is ANSI, which may not be able to represent Chinese text (it depends on your computer's code page), while Unicode works for Chinese text. Hope this helps you :)

Greetings

Ronald
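
On the reading side, a file saved from Notepad as "Unicode" is UTF-16 little-endian with a byte-order mark, and StreamReader can pick that up by itself. The sketch below uses the real StreamReader constructor overload that takes a detectEncodingFromByteOrderMarks flag; `NotepadFileReader` and `ReadWithBomDetection` are hypothetical names, and the fallback encoding is only used when the file has no BOM.

```csharp
using System;
using System.IO;
using System.Text;

static class NotepadFileReader
{
    // Reads the whole file, letting StreamReader switch to the encoding
    // announced by a byte-order mark. Without a BOM, 'fallback' is used.
    public static string ReadWithBomDetection(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback,
                                             detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

An ANSI file has no BOM, so for those you still have to pass the right legacy encoding as the fallback.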

0
