How to detect the encoding of a txt file

Possible duplicate:
How to determine the encoding / codepage of a text file

I am developing a WinForms application that needs to read txt files.

Unfortunately, the txt files come in many different encodings, so I cannot read them all with one specific encoding.

The question is how to detect the encoding of a txt file.

+4
2 answers

Following the answers by @Gens and @Samuel Neff, I solved the problem. Here is my code.

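    using System.IO;
    using System.Text;

    public static Encoding GetFileEncoding(string srcFile)
    {
        // Fall back to Encoding.Default (the ANSI code page) when no BOM is found.
        Encoding encoding = Encoding.Default;

        using (FileStream stream = File.OpenRead(srcFile))
        {
            // Detect the byte order mark, if any; otherwise keep the default.
            byte[] buff = new byte[4];
            int read = stream.Read(buff, 0, buff.Length);

            if (read >= 3 && buff[0] == 0xEF && buff[1] == 0xBB && buff[2] == 0xBF)
            {
                encoding = Encoding.UTF8;
            }
            else if (read >= 4 && buff[0] == 0xFF && buff[1] == 0xFE && buff[2] == 0 && buff[3] == 0)
            {
                // UTF-32 LE; must be tested before UTF-16 LE, which shares the FF FE prefix.
                encoding = Encoding.UTF32;
            }
            else if (read >= 4 && buff[0] == 0 && buff[1] == 0 && buff[2] == 0xFE && buff[3] == 0xFF)
            {
                // UTF-32 BE; Encoding.UTF32 is little-endian, so build the big-endian variant.
                encoding = new UTF32Encoding(true, true);
            }
            else if (read >= 2 && buff[0] == 0xFE && buff[1] == 0xFF)
            {
                encoding = Encoding.BigEndianUnicode; // UTF-16 BE
            }
            else if (read >= 2 && buff[0] == 0xFF && buff[1] == 0xFE)
            {
                encoding = Encoding.Unicode; // UTF-16 LE
            }
            else if (read >= 3 && buff[0] == 0x2B && buff[1] == 0x2F && buff[2] == 0x76)
            {
                encoding = Encoding.UTF7; // marked obsolete in newer .NET versions
            }
        }

        return encoding;
    }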
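For comparison, StreamReader can do the BOM check itself. This is only a minimal sketch, not part of the code above: the class name BomAwareReader and the method ReadAllText are made up for illustration, and it assumes a fallback encoding is acceptable when the file has no byte order mark.

```csharp
using System.IO;
using System.Text;

public static class BomAwareReader
{
    // Hypothetical helper: reads the whole file and reports the encoding that was used.
    public static string ReadAllText(string path, Encoding fallback, out Encoding detected)
    {
        // The third argument (detectEncodingFromByteOrderMarks) tells StreamReader
        // to look for a BOM and switch away from the fallback encoding if it finds one.
        using (var reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding only reflects the detected encoding after the first read.
            detected = reader.CurrentEncoding;
            return text;
        }
    }
}
```

Like the BOM check above, this only helps when the file actually starts with a byte order mark; a file without one is still read with the fallback encoding.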
+2

See this answer here:

How to determine the encoding / codepage of a text file

You cannot detect the code page; you need to be told it. You can analyze the bytes and guess, but that can give some strange (sometimes amusing) results. I can't find it now, but I'm sure Notepad can be tricked into displaying English text in Chinese.

and the article that it refers to:

http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

The single most important fact about encodings

If you completely forget everything that I just explained, remember one extremely important fact. It doesn't make sense to have a string without knowing which encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII. There is no such thing as plain text.

If you have a string in memory, in a file, or in an email message, you need to know what encoding it is in, or you cannot interpret or display it correctly.
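To make the "analyze the bytes and guess" idea concrete, here is a rough sketch. The class EncodingGuess and its two methods are hypothetical names, and the heuristic is deliberately simple: it only asks whether the bytes are valid UTF-8, which is a guess and not a proof.

```csharp
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: true when the bytes form a valid UTF-8 sequence.
    // Pure ASCII also passes, so a "true" result is only a hint.
    public static bool LooksLikeUtf8(byte[] bytes)
    {
        // Second argument (throwOnInvalidBytes) makes decoding fail on bad input
        // instead of silently substituting replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Decodes the same bytes under a named encoding, e.g. "utf-8" or "iso-8859-1".
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }
}
```

For example, the two bytes C3 A9 pass LooksLikeUtf8 and decode to "é" as UTF-8, but the very same bytes decode to "Ã©" as ISO-8859-1. Neither result raises an error, which is exactly why the encoding has to be known rather than assumed.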

+2
