How to read an ANSI-encoded file containing special characters

I am writing a TFS check-in policy that checks whether our source files contain our file header.

My problem is that our file header contains the special character "©", and unfortunately some of our source files are encoded in ANSI. As a result, when I read these files in the policy, the symbol is lost and the line looks like this: "Copyright 2009".

string content = File.ReadAllText(pendingChange.LocalItem); 

I tried changing the encoding of the string, but that does not help. So how can I read these files so that I get the correct line, "Copyright © 2009"?

Thanks for the help!

Regards Eny

+50
c# encoding ansi
Sep 16 '09 at 10:00
2 answers

Use Encoding.Default:

 string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default); 

However, be aware that this reads the file using the system default encoding, which may not be the same as the encoding of the file. There is no single encoding called ANSI, but when people talk about "the ANSI encoding" they usually mean Windows code page 1252 or whatever their box happens to use.

Your code will be more robust if you can find out the exact encoding used.
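
If the files turn out to be Windows-1252, the encoding can be named explicitly instead of relying on the machine default. A minimal sketch under that assumption; the class and method names are made up for illustration:

```csharp
using System.IO;
using System.Text;

public static class AnsiReader
{
    // Assumption: the "ANSI" files were saved as Windows-1252.
    // On .NET Framework this code page is available out of the box.
    // On .NET Core / .NET 5+ it must be enabled once beforehand with
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    public static string ReadWindows1252(string path)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);
        return File.ReadAllText(path, windows1252);
    }
}
```

In the policy this would be called as AnsiReader.ReadWindows1252(pendingChange.LocalItem). The result no longer depends on the regional settings of the machine that runs the check.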

+97
Sep 16 '09 at 10:16

If you have such a policy, it would seem sensible to also have a team-wide standard encoding. Frankly, I can't see why any team would use an encoding other than "Unicode (UTF-8 with signature) - Codepage 65001" (with the possible exception of ASPX pages with significant non-Latin static content, and even then I can't see that using UTF-8 would be a problem).

Assuming you still want to allow mixed encodings, you need a way to determine which encoding a file was saved in, so that you know which encoding to pass to ReadAllText. That is not easy to determine from the file itself, but using Encoding.Default will most likely work fine, since you probably have only two encodings to deal with: the VS one (UTF-8 with signature) and the common ANSI encoding that your machines use (probably Windows-1252).

Therefore, using

  string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default); 

will work. (As I see, John has already posted this.) It works because when the UTF-8 BOM (which is what VS means by the term "signature") is present at the beginning of the file, the encoding parameter passed in is ignored and UTF-8 is used anyway. So where a file was saved as UTF-8 you get the correct results, and where ANSI was used you will most likely also get the correct results.
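
To see the BOM behaviour for yourself, here is a small round-trip sketch. It writes the same header line once as UTF-8 with a BOM and once in a single-byte encoding, then reads both files back with that same single-byte encoding. The class and method names are hypothetical, and the encoding is a parameter so the sketch does not depend on what Encoding.Default happens to be on the machine:

```csharp
using System.IO;
using System.Text;

public static class BomDemo
{
    // Returns the two strings read back: [0] from the UTF-8 file with BOM,
    // [1] from the file written in the single-byte 'ansi' encoding.
    public static string[] RoundTrip(string headerLine, Encoding ansi)
    {
        string utf8Path = Path.GetTempFileName();
        string ansiPath = Path.GetTempFileName();
        try
        {
            // UTF8Encoding(true) emits the EF BB BF signature at the start of the file.
            File.WriteAllText(utf8Path, headerLine, new UTF8Encoding(true));
            // Single-byte encodings have no BOM.
            File.WriteAllText(ansiPath, headerLine, ansi);

            return new[]
            {
                File.ReadAllText(utf8Path, ansi), // BOM detected: decoded as UTF-8
                File.ReadAllText(ansiPath, ansi)  // no BOM: decoded with 'ansi'
            };
        }
        finally
        {
            File.Delete(utf8Path);
            File.Delete(ansiPath);
        }
    }
}
```

Calling BomDemo.RoundTrip("Copyright © 2009", Encoding.GetEncoding(1252)) on .NET Framework should hand back the original line twice. Note that this only covers UTF-8 files that actually carry the BOM; a UTF-8 file saved without a signature would still be decoded with the ANSI encoding.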

By the way, if you are processing file headers, wouldn't ReadAllLines make things easier?
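
For example, File.ReadAllLines has an overload that takes an Encoding, so the header check could look roughly like this. HasHeaderLine and its parameters are invented for illustration, and the matching rule (a plain substring match within the first few lines) is an assumption about what the policy needs:

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class HeaderCheck
{
    // Hypothetical helper: true if one of the first 'maxLines' lines of the file
    // contains 'expected', e.g. "Copyright © 2009".
    public static bool HasHeaderLine(string path, string expected, Encoding fallback, int maxLines)
    {
        return File.ReadAllLines(path, fallback)
                   .Take(maxLines)
                   .Any(line => line.Contains(expected));
    }
}
```

In the policy this might be called as HeaderCheck.HasHeaderLine(pendingChange.LocalItem, "Copyright © 2009", Encoding.Default, 10), where the limit of 10 lines is arbitrary.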

+5
Sep 16 '09 at 10:42