Non-English characters are not preserved when rewriting a text file

I have a problem on a client’s site: lines containing words like “Habit” come out corrupted. I am processing a text file (extracting selected lines and writing them to another file).

For diagnostics, I reduced the problem to a single file containing just this bad word.

The source file has no BOM (byte order mark), but .NET detects it as UTF-8.

After being read and written back, the word comes out as “Habitao”.
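This is the classic symptom of a file encoded in a single-byte code page being decoded as UTF-8: accented letters occupy one byte in Windows-1252, but those byte values start multi-byte sequences in UTF-8, so the decoder rejects them. A minimal sketch of the mechanism, using a hypothetical accented word “ação” (the actual word is cut off in the question):

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // In Windows-1252, ç is the single byte 0xE7 and ã is 0xE3.
        byte[] bytes1252 = Encoding.GetEncoding(1252).GetBytes("ação");

        // Decoded as UTF-8, 0xE3/0xE7 each announce a multi-byte sequence that
        // never completes, so the decoder substitutes or drops them instead of
        // producing the accented letters.
        string misread = Encoding.UTF8.GetString(bytes1252);
        Console.WriteLine(misread);
    }
}
```

Depending on how the result is displayed, the invalid bytes show up as replacement characters or simply vanish, which matches the “Habitao” the question describes.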

The hex dump of the BadWord.txt file is as follows:

[screenshot: hex dump of BadWord.txt]

Copying the file with this code

    using (var reader = new StreamReader(@"C:\BadWord.txt"))
    using (var writer = new StreamWriter(@"C:\BadWordReadAndWritten.txt"))
        writer.WriteLine(reader.ReadLine());

gives:

[screenshot: hex dump of BadWordReadAndWritten.txt]

Preserving the reader’s encoding makes no difference:

    using (var reader = new StreamReader(@"C:\BadWord.txt"))
    using (var writer = new StreamWriter(@"C:\BadWordReadAndWritten_PreseveEncoding.txt", false, reader.CurrentEncoding))
        writer.WriteLine(reader.ReadLine());

It gives the same result:

[screenshot: hex dump of BadWordReadAndWritten_PreseveEncoding.txt]

Any ideas what is going on here, and how can I process this file while preserving the source text?

2 answers

The only fix is to read the file in the encoding it was actually written in. In this case, that is Windows-1252:

    Encoding enc = Encoding.GetEncoding(1252);
    string correctText = File.ReadAllText(@"C:\BadWord.txt", enc);
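Building on that answer, a complete copy might look like the sketch below. The output path and the choice to write UTF-8 with a BOM are my assumptions, not part of the original answer; the BOM just makes sure later readers detect the output encoding unambiguously.

```csharp
using System.IO;
using System.Text;

class CopyWithEncoding
{
    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding(1252);

        // Read with the real source encoding, then write the text back out.
        string text = File.ReadAllText(@"C:\BadWord.txt", enc);

        // Hypothetical output path; UTF8Encoding(true) emits a BOM.
        File.WriteAllText(@"C:\BadWordFixed.txt", text, new UTF8Encoding(true));
    }
}
```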

Call reader.Peek() before opening the StreamWriter. Peek() forces the reader to examine the start of the file so that the encoding is detected correctly, without advancing the current position.
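A sketch of that suggestion (output path is hypothetical):

```csharp
using System.IO;

class PeekBeforeWrite
{
    static void Main()
    {
        using (var reader = new StreamReader(@"C:\BadWord.txt"))
        {
            // Peek() makes the reader inspect the start of the stream, so
            // CurrentEncoding reflects any BOM found there, without consuming input.
            reader.Peek();

            using (var writer = new StreamWriter(@"C:\Out.txt", false, reader.CurrentEncoding))
                writer.WriteLine(reader.ReadLine());
        }
    }
}
```

Note that StreamReader’s auto-detection only recognizes byte order marks; for a BOM-less Windows-1252 file like this one, CurrentEncoding will still report the UTF-8 default, so this alone does not solve the question’s problem.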

