Non-English characters are not preserved when rewriting a text file

I have a problem on a client’s site: lines containing words like “Habit” come out corrupted. I am processing a text file (extracting selected lines and writing them to another file).

For diagnostics, I reduced the problem to a single file containing just this bad word.

The source file has no BOM (byte order mark), but .NET detects it as UTF-8.

After being read and written back, the word comes out as “Habitao”.
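This is the classic symptom of a file encoded in a single-byte code page being decoded as UTF-8: accented letters occupy one byte in Windows-1252, but those byte values start multi-byte sequences in UTF-8, so the decoder rejects them. A minimal sketch of the mechanism, using a hypothetical accented word “ação” (the actual word is cut off in the question):

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // In Windows-1252, ç is the single byte 0xE7 and ã is 0xE3.
        byte[] bytes1252 = Encoding.GetEncoding(1252).GetBytes("ação");

        // Decoded as UTF-8, 0xE3/0xE7 each announce a multi-byte sequence that
        // never completes, so the decoder substitutes or drops them instead of
        // producing the accented letters.
        string misread = Encoding.UTF8.GetString(bytes1252);
        Console.WriteLine(misread);
    }
}
```

Depending on how the result is displayed, the invalid bytes show up as replacement characters or simply vanish, which matches the “Habitao” the question describes.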

The hex dump of the BadWord.txt file is as follows:

[screenshot: hex dump of BadWord.txt]

Copying the file with this code

    using (var reader = new StreamReader(@"C:\BadWord.txt"))
    using (var writer = new StreamWriter(@"C:\BadWordReadAndWritten.txt"))
        writer.WriteLine(reader.ReadLine());

gives:

[screenshot: hex dump of BadWordReadAndWritten.txt]

Preserving the reader’s encoding makes no difference:

    using (var reader = new StreamReader(@"C:\BadWord.txt"))
    using (var writer = new StreamWriter(@"C:\BadWordReadAndWritten_PreseveEncoding.txt", false, reader.CurrentEncoding))
        writer.WriteLine(reader.ReadLine());

It gives the same result:

[screenshot: hex dump of BadWordReadAndWritten_PreseveEncoding.txt]

Any ideas what is going on here, and how can I process this file while preserving the source text?

2 answers

The only fix is to read the file in the encoding it was actually written in. In this case, that is Windows-1252:

    Encoding enc = Encoding.GetEncoding(1252);
    string correctText = File.ReadAllText(@"C:\BadWord.txt", enc);
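Building on that answer, a complete copy might look like the sketch below. The output path and the choice to write UTF-8 with a BOM are my assumptions, not part of the original answer; the BOM just makes sure later readers detect the output encoding unambiguously.

```csharp
using System.IO;
using System.Text;

class CopyWithEncoding
{
    static void Main()
    {
        // On .NET Core / .NET 5+, code-page encodings must be registered first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding enc = Encoding.GetEncoding(1252);

        // Read with the real source encoding, then write the text back out.
        string text = File.ReadAllText(@"C:\BadWord.txt", enc);

        // Hypothetical output path; UTF8Encoding(true) emits a BOM.
        File.WriteAllText(@"C:\BadWordFixed.txt", text, new UTF8Encoding(true));
    }
}
```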

Call reader.Peek() before opening the StreamWriter. Peek() forces the reader to examine the start of the file so that the encoding is detected correctly, without advancing the current position.
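A sketch of that suggestion (output path is hypothetical):

```csharp
using System.IO;

class PeekBeforeWrite
{
    static void Main()
    {
        using (var reader = new StreamReader(@"C:\BadWord.txt"))
        {
            // Peek() makes the reader inspect the start of the stream, so
            // CurrentEncoding reflects any BOM found there, without consuming input.
            reader.Peek();

            using (var writer = new StreamWriter(@"C:\Out.txt", false, reader.CurrentEncoding))
                writer.WriteLine(reader.ReadLine());
        }
    }
}
```

Note that StreamReader’s auto-detection only recognizes byte order marks; for a BOM-less Windows-1252 file like this one, CurrentEncoding will still report the UTF-8 default, so this alone does not solve the question’s problem.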

