Force StandardOutputEncoding for UTF8

I am looking to parse UTF8 characters from the standard output stream of another application in my C# project. With the default approach, characters outside the ANSI range are corrupted when read from the process's standard output.

According to Microsoft's documentation, I need to set StandardOutputEncoding:

If the value of the StandardOutputEncoding property is Nothing, the process uses the default standard output encoding for the standard output. The StandardOutputEncoding property must be set before the process is started. Setting this property does not guarantee that the process will use the specified encoding. The application should be tested to determine which encodings the process supports.
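For reference, the documented setup looks roughly like the following. This is a minimal sketch: the class and method names are hypothetical, and `slave.exe` is simply the program from this question. The encoding is assigned on the `ProcessStartInfo` before `Start()` is called, and it only applies when stdout is redirected with `UseShellExecute = false`.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper illustrating the documented StandardOutputEncoding setup.
public static class Utf8Capture
{
    // Builds the start info as the documentation requires: the encoding is set
    // before Start(), stdout is redirected, and the shell is not used.
    public static ProcessStartInfo BuildStartInfo(string fileName, string arguments)
    {
        return new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,
            RedirectStandardOutput = true,
            // Decode the captured stdout bytes as UTF-8 (no BOM expected).
            StandardOutputEncoding = new UTF8Encoding(false),
        };
    }

    // Runs the process and returns everything it wrote to stdout.
    public static string Run(string fileName, string arguments)
    {
        using (Process process = Process.Start(BuildStartInfo(fileName, arguments)))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

As the quoted documentation warns, this setup does not make the child write UTF-8; it only tells the host how to decode whatever bytes arrive.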

However, try as I might to set StandardOutputEncoding to UTF8 / CP65001, the output, when dumped to a binary file, shows the same mangling of foreign-language characters. They always come through as "?" (0x3F) instead of what they should be.

I know the natural assumption at this stage would be that the application whose output I am processing simply does not emit UTF8, but that is definitely not the case: when I dump the application's output to a file from the command line after forcing the console code page to 65001, everything looks fine:

chcp 65001 && slave.exe > file.txt 

So I know that slave.exe is capable of writing UTF8 to standard output, but try as I might, I cannot get StandardOutputEncoding to do the same in my C# application.

Every time I end up working with encodings in .NET, I wish I were back in the C++ world, where everything required more work but was much more transparent. I am considering writing a C application that reads the output of slave.exe into a UTF8-encoded text file, ready for C# to parse, but for now I am holding off on that approach.

1 answer

StandardOutputEncoding has no effect at all on the standard output of the executed application. The only thing it does is set the encoding of the StreamReader that sits on top of the binary stdout stream captured from the launched application.
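The distinction can be illustrated without launching a process at all. In the sketch below (hypothetical names), `Encoding.ASCII` is an assumed stand-in for a narrow console code page: its default encoder fallback replaces unrepresentable characters with `?` (0x3F), just as the child does before the bytes ever reach the pipe. The host side is a `StreamReader`, which is all StandardOutputEncoding configures.

```csharp
using System;
using System.IO;
using System.Text;

public static class StdoutDecoding
{
    // Child side (simulated): text is encoded in the console code page.
    // Encoding.ASCII stands in for a narrow code page here; characters it
    // cannot represent are replaced with '?' (0x3F) by the default fallback.
    public static byte[] ChildWritesInNarrowCodePage(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    // Host side: the equivalent of the StreamReader that StandardOutputEncoding
    // configures. It only interprets bytes that already exist; it cannot
    // recover characters the child replaced before writing.
    public static string HostDecodes(byte[] captured, Encoding standardOutputEncoding)
    {
        using (var reader = new StreamReader(new MemoryStream(captured), standardOutputEncoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With these helpers, `HostDecodes(ChildWritesInNarrowCodePage("\u00e9"), Encoding.UTF8)` yields `"?"`, while `HostDecodes(Encoding.UTF8.GetBytes("\u00e9"), Encoding.UTF8)` yields `"\u00e9"`: the decoder is the same in both cases, and only the bytes the child produced differ.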

This is fine for applications that natively write UTF8 or Unicode to stdout, but most Microsoft utilities do not; instead they encode their output only in the console code page. The console code page is set manually with the Win32 functions SetConsoleOutputCP and SetConsoleCP, and it must be forced to UTF8 if that is what you want to read. This has to be done on the console in which the exe runs and, as far as I know, cannot be done from the .NET host environment.
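A minimal P/Invoke sketch of those two Win32 calls follows. The wrapper class name is hypothetical, and the code only affects the console attached to the calling process, so it has to run in a process that shares a console with the target exe (Windows only).

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper around the Win32 console code page functions.
public static class ConsoleCodePage
{
    public const uint Utf8 = 65001;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetConsoleOutputCP(uint wCodePageID);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetConsoleCP(uint wCodePageID);

    [DllImport("kernel32.dll")]
    private static extern uint GetConsoleOutputCP();

    // Switches the output and input code pages of the console attached to the
    // calling process to UTF-8. Returns the previous output code page
    // (0 if no console is attached) so the caller can log or restore it.
    public static uint ForceUtf8()
    {
        uint previous = GetConsoleOutputCP();
        if (!SetConsoleOutputCP(Utf8) || !SetConsoleCP(Utf8))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return previous;
    }
}
```

Whether a given program then actually writes UTF-8 still depends on it honoring the console code page, which is what the `chcp 65001` experiment in the question suggests slave.exe does.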

So I wrote a proxy application called UtfRedirect, whose source code I published on GitHub under the MIT license. It is meant to be spawned by the .NET host and then told which exe to execute. It sets the code page of the console used by the target slave exe, then runs it and passes its stdout back to the host.
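The published UtfRedirect source is not reproduced here; the following is only a hypothetical sketch of how such a proxy could work. It assumes the protocol implied by the calling example (line 1 on stdin is the target exe, line 2 its arguments, both UTF-8), and it relies on the child inheriting the proxy's console and its stdout handle, which is the pipe the host is reading.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical UtfRedirect-style proxy; not the published source.
public static class UtfProxySketch
{
    private const uint Utf8CodePage = 65001;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetConsoleOutputCP(uint wCodePageID);

    // Assumed protocol: line 1 is the target exe, line 2 its argument string.
    public static (string FileName, string Arguments) ReadCommand(TextReader input)
    {
        string fileName = input.ReadLine() ?? throw new InvalidDataException("Missing exe name.");
        string arguments = input.ReadLine() ?? string.Empty;
        return (fileName, arguments);
    }

    // Entry logic; wire it up with: static int Main() => UtfProxySketch.Run();
    public static int Run()
    {
        var stdin = new StreamReader(Console.OpenStandardInput(), new UTF8Encoding(false));
        var (fileName, arguments) = ReadCommand(stdin);

        // Switch the console shared by this proxy and its child to UTF-8.
        if (!SetConsoleOutputCP(Utf8CodePage))
        {
            return Marshal.GetLastWin32Error();
        }

        // No redirection here: the child inherits this proxy's stdout handle,
        // so the bytes it writes flow straight back to the host's pipe.
        var startInfo = new ProcessStartInfo(fileName, arguments) { UseShellExecute = false };
        using (Process child = Process.Start(startInfo))
        {
            child.WaitForExit();
            return child.ExitCode;
        }
    }
}
```

Keeping the proxy's stdout unredirected is the key design choice in this sketch: the proxy never re-encodes anything, so the bytes the host decodes are exactly what the child wrote under the UTF-8 code page.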

An example of calling UtfRedirect:

    // At the time of creating the process: launch UtfRedirect.exe instead of the slave exe.
    // StandardErrorEncoding is left unset because stderr is not redirected;
    // setting it without RedirectStandardError makes Process.Start throw.
    _process = new Process
    {
        StartInfo =
        {
            FileName = "UtfRedirect.exe",
            Arguments = "",
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,
        },
    };

    // At the time of running the process
    _process.Start();

    // Write the name of the final slave exe to the stdin of UtfRedirect in UTF8
    var bytes = Encoding.UTF8.GetBytes(application);
    _process.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
    _process.StandardInput.WriteLine();

    // Write the arguments for the final slave exe to the stdin of UtfRedirect in UTF8
    bytes = Encoding.UTF8.GetBytes(arguments);
    _process.StandardInput.BaseStream.Write(bytes, 0, bytes.Length);
    _process.StandardInput.WriteLine();

    // Read the output that has been proxied with a forced code page of UTF8
    string utf8Output = _process.StandardOutput.ReadToEnd();
    _process.WaitForExit();