UTF-8 encoding of process output in C#

I have an application that runs a VBScript file and captures its output.

    private static string processVB(string command, string arguments)
    {
        Process Proc = new Process();
        Proc.StartInfo.UseShellExecute = false;
        Proc.StartInfo.RedirectStandardOutput = true;
        Proc.StartInfo.RedirectStandardError = true;
        Proc.StartInfo.RedirectStandardInput = true;
        Proc.StartInfo.StandardOutputEncoding = Encoding.UTF8;
        Proc.StartInfo.StandardErrorEncoding = Encoding.UTF8;
        Proc.StartInfo.FileName = command;
        Proc.StartInfo.Arguments = arguments;
        Proc.StartInfo.WindowStyle = ProcessWindowStyle.Hidden; // prevent console window from popping up
        Proc.Start();
        string output = Proc.StandardOutput.ReadToEnd();
        string error = Proc.StandardError.ReadToEnd();
        if (String.IsNullOrEmpty(output) && !String.IsNullOrEmpty(error))
        {
            output = error;
        }
        //Console.Write(ping_output);
        Proc.WaitForExit();
        Proc.Close();
        return output;
    }

I think I have configured everything related to the encoding properties correctly. processVB receives the command that runs the VBScript file, plus its arguments.

When the C# processVB method runs this VBScript file, it currently produces this result:

"?"

But the output I expect is

"äåéö €"

I believe I have set the encoding correctly, but I still cannot get the right output.

What am I doing wrong?

+6
5 answers

This does not answer the direct question, but I noticed a deadlock potential in your code and thought it worth posting anyway.

The potential for deadlock exists because your code reads synchronously from both redirected streams, StdOut and StdErr, in this section of code:

    Proc.Start();
    string output = Proc.StandardOutput.ReadToEnd();
    string error = Proc.StandardError.ReadToEnd();
    ...
    Proc.WaitForExit();

What can happen is that the child process writes a lot of data to StdErr and fills the pipe buffer. Once the buffer is full, the child blocks on its next write to StdErr (and has not yet closed StdOut). So the child is blocked and doing nothing, while your process is blocked in ReadToEnd on StdOut, waiting for the child to finish. Deadlock!

To fix this, at least one of the streams (ideally both) should be read asynchronously.

See the second example on MSDN for specifics of this scenario and how to switch to asynchronous mode.
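Following that advice, here is a minimal sketch of the fix (my own helper, not the asker's processVB): drain StdErr asynchronously with the ErrorDataReceived event while reading StdOut synchronously, so the child can never block on a full StdErr pipe.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ProcessRunner
{
    // Runs a process, reading StdOut synchronously while StdErr is drained
    // asynchronously in the background -- this avoids the deadlock described above.
    public static string Run(string fileName, string arguments)
    {
        var proc = new Process();
        proc.StartInfo.FileName = fileName;
        proc.StartInfo.Arguments = arguments;
        proc.StartInfo.UseShellExecute = false;
        proc.StartInfo.RedirectStandardOutput = true;
        proc.StartInfo.RedirectStandardError = true;
        proc.StartInfo.StandardOutputEncoding = Encoding.UTF8;
        proc.StartInfo.StandardErrorEncoding = Encoding.UTF8;
        proc.StartInfo.CreateNoWindow = true;

        var error = new StringBuilder();
        proc.ErrorDataReceived += (s, e) =>
        {
            if (e.Data != null) error.AppendLine(e.Data);
        };

        proc.Start();
        proc.BeginErrorReadLine();                       // async drain of StdErr
        string output = proc.StandardOutput.ReadToEnd(); // sync read of StdOut
        proc.WaitForExit();                              // also waits for pending async reads
        proc.Close();

        return output.Length > 0 ? output : error.ToString();
    }
}
```

The parameterless WaitForExit is used deliberately: it waits until the asynchronous error-stream handlers have seen end-of-stream, so the StringBuilder is complete when it is read.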

Regarding the UTF-8 problem: are you sure your child process actually emits its output in that encoding, and is not speaking UTF-16 or something else? You may want to examine the raw bytes to try to deduce which encoding the stream is actually in, so that you can set the correct encoding for interpreting the redirected stream.

EDIT

Here is how I would approach the encoding problem. The main idea is based on something I once had to do: I had Russian text in an unknown encoding and needed to figure out how to convert it so that it displayed the correct characters. Take the bytes captured from StdOut and try to decode them with every code page available on the system. The one that looks right is most likely (but not necessarily) the encoding used by StdOut. The reason it is not guaranteed, even when it looks right with your data, is that many encodings overlap in some byte ranges and will decode identically there. For instance, ASCII and UTF-8 produce the same bytes for basic Latin characters. So to get an exact match you may need to get creative and test with some atypical text.

Here is the basic code for this - corrections may be required:

    byte[] bytes = <put here bytes captured from StandardOut of child process>;
    foreach (System.Text.EncodingInfo encodingInfo in System.Text.Encoding.GetEncodings())
    {
        System.Text.Encoding encoding = encodingInfo.GetEncoding();
        string decodedBytes = encoding.GetString(bytes);
        System.Console.Out.WriteLine("Encoding: {0}, Decoded Bytes: {1}",
            encoding.EncodingName, decodedBytes);
    }

Run the code and manually review the output. Every entry that matches the expected text is a candidate for the encoding used by StdOut.

+3

The problem is that the console does not use UTF-8 by default. It runs with the same code page as your language settings in Windows. An easy way to work around this is the chcp console command. Example:

 chcp 65001 && yourScript.vbs 

This will cause the output to be in UTF-8 and make sure you can read it correctly from your .NET application.

Note that I tested this with a batch script rather than a VBScript, but as long as VBScript supports UTF-8 it should work fine. You may also need to invoke the VBScript engine explicitly rather than just running yourScript.vbs, but you should be able to sort that out easily.
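Sketched in C#, the same trick might look like this. The cmd.exe /c wrapper, the cscript //nologo invocation, and the helper names are my assumptions about how you would wire it up, not code from the question:

```csharp
using System.Diagnostics;
using System.Text;

static class Utf8Script
{
    // Builds a cmd.exe argument string that switches the console to the
    // UTF-8 code page (65001) before running the script with cscript.
    public static string BuildUtf8ScriptArgs(string scriptPath)
    {
        return "/c chcp 65001 >nul && cscript //nologo \"" + scriptPath + "\"";
    }

    // Hypothetical usage (Windows only): pass this to Process.Start.
    public static ProcessStartInfo MakeStartInfo(string scriptPath)
    {
        return new ProcessStartInfo
        {
            FileName = "cmd.exe",
            Arguments = BuildUtf8ScriptArgs(scriptPath),
            UseShellExecute = false,
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            CreateNoWindow = true
        };
    }
}
```

With the console switched to 65001, reading StandardOutput as Encoding.UTF8 on the C# side matches what the script actually writes.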

+1

Since the output that VBScript generates is UTF8

This assumption is exactly what is getting you into trouble: the output simply is not UTF-8. And it cannot be; the script engine does not support configuring that. Something you can try for yourself — use this statement in a sample .vbs file:

  SetLocale 65001 

Kaboom — it accepts only LCID values, and those do not cover UTF encodings. However, the cscript.exe script host itself already changes the default code page: instead of the standard OEM code page (the OEMCP value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage), it switches to the default Windows code page, the ACP value in that same registry key. Depending on your locale it will be 1252, for example, in the Americas and Western Europe.

Some VBScript code to play with — be sure to save the file with the default encoding appropriate for your locale, or the script interpreter itself will not interpret the string literals in the source code correctly. Which by itself may also explain your problem:

    WScript.Echo "Locale: " & GetLocale
    WScript.Echo "äåéö€"
    WScript.Echo "Changing locale to US-English:"
    SetLocale 1033
    WScript.Echo "äåéö€"

Output on my machine:

    C:\temp>cscript test.vbs
    Microsoft (R) Windows Script Host Version 5.8
    Copyright (C) Microsoft Corporation. All rights reserved.

    Locale: 1033
    äåéö€
    Changing locale to US-English:
    äåéö€

So the correct line of code in your program should be:

 Proc.StartInfo.StandardOutputEncoding = Encoding.Default; 

Note that this is not the default that the Process class uses; it assumes a console-mode program uses the OEM code page, such as 437 on machines in North America and Western Europe. You could select a different LCID in your .vbs program and change your C# code to match, but that is not necessary.
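To illustrate why Encoding.Default works here: the bytes the script host emits for "äåéö€" on a Western European machine are Windows-1252 bytes, and decoding them as 1252 recovers the text while decoding them as UTF-8 does not. A small sketch (my own demo, not from the answer; on .NET Core / .NET 5+ the RegisterProvider call is needed to make legacy code pages like 1252 available):

```csharp
using System.Text;

static class Cp1252Demo
{
    // Decodes raw console-output bytes using Windows code page 1252.
    public static string Decode1252(byte[] raw)
    {
        // Registers legacy code pages; a no-op if already registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(raw);
    }
}
```

Decoding the bytes { 0xE4, 0xE5, 0xE9, 0xF6, 0x80 } with this helper yields "äåéö€", whereas Encoding.UTF8.GetString on the same bytes produces replacement characters, which is essentially the "?" the asker is seeing.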

And keep the failure mode of getting the .vbs source file's encoding wrong in mind. Unfortunately, the script engine does not support UTF-8 with a BOM.

+1

The other process (the VBScript host) generates its output in some encoding. By setting StandardOutputEncoding you only tell the system how to read that stream; it does not change the encoding the other process writes in.

So you need to find out the exact encoding used by the other process (VBScript). To do that, I would run the script directly from the shell, redirect the output to a file, and open it in a tool that shows the encoding (for example, Notepad2). If I am right, it will be something other than UTF-8.

Then set that encoding on Proc.StartInfo.StandardOutputEncoding in your code, and everything should work.
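If you don't have an encoding-aware editor handy, hex-dumping the first bytes of the captured file works too: EF-BB-BF at the start means a UTF-8 BOM, FF-FE a UTF-16 LE BOM, and a lone E4 where ä should appear points at code page 1252. A minimal sketch, not part of the original answer:

```csharp
using System;
using System.IO;

static class HexHead
{
    // Returns the first 'count' bytes of a file as a hex string, e.g. "EF-BB-BF".
    public static string Dump(string path, int count)
    {
        byte[] raw = File.ReadAllBytes(path);
        int n = Math.Min(count, raw.Length);
        byte[] head = new byte[n];
        Array.Copy(raw, head, n);
        return BitConverter.ToString(head);
    }
}
```

For example, redirect the script output with `cscript test.vbs > out.txt` and then print `HexHead.Dump("out.txt", 16)` to see what the host actually wrote.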

0

I use your function as follows:

 label1.Text = processVB("wscript.exe", "c:\\s.vbs"); 

And my vbs file

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set stdout = fso.GetStandardStream(1)
    stdout.WriteLine "äåéö€"

My vbs file is saved as UTF-8 without a BOM.

And it works as expected: I see äåéö€ on my form.

Perhaps you should change how you call your function, the encoding of your vbs file, or the way you write to standard output.

0
