Need help converting a C# string from one character encoding to another

According to Spolsky, I cannot call myself a developer, so there is a lot of shame behind this question ...

Scenario: In a C# application, I would like to take a string value from a SQL database and use it as a directory name. I have a secure (SSL) FTP server on which I want to set the current directory using the string value from the database.
Problem: Everything works fine until I hit a string value containing a "special" character. I seem to be unable to encode the directory name in a way that satisfies the FTP server.

The code example below:

  • uses the "special" character é as an example
  • uses WinSCP as an external application for ftps comms
  • does not show all the code needed to configure the _winscp process
  • sends commands to the WinSCP executable by writing to the process's standard input
  • for simplicity, does not read the value from the database, but just declares a string (I did use .Equals to confirm that the value from the database matches the declared string)
  • makes three attempts to set the current directory on the FTP server using different string encodings, none of which work
  • makes a fourth attempt using a string built manually from a byte array, which does work

    Process _winscp = new Process();
    byte[] buffer;
    string nameFromString = "Sinéad O'Connor";

    // Attempt 1: the string as-is
    _winscp.StandardInput.WriteLine("cd \"" + nameFromString + "\"");

    // Attempt 2: round trip through UTF-8
    buffer = Encoding.UTF8.GetBytes(nameFromString);
    _winscp.StandardInput.WriteLine("cd \"" + Encoding.UTF8.GetString(buffer) + "\"");

    // Attempt 3: round trip through ASCII
    buffer = Encoding.ASCII.GetBytes(nameFromString);
    _winscp.StandardInput.WriteLine("cd \"" + Encoding.ASCII.GetString(buffer) + "\"");

    // Attempt 4: string built from explicit bytes, with é as 130 (this one works)
    byte[] nameFromBytes = new byte[] { 83, 105, 110, 130, 97, 100, 32, 79, 39, 67, 111, 110, 110, 111, 114 };
    _winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(nameFromBytes) + "\"");

The UTF8 encoding changes é to 101 (decimal), but the FTP server doesn't like it.

The ASCII encoding changes é to 63 (decimal), but the FTP server doesn't like it.

When I represent é as the value 130 (decimal), the FTP server is happy, except that I cannot find a method that will do this for me (I had to build the string manually from explicit bytes).

Does anyone know what to do with my string in order to encode é as 130, make the FTP server happy, and finally bring me up to developer level 1 by explaining the one thing every developer needs to understand?

c# character-encoding ftps
2 answers

130 is not ASCII. ASCII is only 7 bits (see the Encoding.ASCII documentation), so it turns "é" into a plain "?" because it has nothing better to do. UTF-8 actually encodes the character as two bytes (decimal: 195 and 169), but preserves the code point.
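To make the byte values concrete, here is a minimal sketch that prints what Encoding.ASCII and Encoding.UTF8 produce for the name in the question. The class and method names (EncodingBytesDemo, ByteList) are made up for illustration, and é is written as the escape \u00E9 so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Hypothetical helper: decimal byte values of `text` in the given encoding.
    public static string ByteList(Encoding encoding, string text)
    {
        return string.Join(" ", encoding.GetBytes(text));
    }

    public static void Main()
    {
        string name = "Sin\u00E9ad O'Connor"; // \u00E9 is é

        // ASCII has no mapping for é, so it falls back to '?' (63).
        Console.WriteLine(ByteList(Encoding.ASCII, name));
        // -> 83 105 110 63 97 100 32 79 39 67 111 110 110 111 114

        // UTF-8 encodes é (U+00E9) as the two bytes 195 169.
        Console.WriteLine(ByteList(Encoding.UTF8, name));
        // -> 83 105 110 195 169 97 100 32 79 39 67 111 110 110 111 114
    }
}
```

Neither output contains 130, which is consistent with the FTP server rejecting both attempts.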

Use a code page explicitly, for example Latin (CP 1252). It needs to match whatever the other side expects. As shown below, the CP 1252 output does not contain "130", so it is not the desired encoding :-) But the same principle applies: use an Encoding for a specific code page.

Edit: As Hans Passant explained in a comment, the MS-DOS code page (CP 437), if used here, will produce the desired result.

    // LINQPad -- Encoding is System.Text.Encoding
    var enc = Encoding.GetEncoding(1252);
    string.Join(" ", enc.GetBytes("Sinéad O'Connor")).Dump();
    // -> 83 105 110 233 97 100 32 79 39 67 111 110 110 111 114
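For comparison, the same check with CP 437 might look like the sketch below. It assumes a runtime where CodePagesEncodingProvider is available (.NET Core 3.0 or later, .NET 5+); on .NET Framework code page 437 should be available without the RegisterProvider call. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class Cp437Demo
{
    // Hypothetical helper: decimal byte values of `text` in a given code page.
    public static string BytesInCodePage(string text, int codePage)
    {
        // Assumption: on .NET Core / .NET 5+ legacy code pages such as 437
        // are only available after registering this provider. Registering
        // it more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = Encoding.GetEncoding(codePage);
        return string.Join(" ", encoding.GetBytes(text));
    }

    public static void Main()
    {
        string name = "Sin\u00E9ad O'Connor"; // \u00E9 is é

        // In CP 437, é is byte 0x82, which is 130 in decimal.
        Console.WriteLine(BytesInCodePage(name, 437));
        // -> 83 105 110 130 97 100 32 79 39 67 111 110 110 111 114
    }
}
```

This is the same byte sequence the question built by hand, which suggests the receiving side is reading its input as CP 437.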

See http://msdn.microsoft.com/en-us/goglobal/bb688114 for details.

Happy coding.

By the way, a good choice of artist, if it was intentional :p


I think the problem here is that ALL .NET strings are Unicode. A .NET string has no notion of "what encoding am I". Therefore, by calling Encoding.ASCII.GetString(buffer), you just convert your ASCII "string" back to Unicode.
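A small sketch of that point: encoding a string to bytes and immediately decoding it again with the same Encoding hands back an ordinary .NET string, so the round trips in the question cannot change what is eventually written to the process. The names used here (RoundTripDemo, RoundTrip) are illustrative only.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encode to bytes, then decode with the same encoding.
    public static string RoundTrip(Encoding encoding, string text)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string name = "Sin\u00E9ad O'Connor"; // \u00E9 is é

        // UTF-8 can represent every character, so the result is the same string.
        Console.WriteLine(RoundTrip(Encoding.UTF8, name) == name); // True

        // ASCII loses é on the way out; what comes back contains a literal '?'.
        Console.WriteLine(RoundTrip(Encoding.ASCII, name)); // Sin?ad O'Connor
    }
}
```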

I think your problem should be solved by changing the encoding for Process.StandardInput, so you get the correct encoding inside WinSCP.
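One way this could be done, sketched under assumptions: wrap the child's input stream in a StreamWriter that uses the encoding the other side expects. Here a MemoryStream stands in for the WinSCP process's standard input so the sketch runs on its own. The choice of CP 437 and the names StdinEncodingSketch and WriteCommand are assumptions, not something WinSCP documents.

```csharp
using System;
using System.IO;
using System.Text;

public static class StdinEncodingSketch
{
    // Hypothetical helper: writes one command line to `stdin` using `encoding`.
    public static void WriteCommand(Stream stdin, Encoding encoding, string command)
    {
        // leaveOpen: true so disposing the writer does not close the stream.
        using (var writer = new StreamWriter(stdin, encoding, 1024, leaveOpen: true))
        {
            writer.WriteLine(command);
        }
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ to make CP 437 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp437 = Encoding.GetEncoding(437);

        // The MemoryStream is a stand-in for the child process's stdin.
        using (var fakeStdin = new MemoryStream())
        {
            WriteCommand(fakeStdin, cp437, "cd \"Sin\u00E9ad O'Connor\"");

            // Print the raw bytes that the child process would receive.
            Console.WriteLine(string.Join(" ", fakeStdin.ToArray()));
        }
    }
}
```

With a real process, the stream passed in would be _winscp.StandardInput.BaseStream (with RedirectStandardInput enabled). Newer .NET versions also expose ProcessStartInfo.StandardInputEncoding, which may be a simpler option where it is available.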

OR

You should check what Encoding.Default is, because I'm sure it is not UTF8 or ASCII.
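A quick way to check is to print its properties, as in the sketch below (the class name is made up). The result depends on the runtime: on .NET Framework, Encoding.Default is the system's ANSI code page, while on .NET Core and .NET 5+ it is always UTF-8.

```csharp
using System;
using System.Text;

public static class DefaultEncodingCheck
{
    // Hypothetical helper: a one-line description of an encoding.
    public static string Describe(Encoding encoding)
    {
        return $"{encoding.CodePage} {encoding.WebName} ({encoding.EncodingName})";
    }

    public static void Main()
    {
        // Output varies by machine and runtime, so none is shown here.
        Console.WriteLine(Describe(Encoding.Default));
    }
}
```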
