Force C# to use ASCII

I am working on an application in C# and must read and write a specific data file format. The only problem at the moment is that the format uses strictly single-byte characters, and C# keeps wanting to output Unicode when I write using char arrays (which doubles the file size, among other serious problems). I worked on modifying the code to use byte arrays instead, but that causes complaints when I hand them to the tree and datagrid controls, and it also means conversions all over the place.

I spent a bit of time searching on Google, and there seems to be no simple typedef I can use to force char to be a single byte for my program, at least not without causing additional complications.

Is there an easy way to force a C# .NET program to use only ASCII and not touch Unicode?

Update: I got it almost working. Using ASCIIEncoding in the BinaryReader/Writers resolved most of the problems (there were some issues with an extra character being added to lines, but I fixed that). I have one last problem, which is very small but could be big: one specific character in the file (it prints like the euro sign) gets converted to ? when loading/saving. That is not much of a problem in text, but if it happened in a record length it could throw the size off by kilobytes (not good, obviously). I think this is caused by the encoding, but if the character came in from the file, why won't it go back out?
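For reference, a minimal sketch of the BinaryReader/BinaryWriter setup described above (the file names and record size are placeholders, not from the original code):

 using System.IO;
 using System.Text;

 // Sketch only: passing ASCIIEncoding makes the reader/writer treat chars as
 // single bytes instead of the default UTF-8.
 using (var reader = new BinaryReader(File.OpenRead("records.dat"), new ASCIIEncoding()))
 {
     byte[] record = reader.ReadBytes(128);   // raw bytes stay raw
 }

 using (var writer = new BinaryWriter(File.Create("records.out"), new ASCIIEncoding()))
 {
     writer.Write((byte)0x41);                // single bytes go out as single bytes
 }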

The exact problem / results are as follows:

  • Source file: 0x80 (prints like the euro sign)

  • Encodings:
    • ASCII: 0x3F (?)
    • UTF-8: 0xC2 0x80 (Â followed by the euro character)

None of these results will work, since the corruption can occur anywhere in the file (if 0x80 is changed to 0x3F in the high byte of an int-length record, that is a difference of 65 * (256 ^ 3)). Not good. I tried using UTF-8 encoding, thinking that would fix the problem pretty well, but now it adds that second byte, which is even worse.
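A small self-contained demo of the two results listed above, under the assumption that the character held in memory is U+0080 (the byte values are illustrative):

 using System;
 using System.Text;

 class EncodingDemo
 {
     static void Main()
     {
         byte[] source = { 0x80 };   // the problematic byte from the file

         // ASCII: any byte above 0x7F decodes to '?', which re-encodes as 0x3F.
         string viaAscii = Encoding.ASCII.GetString(source);
         Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(viaAscii))); // 3F

         // UTF-8: the character U+0080 is encoded as the two bytes C2 80.
         Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u0080")));  // C2-80
     }
 }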

4 answers

C# (.NET) will always use Unicode for strings. This is by design.

When you read from or write to your file, you can use a StreamReader/StreamWriter set to ASCII encoding, for example:

 StreamReader reader = new StreamReader(fileStream, new ASCIIEncoding());

Then just read from it with the StreamReader.

Writing works the same way, just use a StreamWriter.
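A slightly fuller sketch of both directions (the file names are just placeholders):

 using System.IO;
 using System.Text;

 // Reading: every byte is decoded as a single ASCII character.
 using (var reader = new StreamReader(File.OpenRead("data.dat"), new ASCIIEncoding()))
 {
     string line = reader.ReadLine();
 }

 // Writing: the same encoding keeps the output to one byte per character.
 using (var writer = new StreamWriter(File.Create("data.out"), new ASCIIEncoding()))
 {
     writer.WriteLine("single-byte text only");
 }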


Strings in .NET are always Unicode internally, but that really shouldn't concern you much. If you have a specific format you need to adhere to, then the route you had gone down (treating the data as bytes) was correct. You just need to use the System.Text.Encoding.ASCII class to perform your conversions from string -> byte[] and byte[] -> string.
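For example, a minimal sketch of those conversions (the byte values are illustrative):

 using System.Text;

 byte[] fromFile = { 0x48, 0x65, 0x6C, 0x6C, 0x6F };   // raw bytes read from the file

 string text = Encoding.ASCII.GetString(fromFile);      // byte[] -> string ("Hello")
 byte[] backOut = Encoding.ASCII.GetBytes(text);        // string -> byte[] for writing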


If you have a file format that mixes single-byte text with binary values such as lengths and control characters, then codepage 28591 (aka Latin-1, aka ISO-8859-1) is a good encoding to use.

You can get this encoding using any of the following (the last is probably the most readable):

 Encoding.GetEncoding(28591)
 Encoding.GetEncoding("Latin1")
 Encoding.GetEncoding("ISO-8859-1")

This encoding has the useful property that byte values up to 255 are converted unchanged to the Unicode character with the same value (for example, byte 0x80 becomes the character U+0080).

In your scenario, this may be more useful than the ASCII encoding (which converts every value in the range 0x80 to 0xFF to '?') or other common encodings that also remap some characters in that range.
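A short sketch of the round trip, showing that 0x80 survives through codepage 28591 while ASCII turns it into 0x3F (the byte values are illustrative):

 using System;
 using System.Text;

 Encoding latin1 = Encoding.GetEncoding(28591);

 byte[] raw = { 0x41, 0x80, 0xFF };               // includes the troublesome 0x80
 string text = latin1.GetString(raw);             // 0x80 -> U+0080, 0xFF -> U+00FF

 byte[] roundTripped = latin1.GetBytes(text);     // 41-80-FF, unchanged
 Console.WriteLine(BitConverter.ToString(roundTripped));
 Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(text)));  // 41-3F-3F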


If you want this in .NET, you could use F# to create a library that supports it. F# supports ASCII string literals with a byte array as the underlying type; see Literals (F#) on MSDN:

 let asciiString = "This is a string"B  // the B suffix yields a byte[] of ASCII character values
