Access VARCHAR as BINARY while reading with Entity Framework and MySQL?

The basis of the problem:


The problem starts with a very old dBase database in which text is stored directly in DOS Cyrillic (CP-866). As if that were not enough, the data is also copied every night into a MySQL database, which is the one I have access to.

I installed the MySQL providers and connected to the database with Entity Framework, my main way of accessing data, and then, as an experiment, with plain ADO.NET.

Everything went better than expected until I tried to convert the supposedly CP-866-encoded values from the database to UTF-8, for example:

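    var cp866 = Encoding.GetEncoding(866);
    var utf8 = Encoding.UTF8;
    // Value as returned by EF / ADO.NET (already a .NET string)
    string source = "some unreadable set of characters from the database";
    // Encode the string as CP-866 bytes, convert them to UTF-8, decode back to a string
    byte[] cp866bytes = cp866.GetBytes(source);
    byte[] utf8bytes = Encoding.Convert(cp866, utf8, cp866bytes);
    string result = utf8.GetString(utf8bytes);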

I read the value once with Entity Framework and once with ADO.NET, with the same result.

For reasons unknown at the time (and somewhat less unknown now), it did not work. After reading some important articles on encodings and strings, I concluded that this kind of conversion cannot be applied to the string a varchar field becomes in .NET, because of the nature of the string type itself: by the time I get the string, the provider has already decoded the stored bytes with some encoding, and re-encoding that string does not give the original bytes back.
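To make that concrete, here is a small Python sketch (Python only because the byte-level behaviour is easy to run; the .NET calls behave analogously). It assumes, hypothetically, that the provider decoded the raw CP-866 bytes as latin1. Encoding the resulting garbled string with CP-866, which is what `cp866.GetBytes(source)` does, cannot restore the original bytes, because most characters produced by the wrong decode do not exist in CP-866 and get substituted. Here they become `?`; in .NET the exact substitute depends on the encoder fallback, but the original bytes are lost either way.

```python
# Hypothetical sample text; in the real database it is stored as CP-866 bytes
original_text = "Привет, мир"
raw_cp866 = original_text.encode("cp866")

# Assumption: the provider decoded those bytes as latin1 (Python's "latin-1" codec)
garbled = raw_cp866.decode("latin-1")

# Roughly what cp866.GetBytes(source) does: encode the garbled string as CP-866.
# Characters with no CP-866 mapping are replaced with b"?" in this sketch.
reencoded = garbled.encode("cp866", errors="replace")

print(reencoded == raw_cp866)  # False: the original bytes are not recovered
```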

A little later I finally got it working with the ADO.NET MySQL provider by changing my query to wrap the column I was testing in CONVERT(varcharColumn, BINARY).

From there I used the code above, the only difference being that the converted column already gave me an array of CP-866 bytes, so the cp866.GetBytes step was no longer needed. I had originally intended to do something like this, but the MySQL provider could not read the bytes directly from a varchar field, and I could not find a way to do it with Entity Framework.
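Once the query returns raw bytes, decoding them with CP-866 is all that is needed. A .NET string is UTF-16 internally, so in C# `cp866.GetString(bytes)` alone should already yield the correct text; the `Encoding.Convert` hop through UTF-8 only matters if UTF-8 bytes are wanted, for example to write a file. A Python sketch of that equivalence, with a hypothetical sample value standing in for the bytes returned by `CONVERT(varcharColumn, BINARY)`:

```python
def decode_cp866(raw: bytes) -> str:
    """Decode raw CP-866 bytes (e.g. a CONVERT(col, BINARY) result) into text."""
    return raw.decode("cp866")


def decode_via_utf8(raw: bytes) -> str:
    """Mirror the question's C# steps: CP-866 bytes -> UTF-8 bytes -> string."""
    utf8_bytes = raw.decode("cp866").encode("utf-8")
    return utf8_bytes.decode("utf-8")


# Hypothetical stand-in for the bytes read from the database
raw = "Привет, мир".encode("cp866")
print(decode_cp866(raw) == "Привет, мир")         # True
print(decode_cp866(raw) == decode_via_utf8(raw))  # True: the UTF-8 hop changes nothing
```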

Yes, it works, but even to my inexperienced eye it does not look like a good solution.



Questions:

1: Can I tell Entity Framework how to select specific fields?

I would like to somehow tell my favorite ORM to convert certain varchar fields to binary data while reading, without returning a string representation at all, because that representation is what breaks everything.

2: Is there a way to get the ADO.NET MySQL provider to receive bytes of the varchar field without first pulling it out as a string?

The GetBytes method throws an exception when used on a varchar column, and GetSqlBytes, which the SQL Server provider's SqlDataReader offers, is not available in the MySQL version. I really don't want to write CONVERT(..., BINARY) around every field that I need to read correctly.
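If no provider setting turns up, one workaround for the repetition is to generate the column list instead of typing it. The sketch below is a hypothetical Python helper (the table and column names are made up) that wraps the chosen columns in the same `CONVERT(column, BINARY)` expression used above and keeps their original names as aliases; it only builds the SQL text and does not talk to MySQL.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def binary_select(table: str, binary_columns: Sequence[str],
                  plain_columns: Sequence[str] = ()) -> str:
    """Build a SELECT that returns binary_columns as raw bytes via CONVERT(..., BINARY)."""
    parts = [f"CONVERT({quote_ident(c)}, BINARY) AS {quote_ident(c)}" for c in binary_columns]
    parts += [quote_ident(c) for c in plain_columns]
    return f"SELECT {', '.join(parts)} FROM {quote_ident(table)}"


# Hypothetical table and column names
print(binary_select("clients", ["name", "address"], ["id"]))
# SELECT CONVERT(`name`, BINARY) AS `name`, CONVERT(`address`, BINARY) AS `address`, `id` FROM `clients`
```

The aliases keep the result columns named as before, so reading code can still look them up by name.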

3: Bonus question: is it possible to read the CP-866-encoded varchar field as a string, as I did, and then correctly convert it to UTF-8?

After today's reading there is still a lot of chaos in my head on the subject of encodings. I still believe there might be something I am missing, and that you can read a string from a CP-866-encoded varchar field, for example:

 string cp866EncodedValue = "Ε’β‚¬β€žβ€Ήβ€¦ Ε’β€Ήβ‚¬β€žβ€¦Ε½β€šβ‚¬ Ε Ε‘β€¦β€šβ‚¬"; //actual copy-pasted value 

...and then convert it to UTF-8, bearing in mind that the field in the database was encoded in CP-866. From what I read, once the value is in a string it is Unicode, and strings are immutable. I tried getting its byte-array representation by encoding it as CP-866 and then converting to UTF-8, and I tried treating it as if it were CP-866 itself, but without success.
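A repair along these lines may work in principle, but the first step has to use the encoding the provider wrongly used to decode the bytes, not CP-866. Re-encoding with that same wrong encoding gives the raw bytes back, and only then are they decoded as CP-866. The Python sketch below assumes, hypothetically, that the wrong encoding was latin1; in C# the same two steps would be GetBytes with the wrong encoding followed by cp866.GetString.

```python
def repair_mojibake(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo a wrong decode, assuming the underlying bytes were really CP-866.

    wrong_encoding is an assumption: it must be the encoding the provider
    actually (mis)used to turn the bytes into a string.
    """
    # Step 1: re-encode with the SAME wrong encoding; this returns the
    # original bytes only if that wrong decode was lossless
    raw = garbled.encode(wrong_encoding)
    # Step 2: decode those bytes with the encoding they were really written in
    return raw.decode("cp866")


# Simulate the problem with hypothetical sample text
original = "Привет, мир"
garbled = original.encode("cp866").decode("latin-1")
print(garbled == original)                   # False
print(repair_mojibake(garbled) == original)  # True
```

The catch is losslessness. Windows-1252 leaves the byte values 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, and in CP-866 those are the capital letters Б, Н, П, Р and Э, so depending on how the provider handled them, those letters may already be gone from the string. Reading the bytes with CONVERT(..., BINARY) sidesteps that.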

1 answer

First of all, I would check the current encodings of your database and/or your table.

@eggyal pointed to a link that lists the following commands for setting the relevant variables:

    SET character_set_client = charset_name;
    SET character_set_results = charset_name;
    SET character_set_connection = charset_name;

To check their current values, use:

    SHOW VARIABLES LIKE 'character_set_client';
    SHOW VARIABLES LIKE 'character_set_results';
    SHOW VARIABLES LIKE 'character_set_connection';

Then, to see the default encoding of the database, use:

 SHOW CREATE DATABASE databaseName; 

Then check the table in question:

 show create table TABLE_IN_QUESTION; 

After that, you will know exactly which encodings your database and/or tables use.


As for fixing whatever you find, my only contribution is a link to an interesting source. Please see whether this post has anything useful for your case:

http://nicj.net/mysql-converting-an-incorrect-latin1-column-to-utf8/


PS. Yes, I can read the URL: it describes a latin1 → utf8 conversion, but as far as I understand, the same tips apply to other pairs of character encodings.
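The key idea in that kind of fix, as I understand it, is to go through binary rather than doing a character-set conversion. A small Python sketch of why, assuming (hypothetically) CP-866 bytes stored in a column labelled latin1: converting the text straight to UTF-8 faithfully preserves the wrong characters, while taking the untouched bytes and relabelling them as CP-866 gives the intended text.

```python
# Hypothetical value: CP-866 bytes sitting in a column that MySQL treats as latin1
original = "Привет, мир"
stored_bytes = original.encode("cp866")
as_latin1_text = stored_bytes.decode("latin-1")

# Plain character-set conversion: keep the (wrong) characters, re-encode them as UTF-8
direct_utf8 = as_latin1_text.encode("utf-8")

# Conversion "through binary": relabel the untouched bytes as CP-866, then encode as UTF-8
via_binary_utf8 = stored_bytes.decode("cp866").encode("utf-8")

print(direct_utf8.decode("utf-8") == original)      # False: still garbled
print(via_binary_utf8.decode("utf-8") == original)  # True
```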
