The basis of the problem:
Pun intended.
The problem starts with a very old dBase database in which text is stored directly in DOS Cyrillic (CP-866). As if that were not enough of a problem, the data is also copied every night to a MySQL database, to which I have access.
I installed the MySQL provider and connected to the database using Entity Framework, which is my main method of accessing data, and then, as an experiment, with plain ADO.NET.
Everything went better than expected until I tried to convert the supposedly CP-866 values from the database to UTF-8, for example:
    var cp866 = Encoding.GetEncoding(866);
    var utf8 = Encoding.UTF8;
    string source = "some unreadable set of characters from the database";
    byte[] cp866bytes = cp866.GetBytes(source);
    byte[] utf8bytes = Encoding.Convert(cp866, utf8, cp866bytes);
    string result = utf8.GetString(utf8bytes);
I read the value once with Entity Framework and once with ADO.NET, with the same result.
For reasons that were unknown to me then and are only slightly less unknown now, it did not work. After reading some important articles on encodings and string values, I concluded that such transformations cannot be applied to the string equivalent of a varchar field, because of the nature of the string type itself.
A little later, I finally got it working with the ADO.NET MySQL provider by tweaking my query to wrap the column I was testing in CONVERT(varcharColumn, BINARY).
From then on, I used the code above, the only difference being that I already had an array of CP-866 bytes from the converted column. I had originally intended to do something similar, but the MySQL provider was not able to read the bytes directly from the varchar field, and I could not find a way to do this with Entity Framework.
Yes, it works, but it does not feel right to my inexperienced self.
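The byte-level step that does work can be sketched roughly as follows. This is only a minimal illustration, assuming the raw bytes have already been read from the BINARY-converted column; the DecodeCp866 helper and the sample bytes are hypothetical and not taken from the real database.

```csharp
using System;
using System.Text;

public static class Cp866Demo
{
    // Hypothetical helper: decode raw CP-866 bytes into a .NET string.
    public static string DecodeCp866(byte[] raw)
    {
        // Required on .NET Core / .NET 5+, where code page 866 is not
        // registered by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(866).GetString(raw);
    }

    public static void Main()
    {
        // Sample bytes (assumed): "Привет" encoded in CP-866.
        byte[] raw = { 0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2 };

        string text = DecodeCp866(raw);
        Console.WriteLine(text == "Привет");

        // Once the string is correct, UTF-8 bytes are one call away.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(utf8.Length);
    }
}
```

The point of the sketch is that no Encoding.Convert call is needed once the original bytes are available: decoding them with the right code page already yields a correct string.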
Questions:
1: Can I specify how the Entity Framework should select specific fields?
I would like to somehow tell my favorite ORM that it should convert certain varchar fields to binary data while reading, without returning a string representation at all, because the string representation is what messes everything up.
2: Is there a way to get the ADO.NET MySQL provider to receive bytes of the varchar field without first pulling it out as a string?
The GetBytes method throws an exception when used with a varchar column, and the GetSqlBytes method, which is usually present in ADO.NET providers, is not available in the MySQL one. I really don't want to wrap every field that I need to read correctly in a binary CONVERT.
3: Bonus question: is it possible to read the CP-866-encoded varchar field as a string, as I did, but this time change the encoding to UTF-8 correctly?
There is still a lot of chaos in my head on the subject of encodings after today's reading. I still believe there might be something I don't see, and that it is possible to read a string from a CP-866-encoded varchar field, for example:
string cp866EncodedValue = "Εβ¬ββΉβ¦ ΕβΉβ¬ββ¦Ε½ββ¬ Ε Ε‘β¦ββ¬";
... and then convert it to UTF-8, bearing in mind that the field in the database was encoded in CP-866. From what I have read, as soon as the value is in a string it is Unicode, and strings are immutable. I tried getting its byte array representation and converting that from CP-866 to UTF-8, and I tried treating the bytes as if they were already CP-866, but without success.
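To make the bonus question more concrete, here is a minimal sketch of the kind of round trip it asks about. It rests on a strong assumption that may not hold for the real setup: that the provider decoded the stored bytes as ISO-8859-1, which maps every byte to exactly one character and can therefore be reversed. If the provider used a different code page, or replaced unknown bytes with '?', the original bytes are lost and this will not work. The Repair helper and the sample bytes are hypothetical.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: undo a wrong decode and redo it as CP-866.
    public static string Repair(string garbled)
    {
        // Required on .NET Core / .NET 5+ for code page 866.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Assumption: the string was produced by decoding as ISO-8859-1.
        Encoding wrong = Encoding.GetEncoding(28591);
        Encoding actual = Encoding.GetEncoding(866);

        // Re-encode with the wrong encoding to get the stored bytes back,
        // then decode those bytes with the encoding they were written in.
        byte[] raw = wrong.GetBytes(garbled);
        return actual.GetString(raw);
    }

    public static void Main()
    {
        // Simulate the situation: CP-866 bytes for "Привет" that were
        // decoded with the wrong encoding on the way out of the database.
        byte[] stored = { 0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2 };
        string garbled = Encoding.GetEncoding(28591).GetString(stored);

        Console.WriteLine(Repair(garbled) == "Привет");
    }
}
```

This also suggests why calling cp866.GetBytes on the garbled string fails: the characters in that string are not the ones CP-866 would encode to the original bytes, so the encoder cannot reproduce them.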