I'm currently working on a hobby project (C/C++) that should work on both Windows and Linux, with full Unicode support. Unfortunately, Windows and Linux use different default encodings, which complicates things.
In my code I try to keep the data as platform-independent as possible, which simplifies the work for both Windows and Linux. On Windows, wchar_t is encoded as UTF-16 by default; on Linux it is UCS-4/UTF-32 (correct me if I'm wrong).
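A quick way to verify the sizes on a given toolchain is to print sizeof(wchar_t):

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* MSVC on Windows prints 2 (UTF-16 code units);
       glibc on Linux prints 4 (UTF-32 code points). */
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    return 0;
}
```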
My software opens files with _wfopen (UTF-16 paths) on Windows and fopen (UTF-8 paths) on Linux, and writes its data to files in UTF-8. All of that worked fine, until I decided to use SQLite.
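In code, that file-opening split currently looks roughly like this (open_data_file is just an illustrative name):

```c
#include <stdio.h>

#ifdef _WIN32
/* Windows: paths are wchar_t / UTF-16, opened with _wfopen. */
FILE *open_data_file(const wchar_t *path)
{
    return _wfopen(path, L"wb");
}
#else
/* Linux: paths are plain char / UTF-8, opened with fopen. */
FILE *open_data_file(const char *path)
{
    return fopen(path, "wb");
}
#endif
```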
The SQLite C/C++ interface accepts strings in one- or two-byte encodings, i.e. UTF-8 or UTF-16. That of course doesn't match wchar_t on Linux, where wchar_t is 4 bytes by default, so reading from and writing to SQLite requires a conversion there.
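To make the asymmetry concrete, here is a minimal sketch of a bind call; bind_string is an illustrative name, and the prepared statement is assumed to take the string as its first parameter:

```c
#include <sqlite3.h>
#include <wchar.h>

/* Illustrative sketch: binding parameter 1 of a prepared statement. */
#ifdef _WIN32
/* Windows: wchar_t is 2 bytes, so a wchar_t* is valid UTF-16 and can
   be handed to the "16" variant of the API directly. */
void bind_string(sqlite3_stmt *stmt, const wchar_t *text)
{
    sqlite3_bind_text16(stmt, 1, text, -1, SQLITE_TRANSIENT);
}
#else
/* Linux: wchar_t is 4 bytes (UTF-32) and SQLite has no UTF-32 API,
   so the string has to be converted to UTF-8 (or UTF-16) first. */
void bind_string(sqlite3_stmt *stmt, const char *text_utf8)
{
    sqlite3_bind_text(stmt, 1, text_utf8, -1, SQLITE_TRANSIENT);
}
#endif
```

That per-platform split in the signatures is exactly the kind of clutter I describe below.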
Currently the code is littered with Windows/Linux special cases. I was hoping to stick with the standard idea of storing data in wchar_t:
- wchar_t on Windows: file paths are no problem, and reading/writing SQLite is no problem; writing data out to files should be done in UTF-8 anyway.
- wchar_t on Linux: special cases for file paths because of their UTF-8 encoding, a conversion before reading from/writing to SQLite, and, just as on Windows, a conversion when writing data out to files.
After reading around on the subject, I was convinced that I should stick with wchar_t on Windows. But once all of that worked, the problems started when porting to Linux.
I'm now going to redo all of this with plain char (UTF-8), because it works on both Windows and Linux, accepting that on Windows every string needs a WideCharToMultiByte call to get to UTF-8. Using plain char* strings would also significantly reduce the number of Linux/Windows special cases.
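That per-string conversion would look roughly like this (utf16_to_utf8 is an illustrative helper, with minimal error handling):

```c
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Convert a zero-terminated UTF-16 string to a newly allocated
   UTF-8 string; the caller frees it. Returns NULL on failure. */
char *utf16_to_utf8(const wchar_t *wide)
{
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (bytes <= 0)
        return NULL;
    char *utf8 = (char *)malloc((size_t)bytes);
    if (utf8 != NULL)
        WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, bytes, NULL, NULL);
    return utf8;
}
#endif
```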
Do you have any experience with cross-platform Unicode? Any thoughts on the idea of simply storing data in UTF-8 instead of using wchar_t?
linux windows cross-platform unicode wchar-t
Erikkou