What is the best practice for creating libraries that support both Unicode and ASCII in C++?

I am working on some libraries that will be used both internally and by clients, and I wonder what the best way is to support both Unicode and ASCII. It looks like Microsoft (in the MFC libraries) writes both Unicode and ASCII classes and switches between them in the header files using macros:

 #ifdef _UNICODE
 #define CString CStringW
 #else
 #define CString CStringA
 #endif

While I'm not a big fan of macros, this does the job. If I write libraries using the STL, it would make sense to write headers containing something like this:

 #ifdef _UNICODE
 #define GetLastErrorString GetLastErrorStringW
 #else
 #define GetLastErrorString GetLastErrorStringA
 #endif

 std::string GetLastErrorStringA();
 std::wstring GetLastErrorStringW();
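For concreteness, here is one way the two variants might be implemented on Windows. This is a minimal sketch using FormatMessage to render GetLastError() as text; the fixed buffer size is an illustrative assumption, not a recommendation:

 #include <windows.h>
 #include <string>

 // Narrow variant: format the calling thread's last-error code as text.
 std::string GetLastErrorStringA()
 {
     char buffer[512] = {};
     DWORD len = ::FormatMessageA(
         FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
         nullptr, ::GetLastError(), 0, buffer, sizeof(buffer), nullptr);
     return std::string(buffer, len);
 }

 // Wide variant: identical logic through the wide-character API.
 std::wstring GetLastErrorStringW()
 {
     wchar_t buffer[512] = {};
     DWORD len = ::FormatMessageW(
         FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
         nullptr, ::GetLastError(), 0, buffer, 512, nullptr);
     return std::wstring(buffer, len);
 }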

Or should I just release separate libraries, one for ASCII and one for Unicode?

I'm just interested in what people think is the best thing to do in this situation.

UPDATE: addressing some comments and questions:

  • These will be C++ class libraries.
  • I believe that I will need to use UTF-16 encoding as I would like to support Asian character sets.
  • My reasons for implementing Unicode are twofold: 1) all new SDKs support Unicode, and I'm not sure whether future SDKs or third-party libraries will keep shipping separate ASCII versions; 2) although we will not fully internationalize our application, it would be nice if we could handle user input (for example, names) and load files from paths containing Asian characters.
+4
4 answers

I would make the library completely Unicode internally. Then, for ASCII, provide a set of C++ adapter classes that wrap the Unicode implementation.
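A rough sketch of that adapter idea (the class names here are invented for illustration; the character-by-character widen/narrow copies are only valid because 7-bit ASCII is a strict subset of Unicode):

 #include <string>

 // Hypothetical Unicode core: all real work happens on wide strings.
 class ReportBuilderW {
 public:
     std::wstring Build(const std::wstring& title) {
         return L"Report: " + title;
     }
 };

 // ASCII adapter: widens at the boundary, delegates to the core, narrows
 // the result. Safe only for 7-bit input; real code should validate.
 class ReportBuilderA {
 public:
     std::string Build(const std::string& title) {
         std::wstring wide(title.begin(), title.end());
         std::wstring result = core_.Build(wide);
         return std::string(result.begin(), result.end());
     }
 private:
     ReportBuilderW core_;
 };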

+4

You can store Unicode strings in a std::string if you convert them to UTF-8 first.

You only need wstring when interacting with UTF-16 APIs, such as the Windows API. If so, you can convert your strings to wstrings locally, where necessary. It may be a little burdensome, but it is not that bad.
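A minimal sketch of that boundary conversion on Windows, assuming the std::string really does hold valid UTF-8 (error handling omitted):

 #include <windows.h>
 #include <string>

 // Convert a UTF-8 std::string to UTF-16 for Win32 "W" calls.
 std::wstring Utf8ToWide(const std::string& utf8)
 {
     if (utf8.empty()) return std::wstring();
     int count = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                       (int)utf8.size(), nullptr, 0);
     std::wstring wide(count, L'\0');
     ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                           (int)utf8.size(), &wide[0], count);
     return wide;
 }

 // Usage at the boundary, e.g.:
 //   HANDLE h = ::CreateFileW(Utf8ToWide(path).c_str(), ...);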

+1

The question is a little imprecise, but...

First you need to clarify the encoding. Unicode simply assigns each character a code point; when dealing with Unicode in an application, you have to choose how those code points will be encoded. If you can go with UTF-8, you don't have to worry about wide characters at all and can store the data in a regular std::string :)

Then you should clarify your problem:

  • Do you want to support Unicode and ASCII input?
  • Or are you talking about output?
  • Either way, you can use std::locale to find out which encoding you should output (see the sketch after this list).
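As a quick illustration of that last point, the user's preferred locale can be queried like this (the returned name is implementation-defined, but on many systems it embeds the encoding, e.g. "en_US.UTF-8"):

 #include <iostream>
 #include <locale>

 int main()
 {
     // The empty-name constructor requests the user's preferred locale.
     std::locale user("");
     std::cout << "User locale: " << user.name() << '\n';
 }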

I am working on an internationalized application (a website with a C++ backend...) and we just use std::string internally. Whether the output is ASCII or UTF-8 depends on the translation file, but the internal representation of the data does not change one iota (except for character counting; see my post on that topic).

Like you, I am definitely not a fan of macros. Since UTF-8 was designed to be compatible with ASCII, if you can choose your own encoding, you are saved!

0

I think you are asking about the "cleanliness" of the code, rather than about choosing ASCII, UTF-8, or 16- or 32-bit characters.

If so, I prefer to keep the conditional blocks as large as possible: use the "gate" (the symbolic constant _UNICODE) to select either whole files, or at least large chunks of code. Code that changes its spots every other line or, God forbid, mid-statement is difficult to understand.

I would advise against using gates to select individual file inclusions

 #ifdef _UNICODE
 #include "myUniLib.h"
 #else
 #include "myASCIILib.h"
 #endif

as that entails two, or maybe three, files (a Unicode file, a 646US (ASCII) file, and possibly a nexus file containing the code above). That is three times the chance of something getting lost, and three times the chance of a broken build.

Instead, use the gate in the file to select large blocks of code:

 #ifdef _UNICODE
 ...lotsa code...
 #else
 ...lotsa code...
 #endif

OK, suppose you meant the opposite: you are wondering about char versus char-as-UTF-8 versus the W and A variants. How portable do you want to be? The CStrings you mention exist only in the Windows world. If you also want to be compatible with Mac and UNIX (OK, Linux), you are in for a rough ride.

BTW, ASCII is... not... a recognized standard anymore. There's ASCII, and then there's ASCII. If you mean the seven-bit "standard" from the old UNIX days, the closest thing I have found is ISO 646-US. The Unicode equivalent is ISO 10646.

Some people have had luck with URL-style character encoding: just ASCII letters, digits, and the percent sign. Although you have to encode and decode all the time, the storage is completely predictable. A bit strange, yes, but definitely innovative.
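For illustration, a toy version of that scheme: percent-encode every byte outside the ASCII letter/digit set, so any UTF-8 string round-trips through pure-ASCII storage (this is a sketch, not a full RFC 3986 implementation):

 #include <cctype>
 #include <cstdio>
 #include <string>

 // Encode every byte that is not an ASCII letter or digit as %XX.
 std::string PercentEncode(const std::string& in)
 {
     std::string out;
     for (unsigned char c : in) {
         if (std::isalnum(c)) {
             out += (char)c;
         } else {
             char hex[4];
             std::snprintf(hex, sizeof(hex), "%%%02X", c);
             out += hex;
         }
     }
     return out;
 }

 // PercentEncode("日本") on UTF-8 input yields "%E6%97%A5%E6%9C%AC".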

There are also some language traps. For example, case conversion is not always reversible (I don't know the right word here). In German, the lowercase ß becomes SS when converted to uppercase; SS, however, when lowercased, morphs into ss, not ß. Turkish has something similar with its dotted and dotless i. When designing your application, do not let case conversions trip you up.
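A small demonstration of why this bites: the single-character case functions in C and C++ map one character to one character, so they simply cannot produce the ß-to-SS expansion (this sketch assumes a platform whose wchar_t holds Unicode code points):

 #include <cstdio>
 #include <cwctype>

 int main()
 {
     wchar_t eszett = L'\u00DF';            // German lowercase sharp s
     wchar_t upper  = (wchar_t)std::towupper(eszett);
     // One-to-one mapping: towupper cannot return the two characters "SS",
     // so on most platforms 'upper' is still U+00DF, unchanged.
     std::wprintf(L"U+%04X -> U+%04X\n", (unsigned)eszett, (unsigned)upper);
 }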

Also, remember that word order differs across languages. A simple "Hi Jim! How is your Monday going?" might come out as "Hello yours, Monday, is everything going well, Jim?"

Finally, a warning: avoid building messages with the IO streams (std::cin >> and std::cout <<). Doing so forces you to assemble messages from ordered fragments, which makes localizing them very complex.
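To make that concrete, compare the fragile stream pattern with a translation-friendly one; the format string below stands in for a string you would load from a translation file:

 #include <cstdio>
 #include <iostream>
 #include <string>

 int main()
 {
     std::string name = "Jim";
     int day = 1;

     // Fragile: the sentence is split into fragments whose order is
     // frozen in code, so a translator cannot reorder "name" and "day".
     std::cout << "Hi " << name << "! How is day " << day << " going?\n";

     // Friendlier: one whole-sentence template that a translation file
     // can replace. Reordering the arguments needs positional specifiers
     // (e.g. the POSIX "%1$s" extension) or a library such as gettext/ICU.
     char buf[128];
     std::snprintf(buf, sizeof(buf),
                   "Hi %s! How is day %d going?", name.c_str(), day);
     std::cout << buf << '\n';
 }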

You are asking the right questions. You have a long road ahead! Best of luck!

0
