Personally, I wish that char didn't exist and that instead of char, wchar, and dchar we had something like utf8, utf16, and utf32. Then everyone would be forced to realize right away that char is not something that should be used for individual characters, but that's not how it went. I'd say it's almost certain that char was simply taken from C/C++, and the others were added later to improve Unicode support. There is nothing fundamentally wrong with char as such; it's just that many programmers mistakenly assume that a char is always a character (which isn't necessarily true even in C/C++). Walter Bright understands Unicode very well and seems to expect everyone else to as well, so he tends to make decisions about Unicode that work very well if you understand Unicode but not so well if you don't (and most programmers don't). D pretty much forces you to come to at least a basic understanding of Unicode, which isn't such a bad thing, but it does trip some people up.
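A minimal sketch of what this distinction looks like in D (nothing here beyond the standard char/dchar types and std.stdio): a char is a UTF-8 code unit, so a non-ASCII character may not fit in one, while a dchar holds any single code point.

```d
import std.stdio;

void main()
{
    string s = "é";      // UTF-8: stored as two code units
    dstring d = "é"d;    // UTF-32: one code point
    writeln(s.length);   // prints 2 -- char counts code units
    writeln(d.length);   // prints 1 -- dchar counts code points

    // char c = '€';     // error: '€' (U+20AC) does not fit in a single char
    dchar c = '€';       // fine: a dchar can hold any code point
    writeln(c);
}
```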
But the reality is that while it makes sense to use dchar for individual characters, it usually doesn't make sense to use it for strings. Sometimes that's what you need, but UTF-32 requires far more space than UTF-8. That can affect performance and definitely affects the memory footprint of your programs, and a lot of string processing doesn't require random access anyway. So having UTF-8 strings as the default makes much more sense than defaulting to UTF-32 strings.
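A rough sketch of the memory trade-off, comparing the same mostly-ASCII text stored as a string (UTF-8) and as a dstring (UTF-32); std.conv.to is used only to convert between the two encodings.

```d
import std.stdio;
import std.conv : to;

void main()
{
    string  utf8  = "The quick brown fox";
    dstring utf32 = utf8.to!dstring;

    // .length counts code units, so multiplying by the code-unit size
    // gives the payload size in bytes.
    writeln(utf8.length  * char.sizeof);   // 19 bytes in UTF-8
    writeln(utf32.length * dchar.sizeof);  // 76 bytes in UTF-32
}
```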
The way D handles strings generally works very well. It's just that the name char carries the wrong connotation for many people, and the language unfortunately favors char over dchar as the default for character literals in many cases.
I think a fairly convincing counterargument is that whatever is gained is offset by the problems those same developers run into when they try to do something non-trivial with a char or a string and expect it to behave the way it would in C/C++, only to have it fail in ways that are hard to debug.
The reality is that strings in C/C++ work the same way as in D, except that they don't protect you from ignorance or carelessness the way D does. A char in C/C++ is always 8 bits and is usually treated by the OS as UTF-8 (at least in *nix land; Windows does odd things with the encoding of char and generally requires wchar_t for Unicode). So any Unicode strings you have in C/C++ are in UTF-8 unless you explicitly use a string type with a different encoding. std::string and C strings operate on code units, not code points, yet the average C/C++ programmer treats them as if every element were a whole character, which only works if you restrict yourself to ASCII, and in this day and age that is often a very bad assumption.
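The same code-units-versus-code-points pitfall can be shown in D terms (a sketch, assuming a Phobos where narrow strings decode to dchar when used as ranges): indexing a string gives raw UTF-8 code units, just as in C/C++, but iterating with dchar decodes to code points.

```d
import std.stdio;
import std.range : walkLength;

void main()
{
    string s = "naïve";
    writeln(s.length);          // 6 -- UTF-8 code units ('ï' takes two)
    writeln(s.walkLength);      // 5 -- code points, via the decoding range interface
    writeln(cast(ubyte) s[2]);  // 195 -- a raw code unit, not the character 'ï'

    foreach (dchar c; s)        // asking for dchar decodes while iterating
        write(c, ' ');
    writeln();
}
```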
D takes the approach of actually building proper Unicode support into the language and its standard library. It forces you to come to at least a basic understanding of Unicode and often makes it harder to get things wrong, while giving those who really understand Unicode extremely powerful tools for handling strings not only correctly but efficiently. C/C++, by contrast, largely leaves the problem to the programmer and makes it easy to step on Unicode land mines.
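As a small illustration of the standard-library support being referred to, here is a sketch using std.uni and std.range to view one string at the three levels Unicode defines: code units, code points, and graphemes.

```d
import std.stdio;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "e" followed by a combining acute accent: one grapheme, two code points.
    string s = "e\u0301";
    writeln(s.length);                 // 3 code units (the accent is 2 bytes in UTF-8)
    writeln(s.walkLength);             // 2 code points
    writeln(s.byGrapheme.walkLength);  // 1 grapheme (what a user sees as one character)
}
```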
Jonathan M Davis