This is a very long-standing problem in my work, and I realize I still do not have a good solution to it...
C naively defined all of its character classification functions to take an int:
int isspace(int ch);
But char is often signed, and a full character often does not fit in an int, or in any single storage unit used for strings**.
These functions were the logical template for the current C++ functions and methods, and they set the stage for the current standard library. In fact, they are still supported, afaict.
So if you call isspace(*pchar), you can run into sign-extension problems. They are hard to spot, and therefore hard to guard against, in my experience.
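To make the sign-extension issue concrete, here is a minimal sketch of the usual workaround: convert the char to unsigned char before it reaches std::isspace. The helper names (is_space_byte, as_int_unsafe, as_int_safe) are my own, for illustration only; they are not part of any library.

```cpp
#include <cctype>

// Hypothetical helper: classify one byte as whitespace without letting a
// negative char value reach std::isspace.
inline bool is_space_byte(char c) {
    // std::isspace expects a value representable as unsigned char (or EOF).
    // Converting first avoids sign extension where char is signed.
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// What an int parameter receives without the cast: for bytes >= 0x80 this
// is negative on platforms where char is signed.
inline int as_int_unsafe(char c) { return c; }

// What it receives with the cast: always in the range 0..255.
inline int as_int_safe(char c) { return static_cast<unsigned char>(c); }
```

This only addresses problem 1 (sign extension). It still classifies a single byte, so it says nothing useful about multi-byte characters.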
Similarly, isspace() and its ilk all take ints, yet the actual width of a character is often unknown without analyzing the string. That means any modern character library should essentially never pass around individual char or wchar_t values, only pointers/iterators, since only by analyzing the character stream can you know how many units make up one logical character. So I am at a bit of a loss: how best to approach these problems?
I keep expecting to find a genuinely robust library built around abstracting away the size of any character and working only with strings (providing things like isspace, etc.), but either I have missed it, or there is a simpler solution staring me in the face that all of you (who know what you are doing) use...
** These problems do not arise for fixed-width encodings in which a single unit can hold a full character. UTF-32 is apparently about the only option with this property (aside from specialized environments that restrict themselves to ASCII or some such).
So my question is:
"How do you test for whitespace, printability, etc. in a way that does not suffer from two problems:
1) sign extension, and
2) variable-width character problems?"
After all, most character encodings are variable-width: UTF-7, UTF-8, UTF-16, as well as older standards such as Shift-JIS. Even extended ASCII can hit the simple sign-extension problem if the compiler treats char as a signed 8-bit unit.
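To illustrate what I mean by working on the stream rather than on single units, here is a rough sketch for UTF-8 only: decode one code point at a time from a position in the string, then classify the code point. Everything here is hypothetical (next_code_point, is_space_cp, count_spaces_utf8 are names I made up), the decoder assumes mostly well-formed input and does not reject overlong forms or surrogates, and the whitespace table is hand-written from the Unicode White_Space property as I understand it, so it should be checked against the Unicode data files before being trusted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode the code point starting at s[i] and advance i
// past it. Malformed or truncated sequences yield U+FFFD and advance one byte.
inline char32_t next_code_point(const std::string& s, std::size_t& i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    if (len == 0 || i + len > s.size()) { ++i; return 0xFFFD; }
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; }  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}

// Illustrative table only: code points I believe carry White_Space.
inline bool is_space_cp(char32_t cp) {
    return cp == 0x20 || (cp >= 0x09 && cp <= 0x0D) || cp == 0x85 ||
           cp == 0xA0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F || cp == 0x205F ||
           cp == 0x3000;
}

// Count whitespace code points in a UTF-8 string by walking the stream.
inline std::size_t count_spaces_utf8(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size();) {
        if (is_space_cp(next_code_point(s, i))) ++n;
    }
    return n;
}
```

Note that the classification function never sees a char at all, so sign extension cannot occur, and the caller never has to know how many bytes a character occupies. It still treats one code point as one character, which is not true for combining sequences.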
Note:
No matter what size your char_type is, it is wrong for most character encoding schemes.
This problem exists in the standard C library as well as in the C++ standard library, which still passes around char and wchar_t, rather than string iterators, in its various isspace, isprint, etc. implementations.
Actually, it is precisely these kinds of functions that break the genericity of std::string. If it worked only in storage units and did not try to pretend to understand those storage units as logical characters (e.g. via isspace), the abstraction would be much more honest, and would force us programmers to look elsewhere for real solutions...
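A small sketch of the storage-unit versus logical-character gap, assuming the string holds well-formed UTF-8: std::string::size() reports bytes, while counting code points needs a separate pass. The utf8_length helper is hypothetical, written just for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in well-formed UTF-8 by counting
// every byte that is not a continuation byte (bit pattern 10xxxxxx).
inline std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the UTF-8 bytes of "naïve" (with ï encoded as 0xC3 0xAF), size() reports 6 storage units while utf8_length reports 5 code points.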
Thank you, everyone who participated. Between this discussion and "WChars, Encodings, Standards and Portability" I now have a much better handle on these problems. Although there are no simple answers, every bit of understanding helps.