Making size_t and wchar_t portable?

As far as I understand, the representations of size_t and wchar_t are completely platform/compiler dependent. For example, I have read that wchar_t is now usually 32 bits on Linux, but is 16 bits on Windows. Is there a way I can standardize them to a given size (int, long, etc.) in my own code, while staying compatible with the existing standard library and C functions on both platforms?

My goal is to make something like a typedef for them so that they have a fixed size. Is this possible without breaking anything? Should I do this? Is there a better way?

UPDATE: The reason I want to do this is so that my string encoding is consistent on both Windows and Linux.

Thanks!

+6
c++ c size-t
5 answers

You do not want to redefine those types. Instead, you can use typedefs such as int32_t or int16_t (signed 32-bit and 16-bit integers), which are part of <stdint.h> in the C standard library.
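A minimal sketch of that approach (the typedef names here are made up for illustration):

    #include <stdint.h>

    // Fixed-width "character" types: exactly 16 and 32 bits on every
    // platform, unlike wchar_t, whose width differs between Windows
    // (16 bits) and Linux (usually 32 bits).
    typedef uint16_t portable_char16;
    typedef uint32_t portable_char32;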

If you are using C++, C++0x adds char16_t and char32_t, which are new distinct types (not just typedefs for integral types) intended for UTF-16 and UTF-32 respectively.
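A quick illustration (requires a C++0x/C++11 compiler):

    // Distinct built-in types with dedicated string-literal syntax.
    const char16_t *s16 = u"hello";  // UTF-16 encoded literal
    const char32_t *s32 = U"hello";  // UTF-32 encoded literal

    // Unlike wchar_t, their code-unit widths are the same on Windows
    // and Linux: 16 bits and 32 bits respectively.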

An alternative to wchar_t is simply to use a library such as ICU, which implements Unicode in a platform-independent way. Then you can just use the UChar type, which is always UTF-16; you still need to be careful about endianness. ICU also provides converters to and from UChar (UTF-16).
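For example, a sketch of converting UTF-8 input into UChar, assuming ICU4C is installed (u_strFromUTF8 comes from <unicode/ustring.h>; the fixed-size buffer is just for brevity):

    #include <unicode/ustring.h>

    // Convert a NUL-terminated UTF-8 string into a UTF-16 UChar buffer.
    void utf8_to_uchar(const char *utf8) {
        UChar buf[256];
        int32_t len = 0;
        UErrorCode status = U_ZERO_ERROR;
        u_strFromUTF8(buf, 256, &len, utf8, -1, &status);
        if (U_SUCCESS(status)) {
            // buf now holds 'len' UTF-16 code units, on every platform.
        }
    }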

+5

It sounds like you are looking for the C99 and C++0x headers <stdint.h> / <cstdint>. These define typedefs such as uint8_t and int64_t.

You can use Boost's cstdint.hpp if you do not have these headers.
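A sketch of the fallback pattern (the HAVE_STDINT_H macro is hypothetical; real projects usually get such a flag from their build system):

    #ifdef HAVE_STDINT_H
    #include <stdint.h>           // C99 / C++0x fixed-width types
    typedef uint32_t u32;
    #else
    #include <boost/cstdint.hpp>  // Boost's portable substitute
    typedef boost::uint32_t u32;  // same guarantee: exactly 32 bits
    #endif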

+6

Don't. The fundamental problem with trying to use a typedef to "fix" the character type is that you end up with something that is compatible with the built-in character and wide-character functions on some platforms, but not on others.

If you need a string format that is the same on all platforms, you could simply pick a size and signedness. Do you want unsigned 8-bit "characters", or signed 64-bit "characters"? You can have them on any platform that has an integer type of the appropriate size (not all do). But they are not characters as far as the language is concerned, so do not expect to be able to call strlen or wcslen on them, or to have nice syntax for literals. A string literal (after conversion) is a char*, not a signed char* or unsigned char*. A wide string literal is a wchar_t*, which is equivalent to some other integer type, but not necessarily the one you want.

So you need to pick an encoding, use it internally, define your own versions of the string functions you need, implement them, and then convert to/from the platform encoding wherever necessary for platform functions that take strings. UTF-8 is a decent choice because most of the C string functions still "work", in the sense that they do something reasonably useful even if not entirely correct.
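To illustrate that last point, a minimal sketch: strlen still returns something useful on UTF-8 (the length in bytes, not characters), and counting actual code points takes only a small helper (count_codepoints is a made-up name for this example):

    #include <string.h>

    // Count code points in a valid UTF-8 string by skipping
    // continuation bytes, which always look like 10xxxxxx.
    size_t count_codepoints(const char *s) {
        size_t n = 0;
        for (; *s; ++s)
            if (((unsigned char)*s & 0xC0) != 0x80)
                ++n;
        return n;
    }

    // strlen("\xC3\xA9") == 2 (bytes), count_codepoints("\xC3\xA9") == 1 ("é")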

+2

wchar_t will probably be a stickier wicket than size_t. You could assume a maximum size for size_t (8 bytes) and cast every variable up to that before writing it to a file (or socket). Another thing to keep in mind is that you will have byte-order problems if you try to write/read any kind of binary representation. And on top of that, wchar_t may be a UTF-32 encoding on one system (Linux does this, I believe) and UTF-16 on another (Windows does). If you are trying to create a standard format across platforms, you will have to solve all of these problems.
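A sketch of the size_t part of that idea: widen to a fixed 8 bytes and write the bytes in an explicit order (big-endian here), so both ends agree regardless of native sizes and byte order (the function name is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    // Serialize a size_t as exactly 8 big-endian bytes.
    void write_size(FILE *f, size_t value) {
        uint64_t v = (uint64_t)value;  // widen to the assumed maximum
        unsigned char buf[8];
        for (int i = 0; i < 8; ++i)
            buf[i] = (unsigned char)(v >> (56 - 8 * i));  // MSB first
        fwrite(buf, 1, 8, f);
    }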

0

Just work with UTF-8 internally and convert to UTF-16 on the fly, at exactly the point where you pass arguments to the Windows functions that require it. You will probably never need UTF-32. Since it is usually wrong (in the Unicode sense) to process individual characters rather than whole strings, upper-casing or normalizing a UTF-8 string is no harder than a UTF-32 string.
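A sketch of the conversion at the boundary, using the Win32 MultiByteToWideChar call (error handling kept minimal):

    #include <windows.h>
    #include <string>

    // Convert internal UTF-8 to UTF-16 right at the Windows API boundary.
    std::wstring to_utf16(const std::string &utf8) {
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, NULL, 0);
        if (n <= 0)
            return std::wstring();  // invalid input
        std::wstring out(n, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &out[0], n);
        out.resize(n - 1);          // drop the terminating NUL the API wrote
        return out;
    }

    // Usage: SetWindowTextW(hwnd, to_utf16(title).c_str());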

0
