Best way to convert the result of fgetc() to char in C

Question


Maybe I'm overthinking this, because it seems like it should be much easier. I want to take a value of type int, such as one returned by fgetc(), and store it in a char buffer if it is not the end-of-file code. For example:

 char buf;
 int c = fgetc(stdin);
 if (c < 0) {
     /* handle end-of-file */
 } else {
     buf = (char) c;  /* not quite right */
 }

However, if char is signed by default on the platform, the value returned by fgetc() may lie outside the range of char. In that case, casting it or assigning it to a (signed) char has implementation-defined behavior (right?). Of course, tons of code does exactly what this example does. Does all of it rely on implementation-defined behavior and/or assume 7-bit data?

It seems to me that if I want to be sure the C standard defines my code's behavior to be what I want, then I need to do something like this:

 buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);

I think this produces well-defined, correct behavior whether char is signed or unsigned by default, and regardless of the size of char. Is that right? And is this really necessary to ensure portability?


c io portability type-conversion

John Bollinger, Oct 8 '13 at 14:29


3 answers

In practice it's simple: the obvious cast to char always works.
But you're asking about portability...

I don't see how a truly portable solution could work.
The guaranteed range of char is only -127 to 127, which is just 255 distinct values. So how could you convert the 256 possible return values of fgetc (excluding EOF) to char without losing information?
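This counting argument can be checked directly from <limits.h>. Below is a minimal sketch in which the function name char_holds_every_byte is hypothetical. It should return true on a platform with unsigned char (0..255) or with two's-complement signed char (-128..127). It would return false on the -127..127 signed char described above.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if char has at least UCHAR_MAX + 1 distinct values, so that
 * every non-EOF fgetc() result could be given its own char value.
 * CHAR_MAX - CHAR_MIN + 1 is the number of char values; the cast to
 * long long only keeps the subtraction from overflowing int. */
bool char_holds_every_byte(void)
{
    return (long long)CHAR_MAX - CHAR_MIN >= UCHAR_MAX;
}
```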

The best I can suggest is to use unsigned char and avoid char.
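Here is a sketch of that approach. The helper name read_bytes is hypothetical. The point it illustrates is that storing a non-EOF fgetc() result in an unsigned char is always exact, because the value is already in 0..UCHAR_MAX.

```c
#include <stdio.h>

/* Hypothetical helper: read up to n bytes from fp into buf.
 * Each non-EOF fgetc() result is in 0..UCHAR_MAX, so converting it to
 * unsigned char never changes its value. Returns the number of bytes
 * stored; stops early at end-of-file or on a read error. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t n)
{
    size_t i = 0;
    int c;
    while (i < n && (c = fgetc(fp)) != EOF)
        buf[i++] = (unsigned char)c;
    return i;
}
```

Code that needs a char * (for example, for string functions) can then convert pointers, not individual values. This is a design choice the answer does not spell out.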


ugoren, Oct 8 '13 at 15:17


Thanks to those who answered. Having now read the relevant parts of the C99 standard, I have come around to the somewhat surprising conclusion that it is not guaranteed that an arbitrary non-EOF value returned by fgetc() can be stored in a char without loss of fidelity. This is largely because char may not be able to represent as many distinct values as unsigned char.

For their part, the stdio functions guarantee that if data is written to a (binary) stream and then read back, the data read back compares equal to the original. This turns out to have much narrower implications than I first thought. It does mean two things: fputs() must output a distinct value for each distinct char it successfully outputs, and whatever conversion fgets() uses to store input bytes as char must exactly reverse the conversion, if any, by which fputs() would have produced that byte as output. However, as far as I can tell, fputs() and fgets() are allowed to fail on any input they don't like. So there is no certainty that fputs() maps every possible char value to an unsigned char.

In addition, fputs() and fgets() behave as if they called fputc() and fgetc() repeatedly, respectively. Even so, the standard does not specify what conversions they perform between char values in memory and the underlying unsigned char values in the stream. If a platform's fputs() uses the standard integer conversion for this purpose, then the correct inverse conversion is the one I suggested:

 int c = fgetc(stream);
 char buf;
 if (c >= 0)
     buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);

This follows directly from the integer conversion rules. They specify that a value is converted to an unsigned type by repeatedly adding or subtracting one more than the maximum value of the target type (that is, <type>_MAX + 1) until the result is in the target type's range. That rule combines with the constraints on how integer types are represented. Its correctness for this purpose does not depend on the specific representation of char values or on whether char is treated as signed or unsigned.
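The claim can be checked mechanically. The sketch below (the function name inverse_round_trips is hypothetical) takes every char value, applies the well-defined conversion to unsigned char that fputc() performs, and confirms that the proposed inverse gives back the original value. It deliberately covers only values that came from a char. Whether every possible fgetc() result can arise this way is the separate question raised in the next paragraph.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if, for every char value ch, converting ch to unsigned char
 * (well-defined: negative values get UCHAR_MAX + 1 added) and then applying
 * the proposed inverse yields ch again. */
bool inverse_round_trips(void)
{
    for (int ch = CHAR_MIN; ch <= CHAR_MAX; ch++) {
        int c = (unsigned char)ch;  /* what fputc() would write */
        /* c > CHAR_MAX only when ch was negative, so subtracting
         * UCHAR_MAX + 1 restores ch, which fits in char. */
        char back = (char)((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);
        if (back != ch)
            return false;
    }
    return true;
}
```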

However, char may not be able to represent as many distinct values as unsigned char, or there may be char values that fputs() refuses to output (negative ones, for example). In either case, c may take values that could not have resulted from converting a char in the first place. The inverse-conversion argument does not apply to such bytes, and there may not even be a meaningful char value corresponding to them. In any case, whether the given conversion is the correct inverse for data written by fputs() appears to be implementation-defined. It is certainly implementation-defined whether buf = (char) c has the same effect, although it does on a great many systems.

Overall, I am struck by how many details of C I/O behavior are implementation-defined. That was an eye-opener for me.


John Bollinger, Oct 9 '13 at 17:15


chux · Accepted Answer · 2013-10-08T14:38:01+0000

fgetc() returns either an unsigned char value (converted to int) or EOF. EOF is always < 0. Whether char is signed or unsigned on the system does not matter.

C11dr §7.21.7.1 ¶2:

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
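Because the result is either EOF or a value in 0..UCHAR_MAX, the EOF test has to be done on the int, before any narrowing. The sketch below (the name mistaken_for_eof is hypothetical) shows the classic mistake of narrowing first. Where char is signed and EOF is -1, as it commonly is, the byte 255 typically becomes (char)-1 and then compares equal to EOF. That conversion is itself implementation-defined.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if byte value c (0..UCHAR_MAX) would compare equal to EOF
 * after first being stored in a char, which is the bug of declaring the
 * fgetc() result as char. (char)c is implementation-defined when
 * c > CHAR_MAX. */
bool mistaken_for_eof(int c)
{
    char narrowed = (char)c;
    return narrowed == EOF;
}
```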

My concern with the following is that it appears to assume two's complement, and it implies that the ranges of unsigned char and char are equally wide. Both assumptions are, of course, nearly always true today.

buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);

[Edit in response to OP comment]
Suppose fgetc() returns no more distinct characters than fit in the range CHAR_MIN to CHAR_MAX. Then (c - (UCHAR_MAX + 1)) would be more portably replaced with (c - (CHAR_MAX + 1) + CHAR_MIN), which maps CHAR_MAX + 1 onto CHAR_MIN. We do not know that (c - (UCHAR_MAX + 1)) is in range when c is CHAR_MAX + 1.
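Here is a minimal sketch of that CHAR_MIN/CHAR_MAX-based mapping. The helper name byte_to_char is hypothetical. It assumes what the paragraph assumes: c is at most CHAR_MAX - CHAR_MIN, so fgetc() yields no more distinct values than char can hold. Values above CHAR_MAX are shifted so that CHAR_MAX + 1 lands on CHAR_MIN and the largest allowed value lands on -1.

```c
#include <limits.h>

/* Hypothetical helper: map a non-EOF fgetc() result onto char using only
 * CHAR_MIN and CHAR_MAX. Assumes 0 <= c <= CHAR_MAX - CHAR_MIN;
 * larger values would fall outside char's range. */
char byte_to_char(int c)
{
    if (c <= CHAR_MAX)
        return (char)c;  /* fits: converted exactly */
    /* Only reachable when char is signed: CHAR_MAX + 1 -> CHAR_MIN, ... */
    return (char)(c - (CHAR_MAX + 1) + CHAR_MIN);
}
```

On an unsigned-char platform, the second branch is never taken.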

There may be a system where signed char has the range -127 to +127 and unsigned char has the range 0 to 255 (§5.2.4.2.1). But since fgetc() obtains the character as an unsigned char, it seems everyone on such a system would have to be working with unsigned char already. Otherwise they would have had to restrict themselves to the smaller signed char range before converting to unsigned char and returning that value to the user. OTOH, if fgetc() returns 256 distinct characters, converting them to the narrower range of signed char will not be portable regardless of the formula.

