Printf format for 1 byte of signed number

Question


Assuming the following:

    sizeof(char)  = 1
    sizeof(short) = 2
    sizeof(int)   = 4
    sizeof(long)  = 8

The printf format for a 2-byte signed number is %hd, for a 4-byte signed number %d, and for an 8-byte signed number %ld. What is the correct format for a 1-byte signed number?

+7

c printf

Chris Feb 07 '15 at 21:54


1 answer

rici · Answer 1 · 2015-02-07T22:05:53+0000

What is the correct format for a 1 byte signed number?

%hh plus the integer conversion specifier of your choice (for example, %hhd for signed decimal, or %02hhX). See the C11 standard, §7.21.6.1p5:

hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); …

The parenthetical note is important. Because of the integer promotions applied to the arguments of variadic functions (such as printf), the function never sees a char argument. Many programmers believe this means there is no need for the h and hh qualifiers. Certainly, you do not create undefined behaviour by leaving them out, and most of the time it will work.

However, char may well be signed, and the integer promotion preserves its value, which makes it a signed int. Printing that signed int with an unsigned format (such as %02X) will then present you with the sign-extended Fs. So if you want to display a signed char using an unsigned format, you need to tell printf what the original, unpromoted width of the integer type was, using hh.

In case that was not clear, a simple (but controversial) example:

    /* Read the comments thread to this post; I'll remove this note when I
       edit the outcome of the discussion into the answer */
    #include <stdio.h>

    int main(void) {
      char* s = "\u00d1"; /* Ñ */
      for (char* p = s; *p; ++p)
        printf("%02X (%02hhX)\n", *p, *p);
      return 0;
    }

Output:

    $ ./a.out
    FFFFFFC3 (C3)
    FFFFFF91 (91)

There is (or perhaps was) a significant discussion in the comment thread about whether the above snippet is undefined, since the X format specification requires an unsigned argument, while the char argument is signed (at least on the implementation that produced the output shown). I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."

However, in the case of the char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It is worth noting that on most architectures, all three character types are promoted to a signed int; promotion of an unsigned char (or a plain char that is unsigned) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)

So on most architectures, the argument to a %hx or %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering these format codes meaningless.

Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).

Converting a signed value to an unsigned type is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)

So there is a well-defined procedure for converting the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. Therefore, I argue that a program like the Ñ example above is entirely well-defined.

For corroboration, the behaviour of fprintf given a %c format specification is defined as follows (§7.21.6.1/p8), emphasis added:

the int argument is converted to an unsigned char, and the resulting character is written.

If one were to apply the proposed restrictive interpretation, which renders the Ñ program above undefined, then I believe one would also be forced to argue that:

 void f(char c) { printf("This is a '%c'.\n", c); }

is also UB. Yet I think almost every C programmer has written something like that at some point without thinking twice about it.

The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and in other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call", which I interpret to mean that, when considering the call (for example, a call to fprintf), the arguments are now the promoted values.

I do not think C is actually different, at least in intent. It uses phrasing like "the arguments … are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the restriction on the argument type is annotated parenthetically: "the type of the actual next argument (as promoted according to the default argument promotions)".

I will freely agree that all of this is (a) a subtle reading of insufficiently precise language, and (b) counting dancing angels. But I see no value in declaring that standard usages, such as using %c with char arguments, are "technically" UB; that denatures the concept of UB, and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And perhaps it should be corrected editorially.)
