Is this (char*)&x cast behavior correct?

While writing some C code, I ran into a small problem: I had to convert a character into a "string" (a region of memory whose start is given by a char* pointer).

The idea is that if some sourcestr pointer is set (not NULL), I must use it as my "final string"; otherwise I must convert the given charcode into the first character of another array and use that instead.

For the purposes of this question, assume that the types of the variables cannot be changed beforehand. In other words, I cannot just store my charcode as a const char* instead of an int.

Since I tend to be lazy, I thought to myself: "Hey, couldn't I just take the address of the character and treat that pointer as a string?" Here is a small snippet of what I wrote (don't bang your head against the wall just yet!):

 int charcode = FOO;      /* Assume this is always valid ASCII. */
 char* sourcestr = "BAR"; /* Case #1 */
 char* sourcestr = NULL;  /* Case #2 */
 char* finalstr = sourcestr ? sourcestr : (char*)&charcode;

Now, of course, I tried it and, as I expected, it works. Even with a few warning flags enabled, the compiler is still happy. However, I have this strange feeling that this is actually undefined behavior, and that I simply shouldn't do it.

The reason I think so is that char* strings must be terminated with a null character in order to print correctly as strings (and I want mine to!). However, I'm not sure that the value at &charcode + 1 will be zero, so I could end up with some buffer overflow madness.

Is there an actual reason why it works correctly, or was I just lucky to get zeros in the right places when I tried?

(Note that I am not looking for other ways to achieve the conversion. I could simply use a variable char tmp[2] = {0} and put my character at index 0. I could also use something like sprintf or snprintf, provided that I'm careful enough about buffer overflows. There are many ways to do this; I'm only interested in the behavior of this particular cast operation.)

Edit: I have seen several people call this hackish, and let me be clear: I completely agree with you. I'm not masochistic enough to do this in released code. It just made me curious ;)

+6
4 answers

This is absolutely undefined behavior for the following reasons:

  • Less likely, but it has to be considered when speaking strictly in terms of the standard: you cannot assume anything about sizeof(int) on the machine/system where the code will be compiled.
  • As above, you cannot assume the character encoding. For instance, what happens on an EBCDIC machine/system?
  • It is easy to tell that your machine has a little-endian processor. On big-endian machines the code fails because of the big-endian memory layout.
  • Since on many systems char is a signed integer type, just like int, when your char has a negative value (i.e. char > 127 on machines with 8-bit char) it may fail due to sign extension if you assign the value as in the code below.

code:

 char ch = FOO;
 int charcode = ch;

P.S. Regarding point 3: your string will indeed be null-terminated on a little-endian machine with sizeof(int) > sizeof(char) and a char with a positive value, since the most significant byte of the int will be 0 and the memory layout for that endianness is LSB to MSB (LSB first).
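The endianness and sign-extension points can be checked without invoking any undefined behavior by copying the bytes of the int out with memcpy and looking at them. The following is only a minimal sketch: the helper names (int_bytes, is_little_endian, reads_as_one_char_string) are hypothetical and not part of the original question, and the second helper assumes a two's complement int when negative values are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of `value`
   into out[0 .. sizeof(int)-1]. Never reads outside the int. */
void int_bytes(int value, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the least significant byte of an int is stored first. */
int is_little_endian(void)
{
    unsigned char bytes[sizeof(int)];
    int_bytes(1, bytes);
    return bytes[0] == 1;
}

/* Returns 1 if the first two bytes of `value` are {c, 0}, i.e. if
   (char*)&value would happen to look like the one-character string
   "c" on this platform; returns 0 otherwise. */
int reads_as_one_char_string(int value, char c)
{
    unsigned char bytes[sizeof(int)];
    if (sizeof(int) < 2)
        return 0; /* no room for a terminator inside the int */
    int_bytes(value, bytes);
    return bytes[0] == (unsigned char)c && bytes[1] == 0;
}
```

With a small positive charcode, reads_as_one_char_string should return 1 only on a little-endian layout. With a negative value produced by sign extension, the upper bytes are all ones, so no terminator follows the first byte on either layout.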

0

Your code is well defined, since you can always cast to char*. But there are some issues:

  • Note that "BAR" is a string literal, so do not try to modify its contents. That would be undefined behavior.

  • Do not try to use (char*)&charcode as an argument to any of the string functions in the C standard library. It is not guaranteed to be null-terminated, so in that sense you cannot treat it as a string.

  • Pointer arithmetic on (char*)&charcode is valid up to and including one past the scalar charcode. But do not try to dereference any pointer beyond charcode itself. The range of n for which the expression (char*)&charcode + n is valid depends on sizeof(int).
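To make the last two points concrete, here is a small sketch of a bounded search for a terminator that never reads beyond the object it is given. The function name length_within_object is an assumption made for illustration; it relies only on the standard memchr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for a zero byte inside the `size` bytes
   of the object at `obj`. Returns the number of bytes before the
   first zero byte, or -1 if the object contains no zero byte, in
   which case treating it as a string would read past its end. */
long length_within_object(const void *obj, size_t size)
{
    assert(obj != NULL);
    const char *start = obj;
    const char *nul = memchr(start, '\0', size);
    return nul ? (long)(nul - start) : -1L;
}
```

Called as length_within_object(&charcode, sizeof charcode), this stays inside the valid range described above. Unlike strlen((char*)&charcode), it cannot run off the end of the int when no zero byte happens to be present.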

+5

The cast and the assignment, char* finalstr = (char*)&charcode;, are defined.

Printing finalstr with printf as a string (%s) when it points to charcode is undefined behavior.

Instead of resorting to hacks and hiding a string in an int, convert the value stored in the integer to a string using a conversion function of your choice. One possible example:

 char str[32] = { 0 };
 snprintf( str , 32 , "%d" , charcode );
 char* finalstr = sourcestr ? sourcestr : str;

or use whatever other (defined!) conversion you like.
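As one such defined conversion, the two-element buffer mentioned in the question can be wrapped in a small helper. This is only a sketch under stated assumptions: pick_string is a hypothetical name, and the caller is assumed to supply a buffer of at least two bytes that outlives the returned pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return sourcestr when it is set; otherwise
   store charcode as a one-character, null-terminated string in the
   caller's buffer and return that buffer. */
const char *pick_string(const char *sourcestr, int charcode,
                        char *buf, size_t bufsize)
{
    if (sourcestr != NULL)
        return sourcestr;
    assert(buf != NULL && bufsize >= 2);
    buf[0] = (char)charcode; /* value conversion, not a pointer cast */
    buf[1] = '\0';
    return buf;
}
```

Because the character is converted by value and the terminator is written explicitly, the result does not depend on endianness or on sizeof(int).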

+3

As others have said, this happens to work because the internal representation of an int on your machine is little-endian and your char is smaller than an int. Also, the ASCII value of your character is either below 128 or you have unsigned chars (otherwise there would be sign extension). This means that the character value sits in the low byte(s) of the int representation and the rest of the int is all zeros (assuming any normal int representation). You were not "lucky"; you just have a pretty normal machine.

It is also completely undefined behavior to pass that char pointer to any function that expects a string. You may get away with it now, but the compiler is free to optimize it into something completely different.

For example, if you call printf right after this assignment, the compiler may assume that you always pass a valid string to printf, which means the check for sourcestr being NULL is unnecessary: if sourcestr were NULL, printf would be called with something that is not a string, and the compiler may assume that undefined behavior never happens. So any check of sourcestr against NULL, before or after this assignment, becomes unnecessary, because the compiler already "knows" it is not NULL. This assumption is allowed to propagate throughout your code.

This was rarely something to worry about, and you could get away with much uglier tricks than this, until ten years ago or so, when compiler writers started an arms race over how closely they could follow the C standard to the letter in order to get away with ever more brutal optimizations. Compilers keep getting more aggressive, and while the optimization I speculated about probably doesn't exist yet, if a compiler person sees this, they will probably implement it just because they can.

+2
