Why do gcc and nvcc (g++) see two different structure sizes?

I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s.

To do this, I need to mix two languages, C and C++ (nvcc is a C++ compiler).

The problem is that the C++ compiler sees the structure as one size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I cannot find the reason for the 4-byte discrepancy.

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o 

My C++ looks like

    #include <stdio.h>
    #include <stdlib.h>
    #include "assert.h"

    extern "C" {
    #include "structInfo.h" //contains the structure declaration
    }
    ...

and my C files look like

    #include "structInfo.h"
    ...

with structInfo.h similar to

    struct TB {
        int nbranch, nnode, root, branches[NBRANCH][2];
        double lnL;
    } tree;
    ...

My makefile looks like

    PRGS = prog
    CC = cc
    CFLAGS = -std=gnu99 -m32
    CuCC = nvcc
    CuFlags = -arch=sm_20
    LIBS = -lm -L/usr/local/cuda-5.0/lib -lcuda -lcudart

    all: $(PRGS)

    prog:
        $(CC) $(CFLAGS) prog.c gpu.o $(LIBS) -o prog

    gpu.o:
        $(CuCC) $(CuFlags) -c gpu.cu

Some people asked me why I did not use a different host-compilation option. I believe the host-compilation option was deprecated two releases ago. Besides, it never did what it claimed to do.

 nvcc warning : option 'host-compilation' has been deprecated and is ignored 
+6
3 answers

GPUs require natural alignment for all data: for example, a 4-byte int must be aligned to a 4-byte boundary, and an 8-byte double or long long must have 8-byte alignment. CUDA enforces this for host code as well, to ensure that structures are as compatible as possible between the host and device portions of the code. x86 CPUs, on the other hand, usually do not require data to be naturally aligned (although a lack of alignment may carry a performance penalty).

In this case, CUDA needs to align the double component of the structure to an 8-byte boundary. Since an odd number of int components precedes the double, padding is required to achieve this. Switching the order of the components, i.e. putting the double component first, does not help: in an array of such structures every element must be 8-byte aligned, so the size of the structure must be a multiple of 8 bytes, which also requires padding.

To make gcc align doubles the same way CUDA does, pass it the -malign-double flag.

+12

It looks like the two compilers use different padding: one works with 4-byte alignment and the other with at least 8-byte alignment. You should be able to force a particular alignment with a compiler-specific #pragma (check your compiler's documentation for the exact form).

+5

There is no guarantee that two different C compilers will use the same representation for the same type, unless they both conform to some external standard (an ABI) that describes the representation in sufficient detail.

Most likely, the difference in padding arises because one compiler requires double to be 4-byte aligned and the other requires it to be 8-byte aligned. Both choices are perfectly valid under the C and C++ standards.

You can learn more about this by printing the sizes and offsets of all members of your structure:

    printf("nbranch: size %3u offset %3u\n",
           (unsigned)sizeof tree.nbranch,
           (unsigned)offsetof(struct TB, nbranch));
    /* and similarly for the other members */

There may be a compiler-specific way to specify a different alignment, but such methods are not always safe.

The ideal solution is to use the same compiler for both the C and the C++ code. C is not a subset of C++, but as a rule it should not be too difficult to modify existing C code so that it compiles as C++.

Or you may be able to change your structure definition so that both compilers lay it out the same way. For example, you could put the double member first. This is still not guaranteed, and it may break with future versions of either compiler, but it is probably good enough.

Do not forget that there may also be padding at the very end of the structure; this is sometimes necessary to ensure proper alignment for arrays of structures. Take a look at sizeof (struct TB) and compare it with the size and offset of the last declared member.

Another option: Insert explicit, unused elements to force alignment. For example, suppose you have:

    struct foo {
        uint16_t x;
        uint32_t y;
    };

and one compiler places y at an offset of 16 bits, while the other places it at 32 bits, after 16 bits of padding. If you change the definition to:

    struct foo {
        uint16_t x;
        uint16_t unused_padding;
        uint32_t y;
    };

then x and y are more likely to end up at the same offsets under both compilers. You still have to experiment to make sure everything is consistent.

Since the C and C++ code will be part of the same program (right?), you don't have to worry about things like differing byte order. If you want to pass values of your structure type between separate programs, say by saving them in files or sending them over the network, you may need to define a consistent way to serialize a structure value into a sequence of bytes and back.

+2