CUDA: LNK2005 error for a __device__ function defined in a header file

I have a __device__ function that is defined in a header file. It is in the header file because it is used by a __global__ kernel, which has to be in the header file since it is a template kernel.

When this header file is included in two or more .cu files, I get an LNK2005 error during linking:

    FooDevice.cu.obj : error LNK2005: "int __cdecl getCurThreadIdx(void)" (?getCurThreadIdx@@YAHXZ) already defined in Main.cu.obj

Why does this error occur, and how can I fix it?

Here is sample code that triggers the error above:

FooDevice.h:

    #ifndef FOO_DEVICE_H
    #define FOO_DEVICE_H

    __device__ int getCurThreadIdx()
    {
        return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
    }

    template< typename T >
    __global__ void fooKernel( const T* inArr, int num, T* outArr )
    {
        const int threadNum = ( gridDim.x * blockDim.x );

        for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
            outArr[ idx ] = inArr[ idx ];

        return;
    }

    __global__ void fooKernel2( const int* inArr, int num, int* outArr );

    #endif // FOO_DEVICE_H

FooDevice.cu:

 #include "FooDevice.h" // One other kernel that uses getCurThreadIdx() __global__ void fooKernel2( const int* inArr, int num, int* outArr ) { const int threadNum = ( gridDim.x * blockDim.x ); for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum ) outArr[ idx ] = inArr[ idx ]; return; } 

Main.cu:

 #include "FooDevice.h" int main() { int num = 10; int* dInArr = NULL; int* dOutArr = NULL; const int arrSize = num * sizeof( *dInArr ); cudaMalloc( &dInArr, arrSize ); cudaMalloc( &dOutArr, arrSize ); // Using template kernel fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr ); return 0; } 
2 answers

Why is this error occurring?

Because you included your header in both FooDevice.cu and Main.cu, the function is defined in each of those translation units. You now have two copies of the same function, and the linker detects this.
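As a minimal sketch of the mechanism (the file and function names here are made up for illustration), any non-inline function defined in a header ends up with one definition per translation unit that includes it:

    // helper.h (hypothetical) -- a non-inline definition in a header
    #ifndef HELPER_H
    #define HELPER_H
    __device__ int helperValue()   // full definition with external linkage lives in the header
    {
        return 42;
    }
    #endif // HELPER_H

    // a.cu:  #include "helper.h"   -> a.cu.obj contains a definition of helperValue()
    // b.cu:  #include "helper.h"   -> b.cu.obj contains a second definition
    // Linking a.cu.obj and b.cu.obj together then triggers LNK2005 on MSVC.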

How to fix it?

Suppose foo.h contains the following:
    template< typename T >
    __device__ T foo( T x )
    {
        return x;
    }

and two .cu files include foo.h and also contain a call such as:

    int x = foo<int>( 1 );

Then you can force foo() to be inline:

    template< typename T >
    inline __device__ T foo( T x )
    {
        return x;
    }

and call:

    int x = foo<int>( 1 );

This will not be defined multiple times.

Function templates are exempt from the one definition rule, so there may be more than one definition in different translation units. A fully specialized function template, however, is no longer a template but an ordinary function, so you need to use the inline keyword to avoid violating the ODR if you want to put it in a header file included in several translation units.

Taken from http://www.velocityreviews.com/forums/t447911-why-does-explicit-specialization-of-function-templates-cause-generation-of-code.html

See also: http://en.wikipedia.org/wiki/One_Definition_Rule
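To illustrate the quoted rule, here is a minimal sketch (the clampToZero name is made up for the example): the primary template may be defined in every .cu file that includes the header, but a full specialization is an ordinary function and needs inline to be safe in a header.

    // Primary template: exempt from the one definition rule, so this
    // definition may appear in every .cu file that includes the header.
    template< typename T >
    __device__ T clampToZero( T x )
    {
        return ( x < T( 0 ) ) ? T( 0 ) : x;
    }

    // Full specialization: no longer a template, just an ordinary function,
    // so it must be marked inline to avoid LNK2005 when the header is
    // included from several .cu files.
    template<>
    inline __device__ int clampToZero< int >( int x )
    {
        return ( x < 0 ) ? 0 : x;
    }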

I changed your code as follows:

    inline __device__ int getCurThreadIdx()
    {
        return ( ( blockIdx.x * blockDim.x ) + threadIdx.x );
    }

    template< typename T >
    __global__ void fooKernel( const T* inArr, int num, T* outArr )
    {
        const int threadNum = ( gridDim.x * blockDim.x );

        for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )
            outArr[ idx ] = inArr[ idx ];

        return;
    }

And now it compiles. Your definition of getCurThreadIdx() without inline violated the one definition rule.


It must be inlined. You can try adding the inline keyword.

Maybe you can remove the unnecessary code and post a simple test case for us? Usually the problem is in the details...

