As you correctly wrote, you need to specify the size of the dynamically allocated shared memory in the execution configuration of each kernel launch (the third parameter in <<<blocks, threads, sizeofSharedMemoryinBytes>>>). It gives the number of bytes of shared memory that is dynamically allocated per block for that launch, in addition to any statically allocated shared memory. As far as I know, you cannot access this memory as a 2D array. Instead, declare a 1D array and index it as 2D, for example in row-major order with dimX * y + x. Also, don't forget the extern qualifier. Your code should therefore look like this:
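```cuda
size_t sizeofSharedMemoryinBytes = dimX * dimY * sizeof(float);
myKernel<<<blocks, threads, sizeofSharedMemoryinBytes>>>();
....

__global__ void myKernel()
{
    extern __shared__ float sData[];  // unsized; size set by the launch configuration
    .....
    sData[dimX * y + x] = ...;        // element (x, y) of a dimX-wide "2D" array
}
```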
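For a complete picture, here is a minimal self-contained sketch of the same pattern. It assumes a single block of dimX × dimY threads (so dimX * dimY must not exceed 1024), and the names `reverseTile` and `sharedTileRoundTrip` are hypothetical. Each thread writes one element of the "2D" tile into the 1D extern shared array, then reads an element written by a different thread, so the __syncthreads() barrier is actually needed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: stages a dimX x dimY tile in dynamically allocated
// shared memory, declared as an unsized 1D extern array and indexed as 2D
// with the row-major index dimX * y + x.
__global__ void reverseTile(const float* in, float* out, int dimX, int dimY)
{
    extern __shared__ float sData[];   // size comes from the 3rd launch parameter

    int x = threadIdx.x;
    int y = threadIdx.y;

    sData[dimX * y + x] = in[dimX * y + x];   // store element (x, y)
    __syncthreads();                          // wait until the whole tile is loaded

    // Read an element written by another thread: the tile rotated 180 degrees.
    out[dimX * y + x] = sData[dimX * (dimY - 1 - y) + (dimX - 1 - x)];
}

// Hypothetical helper: runs the kernel on one tile and checks the result on
// the host. Assumes a single block, so dimX * dimY <= 1024.
bool sharedTileRoundTrip(int dimX, int dimY)
{
    const int n = dimX * dimY;
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threads(dimX, dimY);                                // one thread per element
    size_t sizeofSharedMemoryinBytes = n * sizeof(float);    // dynamic shared memory per block
    reverseTile<<<1, threads, sizeofSharedMemoryinBytes>>>(d_in, d_out, dimX, dimY);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // out[dimX * y + x] == in[n - 1 - (dimX * y + x)], i.e. the tile reversed.
    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == h_in[n - 1 - i]);
    return ok;
}
```

Calling `sharedTileRoundTrip(16, 8)` from `main` should return true on a working CUDA device. The 1D indexing is what makes the dynamic size possible: the compiler never needs to know dimX, because the row stride is applied by hand at run time.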