How do I deep copy a nested structure with CUDA?

While programming with CUDA, I ran into a problem copying some data from the host to the GPU.

I have 3 nested structures like these:

    typedef struct {
        char data[128];
        short length;
    } Cell;

    typedef struct {
        Cell* elements;
        int height;
        int width;
    } Matrix;

    typedef struct {
        Matrix* tables;
        int count;
    } Container;

So a Container holds some Matrix elements, which in turn hold some Cell elements.

Suppose I dynamically allocate host memory this way:

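    Container c;
    c.count = 20;
    /* Explicit casts so the snippet also compiles as C++ in a .cu file. */
    c.tables = (Matrix*)malloc(c.count * sizeof(Matrix));
    for (int i = 0; i < c.count; i++) {
        Matrix m;
        m.elements = (Cell*)malloc(100 * sizeof(Cell));
        c.tables[i] = m;
    }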

That is, a container of 20 matrices with 100 cells each.

  • How can I now copy this data to device memory using cudaMemcpy()?
  • Is there a good way to deep copy a "struct of structs" from host to device?

Thank you for your time.

Andrea

1 answer

The short answer is "just don't." There are four reasons why I say that:

  • The CUDA API has no deep copy functionality.
  • The code you would need to set up and copy the structure you describe to the GPU would be ridiculously complex (about 4000 API calls at a minimum, and possibly an intermediate kernel, for your example of 20 matrices of 100 cells).
  • GPU code using three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU.
  • If you want to copy the data back to the host afterwards, you have the same problem in reverse.

Consider using linear memory and indexing instead. It is portable between the host and the GPU, and the allocation and copy overhead is about 1% of that of the pointer-based alternative.
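As a rough illustration of the linear-memory idea, here is a minimal sketch rather than a definitive implementation. It assumes every matrix has the same 10 x 10 shape (the question only says "100 cells each"), and the names `NUM_TABLES`, `HEIGHT`, `WIDTH`, `cell_index`, `tag_cells` and `run_flat_copy_demo` are hypothetical, introduced only for this example. All cells live in one contiguous array, so a single cudaMalloc and a single cudaMemcpy in each direction move the whole data set.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct { char data[128]; short length; } Cell;

// Assumed fixed shape: 20 matrices of 10 x 10 cells (hypothetical values).
#define NUM_TABLES 20
#define HEIGHT 10
#define WIDTH 10

// Map (table, row, col) to an offset in the flat array; usable on host and device.
__host__ __device__ inline size_t cell_index(int table, int row, int col)
{
    return ((size_t)table * HEIGHT + row) * WIDTH + col;
}

// Toy kernel: one thread per cell, stores the cell's flat index in `length`.
__global__ void tag_cells(Cell* cells, int total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total) cells[i].length = (short)i;
}

// Copies the flat array to the GPU, runs the kernel, copies it back.
// Returns 0 on success, 1 on any CUDA error or unexpected result.
int run_flat_copy_demo(void)
{
    const int total = NUM_TABLES * HEIGHT * WIDTH;
    const size_t bytes = (size_t)total * sizeof(Cell);
    int status = 1;

    Cell* h_cells = (Cell*)calloc(total, sizeof(Cell));
    if (!h_cells) return 1;

    Cell* d_cells = NULL;
    if (cudaMalloc((void**)&d_cells, bytes) == cudaSuccess &&
        cudaMemcpy(d_cells, h_cells, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        tag_cells<<<(total + 255) / 256, 256>>>(d_cells, total);
        // The device-to-host copy waits for the kernel on the default stream.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_cells, d_cells, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            // Cell (table 3, row 4, col 5) sits at flat index 345.
            status = (h_cells[cell_index(3, 4, 5)].length == 345) ? 0 : 1;
        }
    }

    cudaFree(d_cells);  // freeing a null pointer is a no-op
    free(h_cells);
    return status;
}
```

Calling `run_flat_copy_demo()` from `main` exercises the full round trip. If the matrices had different sizes, one option would be to keep a small array of per-matrix offsets next to the flat cell array, which still avoids storing any pointers in the copied data.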

If you really want to do this, leave a comment, and I will try to dig out some old code examples that show just how unwieldy nested pointers are on the GPU.
