The code you provided only copies the MyData structures: a host address and an integer. To be overly clear, you are copying the pointer, not the data; you have to copy the data explicitly.
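To illustrate, here is a small host-only sketch in plain C. It assumes the question's structure is something like the hypothetical `PtrItem` below (a pointer plus a length), and `copy_struct` is also only for illustration. A byte-for-byte copy, which is what cudaMemcpy performs, duplicates the address stored in `data` but not the floats behind it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the question's struct: a pointer plus a length. */
typedef struct {
    float *data;
    int length;
} PtrItem;

/* Byte-for-byte copy of one struct, which is all cudaMemcpy does with it.
   Only the address held in 'data' is duplicated; the floats are not. */
PtrItem copy_struct(const PtrItem *src)
{
    assert(src != NULL);
    PtrItem dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

After such a copy to the device, the `data` field would still hold a host address, which a kernel in general cannot dereference, so the floats have to be allocated and copied on their own.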
If the data is always the same LENGTH, then you probably just want to make one big array:
    float *d_data;
    size_t memSize = N * LENGTH * sizeof(float);
    cudaMalloc((void**) &d_data, memSize);
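With one big array, element j of item i sits at index i * LENGTH + j. The sketch below shows that indexing on the host in plain C; `flat_index`, `fill_flat`, and the value LENGTH = 4 are assumptions for illustration, and the fill pattern is arbitrary test data.

```c
#include <assert.h>
#include <stddef.h>

#define LENGTH 4   /* assumed fixed number of floats per item */

/* Position of element j of item i in the flat array of N * LENGTH floats. */
size_t flat_index(size_t i, size_t j)
{
    assert(j < LENGTH);
    return i * LENGTH + j;
}

/* Fill the host copy so that item i, element j holds i * 10 + j.
   The whole buffer can then go to d_data with a single cudaMemcpy of
   N * LENGTH * sizeof(float) bytes. */
void fill_flat(float *h_data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < LENGTH; ++j)
            h_data[flat_index(i, j)] = (float)(i * 10 + j);
}
```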
If it needs to be in a structure together with other data, then:
    struct MyData {
        float data[LENGTH];
        int other_data;
    };

    MyData *d_items;
    size_t memSize = N * sizeof(MyData);
    cudaMalloc((void**) &d_items, memSize);
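Because the array is stored inside the structure, sizeof(MyData) already includes the floats, so copying N * sizeof(MyData) bytes copies the data too. A plain C sketch of that, with LENGTH assumed to be 4 and `copy_items` a hypothetical helper standing in for the cudaMemcpy call:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LENGTH 4   /* assumed fixed number of floats per item */

/* The floats are stored inside the struct, so sizeof(MyData) includes them. */
typedef struct {
    float data[LENGTH];
    int other_data;
} MyData;

/* Byte-wise copy of n items, the same thing
   cudaMemcpy(d_items, h_items, n * sizeof(MyData), ...) does.
   Because the array is inline, the float values themselves are copied. */
void copy_items(MyData *dst, const MyData *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(MyData));
}
```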
But I assume your data comes in a variety of lengths. One solution is to set LENGTH to the maximum length (and simply waste some space) and then proceed as described above. This may be the easiest way to get started, and you can optimize later.
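A minimal host-side sketch of the padding approach, assuming the items start out as separate host arrays with known lengths (the helper `pad_to_max` and its parameters are hypothetical). It finds the longest item, allocates n * max_len floats, and zero-fills the unused tail of each slot, so the result has the same layout as the fixed-LENGTH array and can be sent with a single copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy n jagged items into one zero-padded buffer of n * max_len floats,
   where max_len is the longest item; item i then starts at i * max_len.
   Returns NULL if allocation fails; the caller frees the buffer. */
float *pad_to_max(const float *const *items, const int *lengths, int n,
                  int *max_len_out)
{
    int max_len = 0;
    for (int i = 0; i < n; ++i) {
        assert(lengths[i] >= 0);
        if (lengths[i] > max_len)
            max_len = lengths[i];
    }

    /* calloc zeroes the unused tail of each slot (the "wasted" space). */
    float *padded = calloc((size_t)n * (size_t)max_len, sizeof(float));
    if (padded == NULL)
        return NULL;

    for (int i = 0; i < n; ++i)
        memcpy(padded + (size_t)i * max_len, items[i],
               (size_t)lengths[i] * sizeof(float));

    *max_len_out = max_len;
    return padded;
}
```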
If you cannot afford the wasted memory and transfer time, then I would use three arrays, one holding all of the data, one with offsets, and one with lengths, on both the host and the device:
    // host memory
    float *h_data;
    int h_offsets[N], h_lengths[N]; // or allocate these dynamically if necessary
    int totalLength;

    // device memory
    float *d_data;
    int *d_offsets, *d_lengths;

    /* calculate totalLength, allocate h_data, and fill the three arrays */

    // allocate device memory
    cudaMalloc((void**) &d_data, totalLength * sizeof(float));
    cudaMalloc((void**) &d_offsets, N * sizeof(int));
    cudaMalloc((void**) &d_lengths, N * sizeof(int));

    // and now the three copies
    cudaMemcpy(d_data, h_data, totalLength * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, h_offsets, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lengths, h_lengths, N * sizeof(int), cudaMemcpyHostToDevice);
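The placeholder step (calculate totalLength, allocate h_data, and fill the three arrays) could look roughly like this in plain C. It assumes h_lengths is already filled and the items are separate host arrays; `pack_jagged` is a hypothetical name. Each offset is the running sum of the lengths before it, i.e. an exclusive prefix sum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* items[i] points to h_lengths[i] floats. Fills h_offsets (n entries),
   stores the total element count in *total_length, and returns a newly
   allocated h_data holding all items back to back, or NULL on failure. */
float *pack_jagged(const float *const *items, const int *h_lengths, int n,
                   int *h_offsets, int *total_length)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        assert(h_lengths[i] >= 0);
        h_offsets[i] = total;      /* item i starts where item i-1 ended */
        total += h_lengths[i];
    }

    float *h_data = malloc((size_t)total * sizeof(float));
    if (h_data == NULL)
        return NULL;

    for (int i = 0; i < n; ++i)
        memcpy(h_data + h_offsets[i], items[i],
               (size_t)h_lengths[i] * sizeof(float));

    *total_length = total;
    return h_data;
}
```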
Now thread i can find its data starting at d_data[d_offsets[i]], with a length of d_lengths[i].
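To make the indexing concrete, here is the per-item work written as a plain C function so it runs without a GPU; summing the item is just an example operation, and `sum_item` is a hypothetical name. In a real kernel the same body would go in a `__global__` function, with i typically computed as blockIdx.x * blockDim.x + threadIdx.x and checked against N.

```c
#include <assert.h>

/* What thread i would do with the three arrays: locate item i through its
   offset and length, then (as an example) sum its elements. */
float sum_item(const float *data, const int *offsets, const int *lengths, int i)
{
    assert(i >= 0);
    const float *item = data + offsets[i];   /* first element of item i */
    float sum = 0.0f;
    for (int j = 0; j < lengths[i]; ++j)
        sum += item[j];
    return sum;
}
```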