Allocate CUDA device memory for point cloud with increasing size (number of points)

I am writing a program that needs to:

  • run a test on every pixel in the image
  • if the test result is TRUE, add a point to the point cloud
  • if the test result is FALSE, do nothing

I already have working C++ code that runs on the CPU. Now I need to speed it up using CUDA. My idea is to launch blocks of threads (one thread per pixel, I assume), run the test in parallel, and have every thread whose test result is TRUE add a point to the cloud.

Here is my problem: How can I allocate space in the device’s memory for a point cloud (using cudaMalloc or the like) if I don’t know a priori the number of points that I will insert into the cloud?

Do I need to allocate a fixed amount of memory, and then increase it every time the point cloud reaches the limit? Or is there a way to "dynamically" allocate memory?

+4
2 answers

I would have liked to post this as a comment, since it only partially answers the question, but it is too long for that.

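Memory can be allocated dynamically inside a kernel with malloc() and free(), as shown in section B-16 of the CUDA 7.5 programming guide: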

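#include <stdio.h>
#include <stdlib.h>

__global__ void mallocTest()
{
    size_t size = 123;
    // Each thread allocates its own block from the device heap.
    char* ptr = (char*)malloc(size);
    if (ptr == NULL) return;  // in-kernel malloc returns NULL when the heap is exhausted
    memset(ptr, 0, size);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}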

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaDeviceSynchronize();
    return 0;
}

(This requires a device of compute capability 2.x or higher.)
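For the point cloud in the question, a pattern often used instead of per-thread malloc() is to allocate the worst case once (one point per pixel) with cudaMalloc and let each passing thread claim a slot through an atomic counter. The following is only a sketch under assumptions: the Point struct, the passesTest predicate (here a simple threshold), and the buildCloud helper are hypothetical names that do not come from the question, and error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

struct Point { float x, y, z; };  // hypothetical point layout

// Placeholder predicate; the real per-pixel test is not given in the question.
__device__ bool passesTest(unsigned char v) { return v > 127; }

// One thread per pixel. Threads whose pixel passes reserve a slot with atomicAdd.
__global__ void collectPoints(const unsigned char* img, int w, int h,
                              Point* cloud, unsigned int* count)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    unsigned char v = img[y * w + x];
    if (passesTest(v)) {
        unsigned int slot = atomicAdd(count, 1u);  // returns the previous value
        cloud[slot] = Point{ (float)x, (float)y, (float)v };
    }
}

// Hypothetical host helper: returns the number of points written to hostCloud,
// which must have room for w * h points.
unsigned int buildCloud(const unsigned char* hostImg, int w, int h, Point* hostCloud)
{
    const size_t pixels = (size_t)w * h;
    unsigned char* dImg = nullptr;
    Point* dCloud = nullptr;
    unsigned int* dCount = nullptr;
    cudaMalloc(&dImg, pixels);
    cudaMalloc(&dCloud, pixels * sizeof(Point));  // worst case: every pixel passes
    cudaMalloc(&dCount, sizeof(unsigned int));
    cudaMemcpy(dImg, hostImg, pixels, cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(unsigned int));  // counter must start at zero

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    collectPoints<<<grid, block>>>(dImg, w, h, dCloud, dCount);

    unsigned int n = 0;
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&n, dCount, sizeof(n), cudaMemcpyDeviceToHost);
    cudaMemcpy(hostCloud, dCloud, n * sizeof(Point), cudaMemcpyDeviceToHost);

    cudaFree(dImg);
    cudaFree(dCloud);
    cudaFree(dCount);
    return n;
}

int main()
{
    const int w = 64, h = 48;
    static unsigned char img[w * h];
    static Point cloud[w * h];
    unsigned int expected = 0;
    for (int i = 0; i < w * h; ++i) {
        img[i] = (unsigned char)((i * 7) % 256);
        if (img[i] > 127) ++expected;  // same threshold as passesTest
    }
    unsigned int n = buildCloud(img, w, h, cloud);
    assert(n == expected);
    printf("kept %u of %d pixels\n", n, w * h);
    return 0;
}
```

The order of points in the output buffer is not deterministic, because slots are handed out in whatever order the threads reach atomicAdd. The cost of this approach is memory: the buffer is sized for the case where every pixel passes.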

+1

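There are two allocation APIs. One is the in-kernel malloc described in Taro's answer, whose heap is small by default (8 MB) and can be enlarged with cudaDeviceSetLimit and cudaLimitMallocHeapSize.

The other is cudaMalloc, the runtime API called from the host.

One caveat on Taro's answer: memory allocated with in-kernel malloc cannot be passed to host API calls such as cudaMemcpy.

Also, the CUDA API has no realloc.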
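Since there is no realloc, growing a device buffer from the host means allocating a larger one, copying the old contents across, and freeing the old buffer. A minimal sketch of that idea, assuming a hypothetical Point struct and a hypothetical helper named growDeviceBuffer (neither comes from the answers):

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

struct Point { float x, y, z; };  // hypothetical point layout

// Hypothetical helper: replace *buf (holding oldCount points) with a buffer
// of newCount points, preserving the first oldCount entries.
// On failure the original buffer is left untouched.
cudaError_t growDeviceBuffer(Point** buf, size_t oldCount, size_t newCount)
{
    Point* bigger = nullptr;
    cudaError_t err = cudaMalloc(&bigger, newCount * sizeof(Point));
    if (err != cudaSuccess) return err;

    if (*buf != nullptr && oldCount > 0) {
        err = cudaMemcpy(bigger, *buf, oldCount * sizeof(Point),
                         cudaMemcpyDeviceToDevice);
        if (err != cudaSuccess) {
            cudaFree(bigger);
            return err;
        }
    }
    cudaFree(*buf);  // cudaFree on a null pointer is a no-op
    *buf = bigger;
    return cudaSuccess;
}

int main()
{
    Point* cloud = nullptr;
    size_t capacity = 0;

    // Start with room for 1024 points, then double the capacity.
    cudaError_t err = growDeviceBuffer(&cloud, capacity, 1024);
    assert(err == cudaSuccess);
    capacity = 1024;

    err = growDeviceBuffer(&cloud, capacity, capacity * 2);
    assert(err == cudaSuccess);
    capacity *= 2;

    printf("device buffer now holds up to %zu points\n", capacity);
    cudaFree(cloud);
    return 0;
}
```

Both buffers exist at the same time during the copy, so peak memory use is the sum of the old and new sizes, and the resize has to happen between kernel launches rather than from inside a running kernel.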

+1
