cublasSetVector() vs cudaMemcpy()

I am wondering if there is a difference between:

    // cumalloc.cu - Create a vector on the device
    HOST float * cudamath_vector(const float * h_vector, const int m)
    {
        float *d_vector = NULL;
        cudaError_t cudaStatus;
        cublasStatus_t cublasStatus;

        cudaStatus = cudaMalloc(&d_vector, sizeof(float) * m);
        if(cudaStatus == cudaErrorMemoryAllocation) {
            printf("ERROR: cumalloc.cu, cudamath_vector() : cudaErrorMemoryAllocation");
            return NULL;
        }

        /* THIS: */
        cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);

        /* OR THAT: */
        cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

        return d_vector;
    }

cublasSetVector() has two arguments, incx and incy, and the documentation says:

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

At the NVIDIA forum, someone said:

iona_me: "incx and incy are steps measured in floats."

Does this mean that for incx = incy = 1 all elements of a float[] will be sizeof(float)-aligned, and for incx = incy = 2 there will be a sizeof(float) padding between each element?

  • Apart from these two parameters and the cublasHandle, does cublasSetVector() do anything that cudaMemcpy() does not?
  • Is it safe to pass a vector or matrix that was not created with its corresponding cublas*() function to other CUBLAS functions that operate on it?
1 answer

In the NVIDIA forum thread, there is a comment by Massimiliano Fatica confirming my statement in the comment above (or, better said, my comment came from recalling that post, which I linked to). In particular,

cublasSetVector, cublasGetVector, cublasSetMatrix, and cublasGetMatrix are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no significant performance differences are expected between the two sets of copy functions.

Accordingly, you can safely pass any array created using cudaMalloc as input to cublasSetVector.

Regarding the increments, there may be a typo in the manual (as of CUDA 6.0), which says that

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

but perhaps it should read

The storage spacing between consecutive elements is given by incx for the source vector x and incy for the destination vector y.
