Probably the simplest way to do what you ask is to apply axpy with a unit vector (a vector of ones) of the same size, scaled by the constant you want to add.
The operation then becomes X <- X + alpha * I, where I is the unit vector, which is equivalent to adding alpha to each entry in X.
EDIT:
From the comments, it seems you are having some difficulty creating a unit vector for the SAXPY call. One way to do it is to use a memset call to set the values of the unit vector on the device, like this:
#include "cuda.h" , sz); #include "cuda.h"
Note: here I have allocated and copied the memory for the CUBLAS vectors using the CUDA runtime API rather than the CUBLAS helper functions (which are, in any case, very thin wrappers around the runtime API). The only "tricky" part is creating the bit pattern of 1.0f and using the driver API function cuMemsetD32 to set each 32 bit word in the array to that pattern.
You could equally accomplish all of this with a couple of lines of code using the Thrust template library, or just write your own kernel, which could be as simple as:
template<typename T>
__global__ void vector_add_constant(T * vector, const T scalar, int N)
{
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int stride = blockDim.x * gridDim.x;

    for(; tidx < N; tidx += stride) {
        vector[tidx] += scalar;
    }
}
[disclaimer: this kernel was written in the browser and is untested. Use at your own risk]