PyCUDA: C/C++ includes?

Something that isn't mentioned anywhere (at least that I can see) is which library functions are exposed to inline CUDA kernels.

Specifically, I'm doing small/simple matrix multiplications that don't deserve to be individually offloaded to the GPU, but I am offloading a larger section of the algorithm that includes these multiplications. Nobody ever liked using their own linalg functions, since somebody always does it better.

TL;DR: What libraries can I call from inline kernels in PyCUDA?

1 answer

I don't know of any, and I have always thought it would be useful to have.

For the size of problems I usually work with (small meshes and matrices in finite element methods), I just wrote C++ templates to do the operations. Templating the functions lets the compiler know the trip counts at compile time, so it can unroll the loops and keep results or intermediate results in registers, which tends to be very efficient for kernel throughput. So the matrix-matrix product gets declared as:

template<typename Real, unsigned int l, unsigned int m, unsigned int n>
__device__ __host__
void matmul(const Real *a,
            const Real *b,
                  Real *c)
{
    for(int i=0; i<l; i++) {
        for(int j=0; j<n; j++) {
            Real dotprod = Real(0);
            for(int k=0; k<m; k++) {
                dotprod += a[idx2c(i,k,l)] * b[idx2c(k,j,m)];
            }
            c[idx2c(i,j,l)] = dotprod;
        }
    }
}

For the sorts of sizes that arise in my kernels (2x2, 3x3, 4x4, 8x8, 9x9), doing the above and letting the compiler work things out seems to be as good as any other approach I have tried. Because CUDA is effectively scalar at the thread level, there aren't any vector primitives or similar features that could be used to accelerate these small operations.
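To use such a template from PyCUDA, it can be pasted into the kernel source and instantiated inside a `__global__` function. A hedged sketch (the kernel name and batched layout here are illustrative, not from the answer above; `matmul` and `idx2c` are the pieces shown earlier):

```cuda
// Illustrative kernel source for pycuda.compiler.SourceModule.
// Each thread multiplies one independent 3x3 pair, stored
// contiguously in column-major order (an assumed layout).
extern "C" __global__
void batched_matmul3(const float *a, const float *b, float *c, int batch)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < batch) {
        // The instantiation fixes l, m, n at compile time,
        // so the inner loops unroll completely.
        matmul<float, 3, 3, 3>(a + 9 * t, b + 9 * t, c + 9 * t);
    }
}
```

One PyCUDA-specific caveat: `SourceModule` wraps the source in `extern "C"` by default, which C++ templates don't tolerate, so a templated source needs `SourceModule(src, no_extern_c=True)` with the kernel itself marked `extern "C"` so it can still be retrieved by name with `get_function`.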

