ManagedCUDA is ideal for this type of thing. First, follow the instructions in the documentation to set up your Visual Studio project.
Here is an example solution:
test.cu (compiled to test.ptx):
    #if !defined(__CUDACC__)
    #define __CUDACC__
    #include <host_config.h>
    #include <device_launch_parameters.h>
    #include <device_functions.h>
    #include <math_functions.h>
    #endif

    extern "C"
    {
        __global__ void test(float* data)
        {
            float a = data[0];
            float b = data[1];
            float c = data[2];
            data[0] = max(a, max(b, c));
        }
    }
and here is the C# code:
    private static void Test()
    {
        using (CudaContext ctx = new CudaContext())
        {
            CudaDeviceVariable<float> d = new CudaDeviceVariable<float>(3);
            CUmodule module = ctx.LoadModulePTX("test.ptx");
            CudaKernel kernel = new CudaKernel("test", module, ctx)
            {
                GridDimensions = new dim3(1, 1),
                BlockDimensions = new dim3(1, 1)
            };
            kernel.Run(d.DevicePointer);
        }
    }
This is just a proof of concept: the device memory is never initialized and the result is never read back, but it is enough to illustrate the approach.
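To make the kernel's effect observable, the host side can upload initial values and download the result. A minimal sketch, assuming ManagedCUDA's float[] overloads of CopyToDevice and CopyToHost (its wrappers around the CUDA memcpy calls); this would replace the kernel.Run line inside Test():

    float[] host = { 1.0f, 5.0f, 3.0f };
    d.CopyToDevice(host);           // initialize device memory from the host array
    kernel.Run(d.DevicePointer);    // kernel writes max(1, max(5, 3)) into data[0]
    float[] result = new float[3];
    d.CopyToHost(result);           // result[0] should now hold 5.0f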
You have several options for distributing your application. In this case, I chose to compile the .cu file to PTX and load it in the C# project from the file system.
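The PTX file can be produced from the CUDA source with nvcc; the -ptx flag stops compilation after the PTX stage (file names match the example above):

    nvcc -ptx test.cu -o test.ptx

Visual Studio can run this as a custom build step so the .ptx is regenerated whenever test.cu changes.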
You can also embed PTX as a resource directly in your C # application.
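For the embedded-resource variant, the PTX can be read from the assembly manifest instead of the file system. A sketch, assuming the resource name "MyApp.test.ptx" (it depends on your project's default namespace) and a Stream-accepting overload of LoadModulePTX:

    using System.IO;
    using System.Reflection;

    // Mark test.ptx as "Embedded Resource" in the project, then:
    Stream ptx = Assembly.GetExecutingAssembly()
                         .GetManifestResourceStream("MyApp.test.ptx");
    CUmodule module = ctx.LoadModulePTX(ptx);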
You can also compile to a cubin and load or embed that instead of the PTX.
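A cubin is produced the same way, but note that unlike PTX it is specific to one GPU architecture, so you must name the target (sm_52 here is only an example):

    nvcc -cubin -arch=sm_52 test.cu -o test.cubin

On the C# side the cubin would then be loaded with the module-loading call that takes a compiled module rather than PTX (in ManagedCUDA, ctx.LoadModule("test.cubin")).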
RobiK