Does CUDA have built-in dot and cross product functions, like OpenCL does, that can be called from CUDA kernels? I haven't found anything in the spec so far.
You can find definitions for these functions in cutil_math.h in the SDK.
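If you'd rather not pull in the SDK header, these helpers are short enough to write yourself. Below is a minimal sketch for `float3`, not a copy of what the header contains. The names `dot3`, `cross3` and `demo` are made up for this example (chosen so they don't clash with the header's own functions), and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper names (dot3/cross3), not taken from the SDK header.
// __host__ __device__ makes them callable from both kernels and host code.
__host__ __device__ inline float dot3(float3 a, float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

__host__ __device__ inline float3 cross3(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Toy kernel: one thread computes both products and writes them out.
__global__ void demo(float3 a, float3 b, float *outDot, float3 *outCross)
{
    *outDot   = dot3(a, b);
    *outCross = cross3(a, b);
}

int main()
{
    float  *dDot   = nullptr;
    float3 *dCross = nullptr;
    cudaMalloc(&dDot, sizeof(float));
    cudaMalloc(&dCross, sizeof(float3));

    // Unit vectors along x and y as sample input.
    demo<<<1, 1>>>(make_float3(1.0f, 0.0f, 0.0f),
                   make_float3(0.0f, 1.0f, 0.0f),
                   dDot, dCross);

    float  hDot;
    float3 hCross;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&hDot, dDot, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hCross, dCross, sizeof(float3), cudaMemcpyDeviceToHost);

    printf("dot = %f, cross = (%f, %f, %f)\n",
           hDot, hCross.x, hCross.y, hCross.z);

    cudaFree(dDot);
    cudaFree(dCross);
    return 0;
}
```

This should build with plain `nvcc` and no extra libraries.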
There are also routines for the dot product in cuBLAS.
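A rough sketch of the cuBLAS route, assuming the `cublas_v2.h` API and single-precision vectors. Note that `cublasSdot` is called from host code on arrays in device memory, so it suits long vectors rather than per-thread `float3` math inside a kernel. The sample values are arbitrary and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    const int n = 3;
    const float hX[n] = {1.0f, 2.0f, 3.0f};   // arbitrary sample data
    const float hY[n] = {4.0f, 5.0f, 6.0f};

    // Copy both vectors to device memory.
    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));
    cudaMemcpy(dX, hX, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dY, hY, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // With the default (host) pointer mode the result is written to host memory.
    // The two 1s are the strides between consecutive elements of x and y.
    float result = 0.0f;
    cublasSdot(handle, n, dX, 1, dY, 1, &result);

    printf("dot = %f\n", result);

    cublasDestroy(handle);
    cudaFree(dX);
    cudaFree(dY);
    return 0;
}
```

Link against cuBLAS when building, e.g. `nvcc example.cu -lcublas` (the file name is just a placeholder).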