In plain CUDA programs, we can print messages from threads by including cuPrintf.h, but how to do this in PyCUDA is not explained anywhere. How can it be done in PyCUDA?
On GPUs of Compute Capability 2.0 and later, cuPrintf.h is discouraged in favor of CUDA's built-in printf(). To use it, simply #include <stdio.h> and call printf() just like on the host.
The PyCUDA wiki page has a concrete example of this.
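For illustration, here is a minimal sketch of how device-side printf() might be used from PyCUDA. It assumes a working PyCUDA install and a GPU of Compute Capability 2.0 or later. The kernel name `hello_thread` and the helper `print_from_gpu` are made up for this example and are not taken from the PyCUDA documentation.

```python
# Kernel source as a raw string so "\n" reaches the CUDA compiler unchanged.
KERNEL_SOURCE = r"""
#include <stdio.h>

__global__ void hello_thread()
{
    // Device-side printf; needs Compute Capability 2.0 or later.
    printf("Hello from thread %d.%d\n", threadIdx.x, threadIdx.y);
}
"""


def print_from_gpu(block=(4, 4, 1)):
    """Compile the kernel and launch it once with the given block shape."""
    # Imported here so this module can still be loaded without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule(KERNEL_SOURCE)
    func = mod.get_function("hello_thread")
    func(block=block)
    # Device printf output is buffered; synchronizing flushes it to stdout.
    cuda.Context.synchronize()
```

Calling `print_from_gpu()` should print one line per thread in the block. The order of the lines is not guaranteed, since threads do not run in a fixed sequence.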