For piecewise linear interpolation, documents say that scipy.interpolate.griddata uses scipy.interpolate.griddata methods, which in turn uses qhull to make an accurate estimate of the Delaunay input points, then performs standard barycentric interpolation, where for each point you must define each point within each hypertetrahedron, then use its barycentric coordinates as the interpolation weights for the value of the node hypertetrahedron.
It may be difficult to redistribute sorting, but you can access the CPU version from scipy.spatial.Delaunay . The remaining two steps are easily parallelized, although I do not know about a freely available implementation.
If your points with known functions are on a regular grid, the method described here is very simple to implement in CUDA, and I worked with its actual implementations, although not publicly available.
Therefore, I am afraid that you will have to do most of the work yourself ...
Jaime source share