I am dealing with the problem of accurately computing the modified zero-order Bessel function I0 in CUDA.
For a long time I used the rational Chebyshev approximation according to the document
JM Blair, "Rational Chebyshev approximations for the modified Bessel functions I_0 (x) and I_1 (x)", Math. Comput., Vol. 28, n. 126, pp. 581-583, April 1974
which, compared to the result presented by Matlab, gives an average error of the order of 1e-29. Unfortunately, this seemingly high precision is no longer enough for the new application I'm working on.
Matlab uses Fortran routines developed by D.E. Amos
Amos, D.E., "A Package of Subprograms for Bessel Functions of a Complex Argument and Non-Negative Order", "Report on the Sandia National Laboratory", SAND85-1018, May 1985
Amos, D.E., "A Portable Package for Bessel Functions of a Complex Argument and Non-Negative Order", vol. Mathematics Software, 1986.
which are available for download from the netlib / amos website.
There are ways to use these Fortran routines in C / C ++ code by compiling them into a library file and then using a C / C ++ wrapper (see, for example, netlib_wrapping ). I am wondering if there are any means to force the device functions from these Fortran programs to be called by the CUDA kernel).
MORE DETAILS ON THE PROBLEM
I have two codes: one is written in Matlab and one in CUDA. Both modes work in three stages:
1) Scaling by the modified Bessel function I0 and zero data filling;
2) FFT;
3) Interpolation.
I compare both with the โexactโ result: as a result of step 3), Matlab gives a relative standard error of 1e-10%, and CUDA gives 1 -2%, so I started to investigate the reason.
The root mean square difference between the first step of the two codes, namely 100*sqrt(sum(abs(U_Matlab_step_1-U_CUDA_step_1).^2))/sqrt(sum(abs(U_Matlab_step_1).^2)) , is 0% ( mean(mean(abs(U_Matlab-U_CUDA)))=6e-29 ), so I would say that this is good. Unfortunately, when I go to step 2, the error rises to 2e-4% . Finally, if I file step 2) CUDA with the output of step 1) Matlab, then the standard error in step 2) will become 1e-14% , which makes me think that the source of inaccuracy is due to the first step, namely, the calculation of the modified Bessel function.
FOR INTERESTING DEVELOPMENTS OF THIS DISCUSSION
Check out the NVIDIA Developer Zone