PyCUDA / CUDA: Causes of Non-Deterministic Launch Failures?

Anyone following the CUDA tag has probably seen a few of my questions about a project I'm working on, but for those who haven't, I'll summarize. (Apologies in advance for the long question.)

There are three kernels: one generates a data set from some input variables (the number of bit combinations involved can grow exponentially), another solves the generated linear systems, and a third recovery kernel extracts the final result. These three kernels are launched over and over as part of an optimization algorithm for a particular system.

On my dev machine (GeForce 9800GT, CUDA 4.0) this works fine every time, no matter what I throw at it (up to a computational limit imposed by the exponential growth mentioned above). On the test machine, however (4x Tesla S1070, only one GPU in use, CUDA 3.1), the exact same code (Python host, PyCUDA interface to the CUDA kernels) gives correct results for "small" cases, but for mid-sized cases the solver stage fails at random iterations.

The previous problems I hit with this code were related to the numerical instability of the problem and were deterministic (i.e. they failed at the same stage every time), but this one frankly has me stumped, because it fails whenever it feels like it.

As a result, I don't have a reliable way to break into the CUDA code from the Python framework and debug it properly, and PyCUDA's debugger support is questionable at best.
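
For completeness, the one hook PyCUDA does expose is passing nvcc flags through SourceModule, so the kernels can at least be compiled with device debug info for cuda-gdb. A minimal sketch, assuming the kernels are built via SourceModule (the flags are standard nvcc options, nothing PyCUDA-specific; the dummy kernel is just a placeholder):

    from pycuda.compiler import SourceModule
    import pycuda.autoinit  # creates a context so the module can be loaded

    kernel_source = """
    __global__ void dummy(float *x) { x[threadIdx.x] += 1.0f; }
    """

    # -g / -G embed debug info and disable device-side optimization, so
    # cuda-gdb can map a failure back to a kernel and source line.
    # keep=True leaves nvcc's intermediate files around for inspection.
    mod = SourceModule(kernel_source, options=["-g", "-G"], keep=True)
    dummy = mod.get_function("dummy")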

I've checked the usual things: free device memory is verified before each kernel launch, and the padding calculations say the grid and block dimensions are fine. I'm not doing any crazy 4.0-specific things, I free everything I allocate on the device at each iteration, and I've pinned all the data types to single-precision floats.
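
To make that concrete, the per-iteration pattern looks roughly like this (the kernel, sizes, and the "work" it does are dummies purely for illustration; the real kernels are the generate/solve/recover ones described above):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    # Dummy kernel standing in for the real solver, just to show the checks.
    mod = SourceModule("""
    __global__ void solve(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= 2.0f;   /* placeholder work */
    }
    """)
    solve = mod.get_function("solve")

    n = 1 << 20
    host = np.zeros(n, dtype=np.float32)           # everything kept as single-precision floats

    free_bytes, total_bytes = cuda.mem_get_info()  # free device memory before allocating
    assert host.nbytes < free_bytes, "not enough free device memory for this iteration"

    dev = cuda.mem_alloc(host.nbytes)
    cuda.memcpy_htod(dev, host)

    block = (256, 1, 1)
    grid = ((n + block[0] - 1) // block[0], 1)     # pad the grid so every element is covered
    solve(dev, np.int32(n), block=block, grid=grid)

    cuda.memcpy_dtoh(host, dev)
    dev.free()                                     # release the allocation explicitly each iteration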

TL;DR: has anyone run into gotchas with CUDA 3.1 that aren't in the release notes, or issues with PyCUDA's automatic memory management that could cause intermittent launch failures on repeated calls?

2 answers

Have you tried:

cuda-memcheck python yourapp.py 

You most likely have an out-of-bounds memory access.
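
Also note that kernel launches are asynchronous, so the "launch failed" error is often reported by a later, unrelated call. Forcing a synchronization right after each launch (a sketch; checked_launch and the kernel names in the usage comment are just placeholders) makes the exception surface at the kernel that actually faulted, which pairs well with cuda-memcheck's report:

    import pycuda.driver as cuda

    def checked_launch(kernel, *args, **kwargs):
        """Launch a kernel, then block until it finishes, so any launch
        failure is raised here instead of at a later API call."""
        kernel(*args, **kwargs)
        cuda.Context.synchronize()  # raises LaunchError/LogicError if the kernel died

    # usage: checked_launch(solve, data_gpu, np.int32(n),
    #                       block=(256, 1, 1), grid=(num_blocks, 1))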

You could also run the NVIDIA CUDA profiler and see what happens right before the failure.
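
On the CUDA 3.x/4.x toolkits you don't even need the GUI: the legacy command-line profiler is switched on with environment variables. A sketch of enabling it from the Python side before the context is created (variable names as I remember them from the old command-line profiler docs; some toolkit versions use the COMPUTE_PROFILE* names instead, so check your release notes):

    import os

    # Must be set before the CUDA context exists, i.e. before importing
    # pycuda.autoinit or calling make_context().
    os.environ["CUDA_PROFILE"] = "1"
    os.environ["CUDA_PROFILE_LOG"] = "cuda_profile.log"
    os.environ["CUDA_PROFILE_CSV"] = "1"

    import pycuda.autoinit  # the new context picks up the profiler settings

    # ... run the application; the per-kernel timing log ends up in cuda_profile.log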
