My environment:
- Windows 7 x64
- MATLAB R2012a x64
- CUDA SDK 4.2
- Tesla C2050 GPU
I am having trouble figuring out why my GPU crashes with an "uncorrectable ECC error". The error occurs only when I use 512 threads or more. I cannot publish the kernel, but I will try to describe what it does.
In general, the kernel takes a number of parameters and produces two complex matrices whose dimensions are set by the number of threads, M, and another number, N. The returned matrices therefore have size MxN. A typical configuration is 512x512, but the two numbers are independent and each can vary up or down. The kernel works when the numbers are 256x256.
Each thread extracts a vector of length 999 from a 2D array of size 999xM, selected by its thread ID, then loops over the row (0 .. N-1) of the output matrices to do the calculation. A number of intermediate parameters are computed using only pow, sin, and cos in addition to the + - * / operators. To compute one of the output matrices, an additional loop is needed to sum the contributions of the 999-element vector extracted earlier. This loop does some intermediate calculations to determine the range of values that will contribute. Each contribution is then scaled by a factor determined by the cos and sin of a computed fractional value. This is where it crashes. If I replace the factor with a constant such as 1.0, the kernel runs without problems. However, with even just one of the calls (cos or sin) enabled, the kernel crashes.
Below is some pseudo-code:
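    kernel() {
        // extract this thread's 999-element vector
        for (int i = 0; i < 999; i++) {
            .....
        }
        // loop over the output row
        for (int j = 0; j < N; j++) {
            // sum the contributions of the extracted vector
            for (int k = 0; k < 999; k++) {
            }
        }
    }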
I thought this could be due to the register limit, but the occupancy calculator indicates that it is not: I use fewer than 32,768 registers with 512 threads. Can anyone suggest what might be causing this?
Here is the ptxas info:
    ptxas info : Compiling entry function '_Z40KerneliidddddPKdS0_S0_S0_iiiiiiiiiPdS1_S1_S1_S1_S1_S1_S1_S1_S1_' for 'sm_20'
    ptxas info : Function properties for _Z40KerneliidddddPKdS0_S0_S0_iiiiiiiiiPdS1_S1_S1_S1_S1_S1_S1_S1_S1_
        8056 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info : Function properties for __internal_trig_reduction_slowpathd
        40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info : Used 53 registers, 232 bytes cmem[0], 144 bytes cmem[2], 28 bytes cmem[16]
    tmpxft_00001d70_00000000-3_MexFunciton.cudafe1.cpp