How good is NVCC at code optimization?

How well does NVCC optimize device code? Does it perform optimizations such as constant folding and common subexpression elimination?

For example, will it reduce the following:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

to:

float sqrt_2pi = sqrtf(2 * M_PI); // Compile time constant
float a = 1 / sqrt_2pi;
float b = c / sqrt_2pi;

What about smarter optimizations that rely on knowledge of the semantics of math functions, such as reducing:

float a = 1 / sqrtf(c * M_PI);
float b = c / sqrtf(M_PI);

to:

float sqrt_pi = sqrtf(M_PI); // Compile time constant
float a = 1 / (sqrt_pi * sqrtf(c));
float b = c / sqrt_pi;
1 answer

The compiler is far ahead of you. In your example:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

nvopencc (Open64) produces the following:

    mov.f32         %f2, 0f40206c99;        // 2.50663
    div.full.f32    %f3, %f1, %f2;
    mov.f32         %f4, 0f3ecc422a;        // 0.398942

which is equivalent to:

float b = c / 2.50663f;
float a = 0.398942f;
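The `0f` prefix in the PTX above marks a single-precision constant written as its raw IEEE-754 bit pattern in hexadecimal. To check the folded values yourself, a minimal host-side sketch in C follows; the helper name `ptx_hex_to_float` is hypothetical and not part of any CUDA tool.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of CUDA): reinterpret the 8 hex digits
 * that follow PTX's "0f" prefix as an IEEE-754 single-precision value. */
float ptx_hex_to_float(uint32_t bits) {
    float value;
    /* memcpy copies the bit pattern without violating aliasing rules. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling `ptx_hex_to_float(0x40206c99u)` gives roughly 2.50663, i.e. sqrtf(2 * M_PI), and `ptx_hex_to_float(0x3ecc422au)` gives roughly 0.398942, its reciprocal, matching the comments in the PTX listing.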

The second case compiles to the equivalent of:

float a = 1 / sqrtf(c * 3.14159f); // 0f40490fdb
float b = c / 1.77245f; // 0f3fe2dfc5

I assume that the expression for a generated by the compiler is more accurate than your "optimized" version, while running at about the same speed.
