GLSL: scalar vs. vector performance

All modern GPUs have a scalar architecture, but shading languages offer a variety of vector and matrix types. I would like to know how scalarizing or vectorizing GLSL source code affects performance. For example, let's define some “scalar” points:

    float p0x, p0y, p1x, p1y, p2x, p2y, p3x, p3y, p4x, p4y;
    p0x = 0.0f; p0y = 0.0f;
    p1x = 0.0f; p1y = 0.61f;
    p2x = 0.9f; p2y = 0.4f;
    p3x = 1.0f; p3y = 1.0f;

and their vector equivalents:

    vec2 p0 = vec2(p0x, p0y);
    vec2 p1 = vec2(p1x, p1y);
    vec2 p2 = vec2(p2x, p2y);
    vec2 p3 = vec2(p3x, p3y);

Given these points, which of the following mathematically equivalent snippets will run faster?

Scalar code:

    // pow(x, y) is undefined in GLSL for x < 0, so use (1.0 - t) rather than (t - 1.0)
    position.x = p0x*pow(1.0-t,3.0) + p3x*(t*t*t) + p1x*t*pow(1.0-t,2.0)*3.0 + p2x*(t*t)*(1.0-t)*3.0;
    position.y = p0y*pow(1.0-t,3.0) + p3y*(t*t*t) + p1y*t*pow(1.0-t,2.0)*3.0 + p2y*(t*t)*(1.0-t)*3.0;

or its vector equivalent:

    position.xy = p0*pow(1.0-t,3.0) + p3*(t*t*t) + p1*t*pow(1.0-t,2.0)*3.0 + p2*(t*t)*(1.0-t)*3.0;

Or will both run equally fast on modern GPUs?

The code above is just an example. Real-world examples of such “vectorized” code can perform much more complex calculations, with many more input variables coming from global in variables, uniforms, and vertex attributes.
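To make the comparison concrete, here is a sketch in C++ rather than GLSL, so it can run on a CPU. It assumes a hypothetical minimal `vec2` struct that mimics GLSL's component-wise operators, plus two hypothetical helpers, `bezierScalar` and `bezierVector`. It illustrates that the vector expression is, by definition, the scalar expression applied to `.x` and `.y` separately. On a purely scalar architecture the shader compiler roughly has to emit per-component instructions either way, which is why the two forms may compile to similar code; whether they actually do depends on the driver's compiler. The curve is written with `(1.0 - t)` because GLSL's `pow` is undefined for a negative base.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for GLSL's vec2: operators act component-wise,
// the same way the GLSL built-in type behaves.
struct vec2 {
    double x, y;
};

vec2 operator+(vec2 a, vec2 b) { return {a.x + b.x, a.y + b.y}; }
vec2 operator*(vec2 a, double s) { return {a.x * s, a.y * s}; }

// Scalar form: one cubic Bezier expression per component,
// mirroring position.x = ... and position.y = ...
double bezierScalar(double p0, double p1, double p2, double p3, double t) {
    double u = 1.0 - t;
    return p0 * (u * u * u) + p3 * (t * t * t)
         + p1 * t * (u * u) * 3.0 + p2 * (t * t) * u * 3.0;
}

// Vector form: the identical expression written once on vec2 values,
// mirroring position.xy = ...
vec2 bezierVector(vec2 p0, vec2 p1, vec2 p2, vec2 p3, double t) {
    double u = 1.0 - t;
    return p0 * (u * u * u) + p3 * (t * t * t)
         + p1 * t * (u * u) * 3.0 + p2 * (t * t) * u * 3.0;
}
```

For the question's points, `bezierVector` at `t = 0.5` gives `(0.4625, 0.50375)`, the same values two calls to `bezierScalar` produce.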

Answer

It’s best to benchmark on every kind of system (i.e., every GPU) you expect this code to run on, and work out which ones are faster with the vectorized code and which are faster with the scalarized code. Then write both versions of the code (or, more likely, many versions), and write runtime logic that switches which version is used based on the GPU and driver in use.
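The runtime switch could look roughly like the following C++ sketch. Everything in it is hypothetical: the `ShaderVariant` enum, the `BenchmarkRule` table (which you would fill in from your own measurements), the `chooseVariant` function, and the shader file names. In a real OpenGL program the `renderer` and `version` strings would come from `glGetString(GL_RENDERER)` and `glGetString(GL_VERSION)` with a context current; here they are plain parameters so the selection logic can be tested on its own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: the shader versions maintained side by side.
enum class ShaderVariant { Scalar, Vector };

// Hypothetical benchmark-derived rule: if the GL renderer string contains
// rendererSubstring and the GL version string contains versionSubstring,
// use the variant that measured fastest on that GPU/driver combination.
struct BenchmarkRule {
    std::string rendererSubstring;
    std::string versionSubstring;  // empty string matches any driver version
    ShaderVariant fastest;
};

static bool contains(const std::string& haystack, const std::string& needle) {
    // find() returns 0 for an empty needle, so "" always matches.
    return haystack.find(needle) != std::string::npos;
}

// renderer/version would normally come from glGetString(GL_RENDERER) and
// glGetString(GL_VERSION) once a GL context is current.
ShaderVariant chooseVariant(const std::string& renderer,
                            const std::string& version,
                            const std::vector<BenchmarkRule>& rules,
                            ShaderVariant fallback) {
    for (const BenchmarkRule& rule : rules) {  // first matching rule wins
        if (contains(renderer, rule.rendererSubstring) &&
            contains(version, rule.versionSubstring)) {
            return rule.fastest;
        }
    }
    return fallback;  // GPU/driver you never benchmarked: use a default
}

// Hypothetical file names for the two shader versions.
const char* shaderFileFor(ShaderVariant variant) {
    switch (variant) {
        case ShaderVariant::Scalar: return "bezier_scalar.vert";
        case ShaderVariant::Vector: return "bezier_vector.vert";
    }
    return "bezier_vector.vert";
}
```

Because rules are checked in order, driver-specific rules (non-empty `versionSubstring`) should be listed before the general rule for the same GPU.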

This, of course, is a huge amount of work, and most programmers don't do it. GPGPU programmers usually have only one type of server/GPU node to work with, so their code is tuned for that one architecture. Meanwhile, AAA game studios (the only other place with the budget and manpower to tackle this kind of problem) usually just let NVIDIA and AMD work that magic out on their end: NVIDIA/AMD write better, more optimized versions of the shaders these games use, add them to their drivers, and tell the drivers to substitute those better shaders for whatever Gearbox/Bethesda/whoever tries to load.

The bottom line is that, for your use case, it’s best to focus on making the code more readable; that will save you more time and improve your program more than any "premature optimization" (which, to be clear, is basically what you’re doing).
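In the spirit of favoring readability, one option (an alternative to the expanded polynomial in the question, not something prescribed above) is to write the cubic curve with de Casteljau's algorithm, i.e. repeated linear interpolation. In GLSL that would be a handful of calls to the built-in `mix`. The sketch below is C++ with a hypothetical `vec2` struct and a `mix` helper defined the same way GLSL defines it, `x * (1 - a) + y * a`. No claim is made that it is faster; it is arguably just easier to read and to check against the definition of a Bezier curve.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vec2 mirroring GLSL's component-wise arithmetic.
struct vec2 {
    double x, y;
};

vec2 operator+(vec2 a, vec2 b) { return {a.x + b.x, a.y + b.y}; }
vec2 operator*(vec2 a, double s) { return {a.x * s, a.y * s}; }

// Same definition as GLSL's built-in mix(): x * (1 - a) + y * a.
vec2 mix(vec2 x, vec2 y, double a) { return x * (1.0 - a) + y * a; }

// Cubic Bezier via de Casteljau: interpolate between neighbouring points
// until a single point remains.
vec2 bezier(vec2 p0, vec2 p1, vec2 p2, vec2 p3, double t) {
    vec2 a = mix(p0, p1, t);
    vec2 b = mix(p1, p2, t);
    vec2 c = mix(p2, p3, t);
    vec2 d = mix(a, b, t);
    vec2 e = mix(b, c, t);
    return mix(d, e, t);
}
```

With the question's points, `bezier(p0, p1, p2, p3, 0.5)` yields `(0.4625, 0.50375)`, matching the expanded formula.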
