Automatic vectorization vs. manually vectorized code

Question

Is it better, in some sense, to vectorize code by hand using explicit pragmas, or to rely on automatic vectorization? For optimal performance with automatic vectorization, you would have to monitor the compiler output to make sure the loops are being vectorized, or modify them until they are.

With hand coding, you can be certain that the required instructions are emitted, but the code is then most likely not portable (either to other architectures or to other compilers).

+4

optimization gcc loops

casualcoder Jan 03 '09 at 18:40

3 answers

I would never rely on automatic vectorization from any compiler. With gcc I would be doubly careful, because the effects of gcc's optimizations change from version to version. Almost everyone I know who relies on special optimizations or gcc extensions has to deal with breakage when a new version of gcc is released.

You can usually trust pragmas and intrinsics, but you should read the release notes for new versions of gcc carefully, and you should tell your users which version of gcc is needed to compile your code.

Once or twice, when vectorization really mattered, we added something to the test suite that invoked objdump and made sure vector instructions were actually being used. It would be nice to be able to detect "bad vector code" (as described by Nils) automatically, but we never got that far.
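A minimal sketch of what such a check could look for, assuming the test harness has already captured the text printed by `objdump -d` for the function of interest. The helper names `count_mnemonic` and `uses_packed_sse` and the mnemonic list are illustrative assumptions, not what the test suite above actually did:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of one mnemonic in a
   disassembly listing (e.g. text captured from `objdump -d`). */
static size_t count_mnemonic(const char *disasm, const char *mnemonic)
{
    size_t len = strlen(mnemonic);
    size_t count = 0;
    const char *p;

    if (len == 0)
        return 0;
    for (p = strstr(disasm, mnemonic); p != NULL; p = strstr(p + len, mnemonic))
        ++count;
    return count;
}

/* Hypothetical check: returns 1 if the listing contains at least one
   packed single-precision SSE instruction from a small, incomplete list.
   Scalar forms such as addss do not match. */
static int uses_packed_sse(const char *disasm)
{
    static const char *const packed[] = {
        "addps", "subps", "mulps", "movaps", "movups"
    };
    size_t i;

    for (i = 0; i < sizeof packed / sizeof packed[0]; ++i)
        if (count_mnemonic(disasm, packed[i]) > 0)
            return 1;
    return 0;
}
```

A substring search like this is crude (it also matches AVX forms such as `vaddps`, and it says nothing about whether the vector code is good), but it is enough to make a test fail when a compiler upgrade silently drops back to scalar code.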

+3

Norman Ramsey Jan 03 '09 at 20:01

I have not yet seen an automatic vectorizer that does more good than harm.

+1

Crashworks Jan 17 '09 at 10:27

Nils Pipenbrinck · Accepted Answer · 2009-01-03T18:52:34+0000

Automatic vectorization has never worked for me. To me, it looks like auto-vectorization only works for very trivial loops at the moment.
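To illustrate what "trivial" means here, a small sketch with two hypothetical functions (`scale` and `prefix_sum` are names made up for this example). The first loop has independent iterations, unit stride, and `restrict`-qualified pointers, which is the shape auto-vectorizers are usually able to handle. The second carries a dependence from one iteration to the next, which typically prevents straightforward vectorization:

```c
#include <stddef.h>

/* Independent iterations, unit stride, no aliasing (restrict):
   a typical candidate for auto-vectorization. */
void scale(float *restrict dst, const float *restrict src, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = k * src[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the iterations cannot simply run in parallel lanes. */
void prefix_sum(float *x, size_t n)
{
    size_t i;
    for (i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```

Whether a given compiler actually vectorizes either loop depends on its version and flags. With a recent gcc, compiling with `-O3 -fopt-info-vec` prints a note for each loop it vectorized, which is one way to do the monitoring the question describes.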

I use the pragma/intrinsics approach and look at the generated assembly. If the compiler generates bad code (for example, spilling SSE registers to the stack or adding redundant moves), I use inline assembly for the whole body of the loop.

Portability is not a problem. You often start with a C/C++ loop and optimize it with intrinsics. Just keep the old loop and use it as a unit test and fallback for your SIMD implementation. It is also always wise to be able to remove all SIMD code from a project with a compile-time define. That makes debugging the application a lot easier. The same switch can be used for cross-compilation.
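A minimal sketch of that layout, assuming an x86 target with SSE. The `NO_SIMD` define and the function names `add_scalar`, `add_simd` and `count_mismatches` are made up for this example; the intrinsics come from `<xmmintrin.h>`:

```c
#include <assert.h>
#include <stddef.h>

/* Building with -DNO_SIMD (hypothetical switch) removes all SIMD code. */
#if defined(__SSE__) && !defined(NO_SIMD)
#include <xmmintrin.h>
#define USE_SIMD 1
#else
#define USE_SIMD 0
#endif

/* The original loop: kept as the fallback and as the reference for tests. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

static void add_simd(float *dst, const float *a, const float *b, size_t n)
{
#if USE_SIMD
    size_t i = 0;
    /* Four floats per iteration; unaligned loads and stores, so the
       caller does not have to guarantee 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
#else
    add_scalar(dst, a, b, n);
#endif
}

/* Unit-test helper: runs both versions on the same input and returns
   the number of elements where they disagree. */
static size_t count_mismatches(size_t n)
{
    enum { MAX_N = 64 };
    float a[MAX_N], b[MAX_N], want[MAX_N], got[MAX_N];
    size_t i, bad = 0;

    assert(n <= MAX_N);
    for (i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = (float)(2 * i);
    }
    add_scalar(want, a, b, n);
    add_simd(got, a, b, n);
    for (i = 0; i < n; ++i)
        if (want[i] != got[i])
            ++bad;
    return bad;
}
```

Calling `count_mismatches` with lengths that are not multiples of four (for example 3 or 37) exercises the scalar tail as well as the vector loop. On a target without SSE, or with `NO_SIMD` defined, the same source compiles to the plain loop.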
