Code generation for multiple SIMD architectures

I wrote a library where I use CMake to check for the headers for MMX, SSE, SSE2, SSE4, AVX, AVX2 and AVX-512. In addition, I check whether the instructions are supported and, if they are, add the necessary compiler flags (-msse2, -mavx, -mfma, etc.).

This all works well, but I would like to deploy a single binary that runs on several generations of processors.
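The CMake checks above describe the build machine, not the machine that will run the binary. The sketch below (GCC/clang on x86 assumed; the function names are hypothetical) contrasts a compile-time test of the `__AVX2__` macro, which only reflects flags such as -mavx2, with a runtime test using the `__builtin_cpu_supports` builtin, which is what a single portable binary has to rely on.

```c
#include <stdbool.h>

/* True if this file was compiled with AVX2 enabled (e.g. -mavx2).
   This is a property of the build, not of the CPU running the program. */
bool built_with_avx2(void) {
#ifdef __AVX2__
    return true;
#else
    return false;
#endif
}

/* True if the CPU actually executing the program supports AVX2. */
bool cpu_has_avx2(void) {
    __builtin_cpu_init(); /* harmless here; required if called from a constructor */
    return __builtin_cpu_supports("avx2");
}
```

If `built_with_avx2()` is true but `cpu_has_avx2()` is false, the binary may already contain instructions the CPU cannot execute, which is why globally adding -mavx2 breaks the single-binary goal.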

Question: Is it possible to tell the compiler (GCC) that whenever it optimizes a function using SIMD, it must generate code for a list of architectures, and, of course, insert high-level branches that pick the right version at runtime?

It also looks like the compiler generates separate code paths depending on whether the input pointers are 4- or 8-byte aligned. To prevent this, I use the GCC built-in function __builtin_assume_aligned.
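As an illustration of that approach, here is a minimal sketch assuming 32-byte alignment (the function name `scale_aligned` is hypothetical). `__builtin_assume_aligned` returns its pointer argument with an alignment promise attached, so GCC's vectorizer can skip the prologue that handles misaligned starts. Breaking the promise is undefined behavior, so the asserts check it in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst[i] = k * src[i], with both pointers promised to be 32-byte aligned. */
void scale_aligned(float *restrict dst, const float *restrict src,
                   float k, size_t n) {
    /* Enforce the promise in debug builds (no-op when NDEBUG is defined). */
    assert((uintptr_t)dst % 32 == 0);
    assert((uintptr_t)src % 32 == 0);
    float *restrict d = __builtin_assume_aligned(dst, 32);
    const float *restrict s = __builtin_assume_aligned(src, 32);
    for (size_t i = 0; i < n; ++i)
        d[i] = k * s[i];
}
```

Callers can satisfy the promise with `_Alignas(32)` arrays or `aligned_alloc(32, size)`, where `size` is a multiple of 32.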

What is the best practice? Multiple binary files? Naming?

2 answers

As long as you don't care about portability, yes.

Recent versions of GCC make this easier than any other compiler I know of, via the target_clones function attribute. Just add the attribute with a list of the targets you want versions for, and GCC will automatically create the different variants, as well as a dispatch function that selects the right version at runtime.
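A minimal sketch of `target_clones`, assuming GCC 6 or later (or a recent clang) on x86-64 with a C library that supports ifuncs, such as glibc. The function `add_arrays` and the chosen target list are illustrative. GCC compiles the body once per listed target and adds a resolver that picks a clone when the program is loaded; "default" is required and serves CPUs with none of the listed extensions.

```c
#include <stddef.h>

/* One clone for AVX2, one for AVX, and a baseline "default" clone.
   Callers just call add_arrays(); the selection happens automatically. */
__attribute__((target_clones("avx2", "avx", "default")))
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Compile it with optimization (e.g. -O3) so that each clone is actually auto-vectorized for its own target; no -mavx or -mavx2 flag is needed on the command line.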

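If you want more portability, you can use the target attribute, which clang and icc also support, but then you have to write the dispatch function yourself and emit the function once for each target.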
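A sketch of the `target`-attribute route with a hand-written dispatcher (GCC/clang on x86-64 assumed). The macro `DEFINE_ADD` and the function names are hypothetical; a macro is just one way to emit the same body once per target without copy-pasting it.

```c
#include <stddef.h>

/* Emit the same loop under a given name, compiled for the given target. */
#define DEFINE_ADD(name, tgt)                                        \
    __attribute__((target(tgt)))                                     \
    static void name(float *restrict dst, const float *restrict a,  \
                     const float *restrict b, size_t n) {           \
        for (size_t i = 0; i < n; ++i)                               \
            dst[i] = a[i] + b[i];                                    \
    }

DEFINE_ADD(add_avx2, "avx2")
DEFINE_ADD(add_avx, "avx")

/* Baseline version, compiled with the default flags (SSE2 on x86-64). */
static void add_generic(float *restrict dst, const float *restrict a,
                        const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Hand-written dispatcher: queries the running CPU and calls the best version. */
void add_dispatch(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        add_avx2(dst, a, b, n);
    else if (__builtin_cpu_supports("avx"))
        add_avx(dst, a, b, n);
    else
        add_generic(dst, a, b, n);
}
```

Checking the CPU on every call is cheap but not free; hot code usually caches the decision in a function pointer.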

AFAIK, MSVC has no equivalent of these attributes.


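When working with SSE/AVX etc., I prefer to do the "dispatching" by hand (i.e. compile the performance-critical functions several times), with separate versions for AVX, AVX2 and AVX-512 and a fallback to SSE.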
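One way to keep per-ISA versions with an SSE-level fallback is to resolve a function pointer once at startup, as in this sketch (GCC/clang on x86-64 assumed; all names are hypothetical). A constructor runs before `main`, so the pointer is set before any thread can call through it, and GCC's documentation requires calling `__builtin_cpu_init` first in that context.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

__attribute__((target("avx512f")))
static long sum_avx512(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

/* Fallback compiled with default flags (SSE2 baseline on x86-64). */
static long sum_generic(const int *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

static sum_fn sum_impl = sum_generic;

/* Pick the best implementation once, before main() starts. */
__attribute__((constructor))
static void pick_sum_impl(void) {
    __builtin_cpu_init(); /* required before __builtin_cpu_supports in a constructor */
    if (__builtin_cpu_supports("avx512f"))
        sum_impl = sum_avx512;
    else if (__builtin_cpu_supports("avx2"))
        sum_impl = sum_avx2;
}

/* Public entry point: a single indirect call, no per-call CPU checks. */
long sum_ints(const int *v, size_t n) {
    return sum_impl(v, n);
}
```

The three bodies are identical on purpose: the compiler vectorizes each one differently according to its target attribute.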

AVX, , , ( ). , , , 10-20% , 15% , , , - .

. .

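If you write the code with intrinsics, you can target AVX etc. directly: MSVC, for example, only assumes SSE2 (the x64 baseline) but still lets you use AVX intrinsics.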

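Unlike MSVC, GCC before 4.9 only allowed intrinsics for instruction sets enabled on the command line; since 4.9 you can use them in individual functions marked with the target attribute. [UPDATE: @nemequ points out a more automatic option in gcc; see the other answer.]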
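A sketch of using AVX intrinsics in one function without compiling the whole file with -mavx, which works in GCC 4.9+ and clang (x86-64 assumed; function names are hypothetical). The AVX function is only reached after a runtime check, so the binary still runs on CPUs without AVX.

```c
#include <immintrin.h>
#include <stddef.h>

/* AVX intrinsics are allowed here because of the target attribute,
   even if the file is compiled without -mavx. */
__attribute__((target("avx")))
static void add8_avx(float *dst, const float *a, const float *b) {
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(dst, _mm256_add_ps(va, vb));
}

static void add8_scalar(float *dst, const float *a, const float *b) {
    for (int i = 0; i < 8; ++i)
        dst[i] = a[i] + b[i];
}

/* Adds two arrays of exactly 8 floats, using AVX only if the CPU has it. */
void add8(float *dst, const float *a, const float *b) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        add8_avx(dst, a, b);
    else
        add8_scalar(dst, a, b);
}
```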

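Also, watch out for AVX-SSE transition penalties (use VZEROUPPER when leaving AVX code before legacy SSE code runs); depending on the CPU, they can be significant.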
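A sketch of where `_mm256_zeroupper()` (the intrinsic for VZEROUPPER) goes: at the end of an AVX function, before control returns to code that may use legacy non-VEX SSE instructions (GCC/clang on x86-64 assumed; names are hypothetical). GCC and clang normally insert vzeroupper automatically in functions they compile with AVX enabled, so the explicit call is redundant here. It matters more in hand-written assembly or when mixing in code built by other toolchains.

```c
#include <immintrin.h>
#include <stddef.h>

__attribute__((target("avx")))
static float sum_avx(const float *v, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(v + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; ++k) total += lanes[k];
    for (; i < n; ++i) total += v[i]; /* leftover elements */
    /* Clear the upper halves of the YMM registers before returning, so
       following legacy SSE code does not pay a transition penalty. */
    _mm256_zeroupper();
    return total;
}

static float sum_scalar(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

float sum_floats(const float *v, size_t n) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? sum_avx(v, n) : sum_scalar(v, n);
}
```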
