I want to add runtime CPU dispatching to my library. I have several versions of some functions, optimized for SSE2, SSE3, and AVX, plus a generic x87 version. I want to compile all versions into a single .so library, and I am trying to decide how to implement the CPU dispatcher.
The fastest way, in my opinion, is to do the CPU dispatching at the link stage (dynamic linking): when ld.so loads my library, I want it to check whether the CPU supports SSE2, SSE3, or AVX and then select the matching set of functions.
For example (using the GCC target attribute):
Library:
float* func3_generic(float *a, float *b) __attribute__ ((__target__ ("fpmath=387")));
float* func3_sse2(float *a, float *b) __attribute__ ((__target__ ("sse2")));
float* func3_sse3(float *a, float *b) __attribute__ ((__target__ ("sse3")));
float* func3_avx(float *a, float *b) __attribute__ ((__target__ ("avx")));
I want to have a special symbol func3(), which the dynamic linker (ld.so) will bind to the most advanced available variant among func3_generic, func3_sse2, func3_sse3, and func3_avx. So, if the processor is a Core i7-xxxx, I want every func3 call to be a call to func3_avx, and if the CPU is a Pentium Pro, a call to func3 should end up in func3_generic.
At the same time, I don't want to write a lot of dispatching code by hand, and I want the right variant to be selected with minimal overhead (no additional indirect jump). In other words, I can afford extra time at application startup, but there should be no extra work in each call to the function (in some cases there are a lot of calls).
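For comparison, the hand-written dispatch I would like to avoid looks roughly like the sketch below. It assumes GCC or Clang on x86 and uses the `__builtin_cpu_init` and `__builtin_cpu_supports` built-ins. The function bodies, the `TARGET` macro, `func3_fn`, `func3_select`, and `func3_ptr` are hypothetical names made up for illustration, and the `fpmath=387` attribute on the generic variant is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#define TARGET(isa) __attribute__((__target__(isa)))
#else
#define TARGET(isa) /* non-x86 build: every variant compiles as plain C */
#endif

/* Hypothetical bodies: add four floats of b into a and return a.
 * Real variants would differ; here only the code generation differs. */
float *func3_generic(float *a, float *b)
{
    for (int i = 0; i < 4; i++) a[i] += b[i];
    return a;
}

TARGET("sse2") float *func3_sse2(float *a, float *b)
{
    for (int i = 0; i < 4; i++) a[i] += b[i];
    return a;
}

TARGET("sse3") float *func3_sse3(float *a, float *b)
{
    for (int i = 0; i < 4; i++) a[i] += b[i];
    return a;
}

TARGET("avx") float *func3_avx(float *a, float *b)
{
    for (int i = 0; i < 4; i++) a[i] += b[i];
    return a;
}

typedef float *(*func3_fn)(float *, float *);

/* Pick the most advanced variant the running CPU supports. */
func3_fn func3_select(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))  return func3_avx;
    if (__builtin_cpu_supports("sse3")) return func3_sse3;
    if (__builtin_cpu_supports("sse2")) return func3_sse2;
#endif
    return func3_generic;
}

/* Cached choice; NULL until the first call. */
func3_fn func3_ptr = NULL;

/* Public entry point: a NULL check plus an indirect call on every use. */
float *func3(float *a, float *b)
{
    if (func3_ptr == NULL)
        func3_ptr = func3_select();
    assert(func3_ptr != NULL);
    return func3_ptr(a, b);
}
```

Every call to func3 here pays for the pointer check and the indirect call, and this wrapper would have to be repeated for each dispatched function. That is the overhead and boilerplate I am hoping the dynamic linker can take over.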
UPDATE: The dynamic linker can do dispatching based on the auxiliary vector (auxv), field AT_HWCAP:
$ LD_SHOW_AUXV=1 /bin/echo
...
AT_HWCAP: fpu ... mmx fxsr sse sse2