Make sure the compiler always uses the SSE sqrt instruction

I am trying to get GCC (or clang) to consistently use the SSE instruction for sqrt instead of the math library function, for a computationally intensive scientific application. I have tried various GCC versions on various 32-bit and 64-bit OS X and Linux systems. I make sure to enable SSE with -mfpmath=sse (and -march=core2, to satisfy GCC's requirement for using -mfpmath=sse on 32-bit). I also use -O3. Depending on the GCC or clang version, the generated assembly does not always use the SSE sqrtss instruction. With some versions of GCC, every sqrt uses the instruction. In other cases, there is a mix of sqrtss and calls to the math library function. Is there a way to hint or force the compiler to use only the SSE instruction?

2 answers

Use the sqrtss built-in, __builtin_ia32_sqrtss?
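
As a rough illustration of this suggestion, here is a minimal sketch that reaches the same instruction through the `_mm_sqrt_ss` intrinsic from `<xmmintrin.h>`, which both GCC and clang provide. Using the intrinsic rather than the raw `__builtin_ia32_sqrtss` is an assumption made here for portability between the two compilers. The wrapper name `sse_sqrtf` is hypothetical, and the `sqrtf` fallback only exists so the file still compiles on targets without SSE.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: scalar single-precision square root that asks
 * for the SSE instruction explicitly when SSE is available. */
static inline float sse_sqrtf(float x)
{
#if defined(__SSE__)
    __m128 v = _mm_set_ss(x);      /* x in the low lane, other lanes zero */
    v = _mm_sqrt_ss(v);            /* square root of the low lane only */
    return _mm_cvtss_f32(v);       /* extract the low lane as a float */
#else
    return sqrtf(x);               /* fallback for non-SSE targets */
#endif
}
```

Calling `sse_sqrtf(x)` in place of `sqrtf(x)` in the hot loop and inspecting the output of `-S` should show whether the library call has disappeared for a given compiler version.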


You should be careful when using this; as you probably know, it has lower precision. That is likely the reason GCC does not use it systematically.

There is a trick that is even mentioned in Intel's SSE manual (I hope I remember correctly): the result of sqrtss is only one Heron iteration away from the target. It may be that GCC is able to inline that short iteration in some versions but not in others.
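
For readers unfamiliar with the term, a single Heron (Newton) step for the square root of x refines an estimate y to (y + x/y) / 2. The sketch below only illustrates that formula. The function name `heron_step` is hypothetical, and nothing here is taken from the Intel manual or from GCC's actual code generation.

```c
/* Hypothetical helper: one Heron (Newton) refinement step for sqrt(x).
 * Given an estimate y of sqrt(x), returns the improved estimate
 * (y + x / y) / 2. Assumes y is nonzero. */
static inline float heron_step(float x, float y)
{
    return 0.5f * (y + x / y);
}
```

If the estimate is already exact, the step leaves it unchanged: for x = 9 and y = 3 it yields 0.5 * (3 + 3) = 3.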

You can use the built-in, as MSN says, but you should definitely look up the specification on Intel's website to know what you are trading off.


Source: https://habr.com/ru/post/1315845/

