Even if you don't want to write "asm code", this is a one-liner with GCC's built-in inline assembly, and it is worth considering if you want to force the use of the instruction. For some double value (u):
double sqrt_val; __asm__ ("sqrtsd %1, %0" : "=x" (sqrt_val) : "x" (u));
A memory source operand is also a legal alternative:
__asm__ ("sqrtsd %1, %0" : "=x" (sqrt_val) : "xm" (u));
This works well for GCC, which (as a rule) keeps the value in a register when that is more efficient, but can otherwise use it directly from memory. It is not so good for clang, which (always!) spills the register to memory when it is given the "m" constraint as an alternative. I would just go with the first form.
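As a minimal sketch, the register-only form can be wrapped in a small helper so that call sites stay readable. The name sqrt_sd is hypothetical, and the code assumes GCC or clang targeting x86 with SSE2 (always available on x86-64):

```c
/* Hypothetical helper: forces a single sqrtsd instruction.
   Assumes GCC/clang extended asm and an x86 target with SSE2. */
static inline double sqrt_sd(double u)
{
    double sqrt_val;
    /* "=x": write the result to any SSE register.
       "x" : read the input from an SSE register (no memory alternative). */
    __asm__ ("sqrtsd %1, %0" : "=x" (sqrt_val) : "x" (u));
    return sqrt_val;
}
```

Calling sqrt_sd(u) should then emit the instruction regardless of math-library or errno settings, since the compiler treats the asm statement as opaque.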
If you actually want packed square roots, for a value (u) of type __m128d:
__m128d sqrt_val; __asm__ ("sqrtpd %1, %0" : "=x" (sqrt_val) : "x" (u));
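A rough sketch of how the packed form might be used on two doubles at once. The names sqrt_pd and sqrt_pair are hypothetical, and the code assumes the SSE2 header <emmintrin.h> is available (GCC or clang on x86):

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_loadu_pd, _mm_storeu_pd */

/* Hypothetical helper: forces a single sqrtpd instruction. */
static inline __m128d sqrt_pd(__m128d u)
{
    __m128d sqrt_val;
    __asm__ ("sqrtpd %1, %0" : "=x" (sqrt_val) : "x" (u));
    return sqrt_val;
}

/* Hypothetical convenience wrapper: square roots of in[0] and in[1].
   Unaligned load/store, so the arrays need no special alignment. */
static inline void sqrt_pair(const double in[2], double out[2])
{
    _mm_storeu_pd(out, sqrt_pd(_mm_loadu_pd(in)));
}
```

The unaligned load and store intrinsics are used here only to keep the example free of alignment requirements; they are not part of the original one-liner.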