The actual problem with your inline asm is that you declare r only as an output, so the compiler optimizes away its initialization. Use the constraint "+r" instead of "=r" and it should work.
A more optimized version might look like this:
    float signf(float x)
    {
        float r;
        __asm__ __volatile__ (
            "and %0, 0x80000000;"
            "or %0, 0x3f800000;"
            : "=r"(r) : "0"(x));
        return r;
    }
Note that this function involves a float -> int -> float conversion (via memory), which may hurt performance.
The C version of the above code:
    float signf(float x)
    {
        union { float f; int i; } tmp, res;
        tmp.f = x;
        res.f = 1;
        res.i |= tmp.i & 0x80000000;
        return res.f;
    }
This generates identical code for me (using gcc 4.4.5).
The simple C approach

    return x < 0 ? -1 : 1;

generates pure FPU code with no conversion or memory access (other than loading the operand), so it may perform better. It also uses fcmov where available to avoid branching. This needs some benchmarking.
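Before benchmarking, it is worth checking that the two approaches really are interchangeable for your inputs. Here is a minimal sketch, assuming float is a 32-bit IEEE 754 type. The names signf_bits, signf_cmp and signf_demo are made up for illustration, and memcpy stands in for the union. The bit-masking variant copies the sign bit, while the comparison variant tests x < 0, so they should agree on ordinary values but differ on negative zero:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-masking variant: OR the sign bit of x into the bit pattern of 1.0f.
   Assumes float is a 32-bit IEEE 754 type. */
float signf_bits(float x)
{
    uint32_t xi, ri;
    float one = 1.0f, r;

    memcpy(&xi, &x, sizeof xi);
    memcpy(&ri, &one, sizeof ri);        /* 0x3f800000 under the assumption above */
    ri |= xi & UINT32_C(0x80000000);     /* keep only the sign bit of x */
    memcpy(&r, &ri, sizeof r);
    return r;
}

/* Comparison variant: the simple C approach. */
float signf_cmp(float x)
{
    return x < 0 ? -1.0f : 1.0f;
}

/* Hypothetical self-check showing where the two variants agree and differ. */
void signf_demo(void)
{
    assert(signf_bits(2.5f) == 1.0f && signf_cmp(2.5f) == 1.0f);
    assert(signf_bits(-2.5f) == -1.0f && signf_cmp(-2.5f) == -1.0f);

    /* Negative zero has its sign bit set but does not compare less than 0. */
    assert(signf_bits(-0.0f) == -1.0f);
    assert(signf_cmp(-0.0f) == 1.0f);
}
```

If negative zero (or NaN with the sign bit set) can reach this function, decide which behavior you want before choosing on speed alone.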