C: Is this branchless hack faster?

I am trying to clamp a value to the range -127 to 127 on a Cortex-M based microcontroller.

I have two competing functions: one uses conditionals, the other uses a branchless hack that I found here.

```c
// Using conditional statements
int clamp1(int val)
{
    return ((val > 127) ? 127 : (val < -127) ? -127 : val);
}

// Using branchless hacks
int clamp2(int val)
{
    val -= -127;
    val &= (~val) >> 31;
    val += -127;   // val is now max(val, -127)
    val -= 127;
    val &= val >> 31;
    val += 127;    // val is now min(val, 127)
    return val;
}
```
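Before comparing speed, it can help to confirm the two versions are interchangeable for the inputs you care about. A minimal host-side sketch, assuming a 32-bit int and an arithmetic right shift (true for GCC and Clang on common targets); `count_mismatches` is a hypothetical helper, not part of the question.

```c
/* Conditional version from the question. */
int clamp1(int val)
{
    return (val > 127) ? 127 : (val < -127) ? -127 : val;
}

/* Branchless version from the question. Assumes a 32-bit int, an
   arithmetic >> on negative values, and val <= INT_MAX - 127 so that
   the first step does not overflow. */
int clamp2(int val)
{
    val -= -127;
    val &= (~val) >> 31;
    val += -127;            /* val is now max(val, -127) */
    val -= 127;
    val &= val >> 31;
    val += 127;             /* val is now min(val, 127) */
    return val;
}

/* Hypothetical helper: counts inputs in [lo, hi] where the two versions
   disagree. Keep hi well below INT_MAX - 127 so clamp2 stays defined
   and the loop counter cannot overflow. */
long count_mismatches(int lo, int hi)
{
    long mismatches = 0;
    for (int v = lo; v <= hi; ++v) {
        if (clamp1(v) != clamp2(v)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For example, `count_mismatches(-100000, 100000)` should return 0 on such a target.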

I know that in some cases one of these can be faster than the other, and in other cases the reverse, but in general, is it worth using the branchless technique? It does not really matter to me which one I use; both will work fine in my case.

A little background on the microcontroller: it is ARM-based, runs at 90 MIPS, and has a three-stage pipeline (fetch, decode, execute). It seems to have some kind of branch predictor, but I could not figure out the details.

3 answers

ARM code (GCC 4.6.3 with -O3):

```asm
clamp1:
        mvn     r3, #126
        cmp     r0, r3
        movlt   r0, r3
        cmp     r0, #127
        movge   r0, #127
        bx      lr
clamp2:
        add     r0, r0, #127
        mvn     r3, r0
        and     r0, r0, r3, asr #31
        sub     r0, r0, #254
        and     r0, r0, r0, asr #31
        add     r0, r0, #127
        bx      lr
```

Thumb code:

```asm
clamp1:
        mvn     r3, #126
        cmp     r0, r3
        it      lt
        movlt   r0, r3
        cmp     r0, #127
        it      ge
        movge   r0, #127
        bx      lr
clamp2:
        adds    r0, r0, #127
        mvns    r3, r0
        and     r0, r0, r3, asr #31
        subs    r0, r0, #254
        and     r0, r0, r0, asr #31
        adds    r0, r0, #127
        bx      lr
```

Both are branchless: the conditional version compiles to predicated moves thanks to ARM's conditional execution. I bet they are essentially comparable in performance.


Something to understand: ARM and x86 architectures are very different when it comes to branch instructions. Taking a branch flushes the pipeline, which can cost a number of clock cycles to "get back to where you were" in terms of throughput.

To quote a PDF I downloaded the other day (p. 14 of http://simplemachines.it/doc/arm_inst.pdf):

Conditional execution

  • Most instruction sets only allow branches to be executed conditionally.
  • However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
  • All instructions contain a condition field that determines whether the CPU will execute them.
  • Non-executed instructions still take 1 cycle. The cycle must still complete so that the following instructions can be fetched and decoded.
  • This removes the need for many branches, which stall the pipeline (3 cycles to refill).
  • Allows very dense inline code without branches.
  • The time penalty of not executing several conditional instructions is often less than the overhead of the branch or subroutine call that would otherwise be needed.

No. The C language itself has no speed; speed is a property of C implementations. A perfectly optimal compiler would translate both of them into the same machine code.

C compilers are more likely to be able to optimize code that follows common idioms and is well defined. The second function is not well defined.

Those additions and subtractions can cause signed integer overflow; for example, the first step, val -= -127, overflows whenever val > INT_MAX - 127. Signed integer overflow is undefined behavior, so it can cause your program to crash. Optimistically, your hardware might implement wrapping or saturation. A little less optimistically, your OS or compiler might implement signals or traps for integer overflow, and detecting overflow could impose a noticeable performance cost every time a variable is modified. In the worst case, your program loses its integrity.
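For comparison, a minimal sketch of a clamp that sidesteps both the overflow and the signed shifts. The name `clamp3` is hypothetical (it is not from the question); it relies only on comparisons yielding 0 or 1 and on products that cannot overflow. Whether a particular compiler turns it into branch-free code is not guaranteed, so check the generated assembly.

```c
/* Hypothetical clamp3: clamps val to [-127, 127] using only comparisons
   and arithmetic that cannot overflow for any int input. */
int clamp3(int val)
{
    int hi = val > 127;    /* 1 if val is above the range, else 0 */
    int lo = val < -127;   /* 1 if val is below the range, else 0 */
    int in = 1 - hi - lo;  /* 1 if val is already in range, else 0 */

    /* Exactly one of hi, lo, in is 1. When in is 0, in * val is 0 * val,
       which is 0 and cannot overflow. */
    return hi * 127 + lo * -127 + in * val;
}
```

With a 32-bit int, `clamp3(2147483647)` returns 127, an input for which the shift-based version's first step would overflow.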

The & and >> operators have implementation-defined aspects for signed types. They might produce a negative zero, which can be a trap representation. Using a trap representation is undefined behavior, so your program might lose integrity.
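If you do keep the shift-based hack, one way to make its implementation-defined assumptions explicit is to check them at compile time with C11 `_Static_assert`. A minimal sketch, assuming a C11 compiler; `sign_mask` is a hypothetical helper that isolates the `x >> 31` idiom the hack is built on. These checks catch a non-32-bit int, a logical (non-sign-extending) right shift, and a non-two's-complement representation, but they cannot catch the signed overflow described above.

```c
#include <limits.h>

/* The hack shifts by 31, so it needs a 32-bit int. */
_Static_assert(INT_MAX == 2147483647, "assumes a 32-bit int");

/* Right-shifting a negative int is implementation-defined; require the
   arithmetic (sign-extending) behavior the hack relies on. */
_Static_assert((-1 >> 1) == -1, "assumes >> on a negative int is arithmetic");

/* -1 & 3 is 3 only in two's complement (2 in ones' complement, 1 in
   sign-magnitude), so this rules out representations with negative zero. */
_Static_assert((-1 & 3) == 3, "assumes two's complement");

/* Hypothetical helper: returns -1 (all bits set) if x is negative and 0
   otherwise, on an implementation that passes the checks above. */
int sign_mask(int x)
{
    return x >> 31;
}
```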

Perhaps your OS or compiler implements parity checking for int objects. In that case, the parity bits would be recalculated every time the variable changes and checked every time it is read. If a parity check fails, your program might lose integrity.

Use the first function. At least it is well defined. If your program runs slowly, optimizing this code probably will not speed it up significantly; use a profiler to find more significant optimizations, use a more optimal OS or compiler, or buy faster hardware.

