A modern processor uses a fairly complex set of heuristics to guess which way a conditional branch will go. Because the CPU decodes and executes each instruction in parallel with many others, the cost of a misprediction can be severe. Simply reordering the tests in a branch, or even changing the code immediately before or after it, can change the prediction. Unfortunately, there is no easy way to know in advance what will perform better.
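To make this concrete, here is a minimal sketch (the function names and data are my own, purely illustrative) of the classic case: a loop whose branch is unpredictable on random input, next to a branchless rewrite that computes the same result without giving the predictor anything to guess. Whether the branchless form is actually faster depends on the data and the CPU, which is exactly why you have to measure.

```cpp
#include <cstdint>
#include <vector>

// Branchy version: on random data the condition flips unpredictably,
// so branch mispredictions can dominate the loop's cost.
int64_t sum_if_branchy(const std::vector<int>& v, int threshold) {
    int64_t sum = 0;
    for (int x : v) {
        if (x >= threshold)  // hard-to-predict branch on random input
            sum += x;
    }
    return sum;
}

// Branchless version: the comparison becomes a 0/1 factor, so every
// iteration executes the same instructions and nothing is predicted.
int64_t sum_if_branchless(const std::vector<int>& v, int threshold) {
    int64_t sum = 0;
    for (int x : v) {
        sum += static_cast<int64_t>(x) * (x >= threshold);
    }
    return sum;
}
```

Note that the compiler may already emit a conditional move for the branchy version at higher optimization levels, which again argues for measuring rather than guessing.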
For this reason, the only way to make informed decisions about optimizing CPU-bound code like you describe is to measure what the code actually does, make small changes, and measure again to see whether anything improved.
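The measure/change/measure loop can be sketched with a small harness like the one below (the `work` function is a hypothetical stand-in for whatever hot loop you are tuning). Using a steady clock and keeping the minimum of several runs reduces noise from the OS and other processes; a real investigation would also look at hardware counters, e.g. with `perf stat`.

```cpp
#include <chrono>
#include <vector>

// Hypothetical candidate under test; replace with your own hot loop.
long long work(const std::vector<int>& v) {
    long long s = 0;
    for (int x : v) s += (x & 1) ? x : -x;
    return s;
}

// Time one call with a monotonic clock; repeat the whole measurement
// and keep the minimum, which is the least-noisy estimate.
double best_time_ms(const std::vector<int>& v, int repeats) {
    double best = 1e300;
    for (int i = 0; i < repeats; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long sink = work(v);  // volatile: keep the call alive
        (void)sink;
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Run this before and after each small change; only keep changes that measurably help on representative input.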
If you really have a soft real-time application and it is using 100% of a CPU, this probably does not mean you should try to scale its usage back, but rather that you should give it more CPU to use, since the input is exceeding the application's ability to keep up. In fact, scaling up or out is probably cheaper than improving the code's performance; server hardware is cheap compared to development time.