Can a variable shift cause a partial-register stall (or register-recombining μops) on ecx? If so, on which microarchitecture(s)?
I tested this on Core2 (65nm), which seems to read only cl.
    _shiftbench:
        push rbx
        mov edx, -10000000
        mov ecx, 5
    _shiftloop:
        mov bl, 5        ; replace by cl to see possible recombining
        shl eax, cl
        add edx, 1
        jnz _shiftloop
        pop rbx
        ret
Replacing mov bl, 5 with mov cl, 5 made no difference, which it would have if the registers were being recombined. That can be demonstrated by replacing shl eax, cl with add eax, ecx (in my tests, the version with add experienced a 2.8x slowdown when writing to cl instead of bl).
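For reference, the add variant would look something like this. It is only a sketch: the label names are mine, and eax is left uninitialized just as in the shift loop, since only the timing matters.

```nasm
; Control variant: add reads all of ecx right after an 8-bit write to cl.
; Label names (_addbench, _addloop) are hypothetical.
_addbench:
    push rbx                ; kept so the bl baseline below is safe (rbx is callee-saved)
    mov  edx, -10000000     ; loop counter, counts up towards zero
    mov  ecx, 5
_addloop:
    mov  cl, 5              ; partial write; change to bl for the baseline
    add  eax, ecx           ; full 32-bit read of ecx
    add  edx, 1
    jnz  _addloop           ; loop until edx reaches zero
    pop  rbx
    ret
```

Timing this once with mov cl, 5 and once with mov bl, 5 gives the comparison against which the shift loop can be judged.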
Test results:
- Merom: no stall detected
- Penryn: no stall
- Nehalem: no stall
Update: The new shrx group of shifts on Haswell does show that stall. The shift-count argument is not written as an 8-bit register, so that might have been expected, but the textual representation really says nothing about such micro-architectural details.
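The code for the Haswell test is not shown here, but a loop along these lines would be the natural counterpart. This is a sketch under that assumption: it needs a CPU with BMI2, the label names are made up, and the only intended change from the shift loop is that the count operand is named as the full-width ecx rather than cl.

```nasm
; Hypothetical BMI2 variant of the shift loop (requires BMI2, e.g. Haswell).
_shrxbench:
    push rbx                ; rbx is callee-saved; needed for the bl baseline
    mov  edx, -10000000     ; loop counter, counts up towards zero
    mov  ecx, 5
_shrxloop:
    mov  cl, 5              ; partial write; change to bl for the baseline
    shrx eax, eax, ecx      ; eax = eax >> (ecx & 31); count given as a 32-bit register
    add  edx, 1
    jnz  _shrxloop          ; loop until edx reaches zero
    pop  rbx
    ret
```

As with the other loops, the interesting number is the ratio between the cl and bl versions, not the absolute time.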
harold