Why does VC++ 2010 often use ebx as a "zero register"?

Yesterday I was looking at some 32-bit code generated by VC++ 2010 (most likely; I don't know the specific options, sorry), and I was intrigued by a curious recurring detail: in many functions it zeroed ebx in the prologue and then always used it as a "zero register" (think $zero on MIPS). In particular, it was often:
  • used to zero memory; this is not unusual, since the encoding of mov mem,imm is 1 to 4 bytes larger than mov mem,reg (the full-size immediate has to be encoded even for 0), but usually (gcc) the needed register is zeroed "on demand" and is otherwise used for more useful purposes;
  • used to compare with zero, as in cmp reg,ebx. This is what struck me most as unusual, since it should be exactly the same as test reg,reg but adds an extra register dependency. Keep in mind that this happened in non-leaf functions, with ebx often being pushed and popped by the callees, so I wouldn't trust this dependency to always be completely free. In addition, it also used test reg,reg in exactly the same way (test/cmp => jg).
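To put numbers on the size argument in the list above, here is a small Python table of hand-assembled 32-bit encodings. The bytes are my own transcription from the Intel opcode tables, assuming [ebp+disp8] operands like those in the excerpt below; treat them as a sketch, not assembler output. They suggest that storing zero from a register saves 4 bytes per dword store and 1 byte per byte store, while cmp reg,ebx is no shorter than test reg,reg; only cmp reg,0 costs an extra byte.

```python
# Hand-assembled 32-bit (protected-mode) encodings, as hex strings.
# Assumes [ebp+disp8] addressing with disp8 = 4; not produced by an assembler.
ENCODINGS = {
    "mov dword ptr [ebp+4], ebx": "89 5D 04",              # mov r/m32, r32
    "mov dword ptr [ebp+4], 0":   "C7 45 04 00 00 00 00",  # mov r/m32, imm32
    "mov byte ptr [ebp+4], bl":   "88 5D 04",              # mov r/m8, r8
    "mov byte ptr [ebp+4], 0":    "C6 45 04 00",           # mov r/m8, imm8
    "cmp eax, ebx":               "39 D8",                 # cmp r/m32, r32
    "test eax, eax":              "85 C0",                 # test r/m32, r32
    "cmp eax, 0":                 "83 F8 00",              # cmp r/m32, imm8 (sign-extended)
    "xor ebx, ebx":               "31 DB",                 # xor r/m32, r32
}

def size(insn: str) -> int:
    """Length in bytes of an instruction from the table above."""
    return len(bytes.fromhex(ENCODINGS[insn]))

if __name__ == "__main__":
    for insn in ENCODINGS:
        print(f"{size(insn)} bytes  {insn}")
    # Extra bytes paid for an explicit zero immediate instead of a zero register.
    print("dword store:", size("mov dword ptr [ebp+4], 0") - size("mov dword ptr [ebp+4], ebx"))
    print("byte store: ", size("mov byte ptr [ebp+4], 0") - size("mov byte ptr [ebp+4], bl"))
```

The byte-store case is the "1 byte" end of the range and the dword-store case the "4 bytes" end.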

Most importantly, registers on "classic" x86 are a scarce resource; once you start spilling registers you waste a lot of time for no good reason, so why dedicate one for the whole function just to hold zero? (Still, thinking about it, I don't remember seeing much register spilling in the functions that used this "zero register" pattern.)

So: what am I missing? Is this a compiler blooper, or some incredibly clever optimization that was especially relevant in 2010?

Here's an excerpt:

```asm
    ; standard prologue: ebp/esp, SEH, overflow protection, ... then:
    xor ebx, ebx
    mov [ebp+4], ebx          ; zero out some locals
    mov [ebp], ebx
    call function_1
    xor ecx, ecx              ; ebx _not_ used to zero registers
    cmp eax, ebx              ; ... but used for compares?! why not test eax,eax?
    setnz cl                  ; what? it goes through cl to check if eax is not zero?
    cmp ecx, ebx              ; still, why not test ecx,ecx?
    jnz function_body
    push 123456
    call throw_something
function_body:
    mov edx, [eax]
    mov ecx, eax              ; it's not like it was interested in ecx anyway...
    mov eax, [edx+0Ch]
    call eax                  ; virtual method call; ebx is preserved but possibly pushed/popped
    lea esi, [eax+10h]
    mov [ebp+0Ch], esi
    mov eax, [ebp+10h]
    mov ecx, [eax-0Ch]
    xor edi, edi              ; again, registers are zeroed as usual
    mov byte ptr [ebp+4], 1
    mov [ebp+8], ecx
    cmp ecx, ebx              ; why not test ecx,ecx?
    jg somewhere
label1:
    lea eax, [esi-10h]
    mov byte ptr [ebp+4], bl  ; ok, uses bl to write a zero to memory
    lea ecx, [eax+0Ch]
    or edx, 0FFFFFFFFh
    lock xadd [ecx], edx
    dec edx
    test edx, edx             ; now it's using the regular test reg,reg!
    jg somewhere_else
```

Please note: an earlier version of this question said that it used mov reg,ebx instead of xor reg,reg; that was just me not remembering things correctly. Sorry if anyone spent too much thought trying to figure it out.

Tags: assembly, x86, visual-c++, visual-c++-2010
1 answer

Everything you commented on as odd looks suboptimal to me, too. test eax,eax sets all flags (except AF) the same way as cmp against zero, and is preferred for performance and code size.
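The flags claim is easy to check with a model. The Python sketch below hand-codes the 32-bit flag rules for cmp (a subtraction) and test (an AND); it is my own transcription of the documented flag definitions, not an emulator. It confirms that cmp x,0 and test x,x agree on CF, ZF, SF, OF and PF for a set of edge-case values, and that the only difference is AF, which cmp x,0 always clears and test leaves undefined.

```python
MASK = 0xFFFFFFFF  # operands are assumed to be unsigned 32-bit values

def parity(value: int) -> int:
    """PF: 1 if the low byte of the result has an even number of set bits."""
    return int(bin(value & 0xFF).count("1") % 2 == 0)

def flags_cmp(a: int, b: int) -> dict:
    """Flags after `cmp a, b` (computes a - b and discards the result)."""
    r = (a - b) & MASK
    sa, sb, sr = a >> 31, b >> 31, r >> 31
    return {
        "CF": int(a < b),                   # unsigned borrow
        "ZF": int(r == 0),
        "SF": sr,
        "OF": int(sa != sb and sr != sa),   # signed overflow of a - b
        "PF": parity(r),
        "AF": int((a & 0xF) < (b & 0xF)),   # borrow out of bit 3
    }

def flags_test(a: int, b: int) -> dict:
    """Flags after `test a, b` (computes a & b and discards the result)."""
    r = a & b
    return {"CF": 0, "ZF": int(r == 0), "SF": r >> 31, "OF": 0,
            "PF": parity(r), "AF": None}    # None = architecturally undefined

SAMPLES = [0, 1, 0x7F, 0x80, 0xFF, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]

def same_flags_except_af(x: int) -> bool:
    """True if `cmp x, 0` and `test x, x` agree on every flag except AF."""
    c, t = flags_cmp(x, 0), flags_test(x, x)
    return all(c[f] == t[f] for f in ("CF", "ZF", "SF", "OF", "PF"))

if __name__ == "__main__":
    for x in SAMPLES:
        print(f"{x:#010x}: identical apart from AF -> {same_flags_except_af(x)}")
```

Since no conditional jump reads AF, the two are interchangeable in front of any jcc.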

On P6 (PPro through Nehalem), reading registers that haven't been written for a long time is bad, because it can lead to register-read stalls. The P6 core can only read 2 or 3 not-recently-modified architectural registers from the permanent register file per cycle (to get operands for the issue stage: the ROB holds the operands of uops, unlike the SnB family, where it holds only references into the physical register file).
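To see why an extra "cold" source such as a long-lived zero register matters here, consider a toy model of the counting rule above. Everything in it is a simplifying assumption rather than a cycle-accurate description: uops are grouped 3 at a time, read_ports defaults to 2 (the low end of "2 or 3"), and a fixed window of recent uops stands in for "still in the ROB". The hypothetical issue groups at the bottom differ only in compare-against-ebx versus test.

```python
from math import ceil

def rrf_stall_cycles(uops, issue_width=3, read_ports=2, window=40):
    """Toy estimate of P6-style register-read stalls.

    uops: list of (dest, sources); dest may be None (cmp/test only write
    flags, which this model ignores). A source is "cold" if none of the last
    `window` uops wrote it, a crude stand-in for "no longer in the ROB".
    Cold registers must come from the permanent register file, at most
    `read_ports` distinct registers per cycle. Returns the extra cycles.
    """
    last_write = {}              # register -> index of its most recent writer
    extra = 0
    for start in range(0, len(uops), issue_width):
        cold = set()             # distinct cold registers read by this group
        for i, (dest, srcs) in enumerate(uops[start:start + issue_width], start):
            for reg in srcs:
                if reg not in last_write or i - last_write[reg] > window:
                    cold.add(reg)
            if dest is not None:
                last_write[dest] = i   # later uops get this value from the ROB
        extra += max(0, ceil(len(cold) / read_ports) - 1)
    return extra

# Hypothetical issue group that compares against a long-lived zero register...
WITH_ZERO_REG = [(None, ["edx", "ebx"]),   # cmp edx, ebx
                 ("eax", ["esi"]),         # lea eax, [esi-10h]
                 ("ecx", ["eax"])]         # lea ecx, [eax+0Ch] (eax is hot)
# ...and the same group using test edx, edx instead.
WITH_TEST = [(None, ["edx"]),              # test edx, edx
             ("eax", ["esi"]),
             ("ecx", ["eax"])]

if __name__ == "__main__":
    print("cmp edx, ebx :", rrf_stall_cycles(WITH_ZERO_REG), "extra cycle(s)")
    print("test edx, edx:", rrf_stall_cycles(WITH_TEST), "extra cycle(s)")
```

With 2 read ports the ebx read tips the first group over the limit; with 3 ports neither group stalls, which matches the "possible bottleneck" hedge rather than a guaranteed one.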

Since this is VS2010, Sandybridge hadn't been released yet, so it should have been tuning heavily for Pentium II/III, Pentium-M, Core2 and Nehalem, where reading "cold" registers is a possible bottleneck.

IDK if something like this ever made sense for integer regs on older CPUs; I don't know much about optimizing for anything before P6.


The cmp / setnz / cmp / jnz sequence looks especially braindead. Maybe it comes from a compiler-internal canned sequence for producing a boolean from something, and the compiler then failed to optimize the check of that boolean back into a branch directly on the flags? That still doesn't explain using ebx as a zero register, which is also totally useless there.
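The fold that seems to be missing can be sketched as a tiny peephole pass. The tuple representation and the name fold_bool_branch are illustrative assumptions, not anything MSVC exposes. It turns setCC r8 / cmp r32, zero / jnz into a single jCC on the flags of the original compare; it assumes, without checking, that r32 was zeroed before the setCC and is dead after the branch.

```python
# Inverse condition codes for the few setcc/jcc suffixes handled here.
INVERSE = {"z": "nz", "nz": "z", "l": "ge", "ge": "l", "g": "le", "le": "g"}
LOW_BYTE_OF = {"al": "eax", "bl": "ebx", "cl": "ecx", "dl": "edx"}

def fold_bool_branch(code, zero_regs=frozenset({"ebx"})):
    """Peephole: `setCC r8; cmp r32, Z; jnz/jz L` -> `jCC L` / `j<inverse CC> L`.

    Z is the literal 0 or a register known to hold zero. Assumes (unchecked)
    that r32 was zeroed before the setCC, so the cmp only tests the setCC
    result, and that r32 is not used after the branch.
    """
    out, i = [], 0
    while i < len(code):
        if i + 2 < len(code):
            setcc, cmp, jcc = code[i], code[i + 1], code[i + 2]
            cc = setcc[0][3:]                     # "setnz" -> "nz"
            if (setcc[0].startswith("set") and cc in INVERSE
                    and cmp[0] == "cmp" and cmp[1] == LOW_BYTE_OF.get(setcc[1])
                    and (cmp[2] == 0 or cmp[2] in zero_regs)
                    and jcc[0] in ("jnz", "jz")):
                # jnz on the boolean keeps the condition; jz inverts it.
                new_cc = cc if jcc[0] == "jnz" else INVERSE[cc]
                out.append(("j" + new_cc, jcc[1]))
                i += 3
                continue
        out.append(code[i])
        i += 1
    return out

# The boolean-test part of the excerpt in the question.
EXCERPT = [("xor", "ecx", "ecx"), ("cmp", "eax", "ebx"), ("setnz", "cl"),
           ("cmp", "ecx", "ebx"), ("jnz", "function_body")]

if __name__ == "__main__":
    for ins in fold_bool_branch(EXCERPT):
        print(ins)
```

After the fold, the branch reads the flags of cmp eax, ebx directly; whether that compare should itself be a test is a separate question.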

Is it possible that some of these sequences came from inline asm that returned a boolean integer (written by someone silly enough to want a zero in a register)?

Or maybe the source code compared two unknown values, and only after inlining and constant propagation did it turn into a compare against zero, which MSVC failed to optimize fully, so it kept 0 as a constant in a register instead of using test?
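If constant propagation does know that a register holds 0, the step that turns such a compare into test is trivial. A minimal sketch over a hypothetical (op, dst, src) tuple list; the function name and the caller-supplied set of known-zero registers are assumptions for illustration, where a real compiler would derive that set itself and must also prove the register isn't redefined in between.

```python
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def cmp_zero_to_test(code, zero_regs=frozenset({"ebx"})):
    """Rewrite `cmp r, 0` and `cmp r, Z` (Z known to hold 0) as `test r, r`.

    Only register destinations are rewritten, since test cannot take two
    memory operands. Safe in front of any jcc: both forms leave CF, ZF, SF,
    OF and PF identical, and only AF (which no jcc reads) differs.
    """
    out = []
    for ins in code:
        if (ins[0] == "cmp" and ins[1] in REGS32 - zero_regs
                and (ins[2] == 0 or ins[2] in zero_regs)):
            out.append(("test", ins[1], ins[1]))
        else:
            out.append(ins)
    return out

# The compares from the excerpt, plus two that must be left alone.
CODE = [("cmp", "eax", "ebx"), ("jnz", "function_body"),
        ("cmp", "ecx", "ebx"), ("jg", "somewhere"),
        ("cmp", "dword ptr [ebp+8]", 0), ("cmp", "edx", 5)]

if __name__ == "__main__":
    for ins in cmp_zero_to_test(CODE):
        print(ins)
```

The jg case stays correct because test also clears OF, so "greater" still means ZF=0 and SF=0.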


(The rest of this was written before the question included the code.)

It sounds weird, or like a case of CSE / constant hoisting, i.e. treating 0 like any other constant that you might want to load into a register once and then reuse throughout the function.
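The hoisting hypothesis can be illustrated with a toy pass that treats an immediate like any other common subexpression. This is a hypothetical sketch, not MSVC's actual algorithm: instructions are (op, dst, src) tuples, operand sizes are ignored, and the chosen register is assumed to be free for the whole function.

```python
from collections import Counter

def hoist_constant(code, reg="ebx", min_uses=2):
    """Toy constant hoisting / CSE of an immediate.

    An int `src` is an immediate. If the most frequent immediate appears at
    least `min_uses` times, it is materialized once in `reg` at function
    entry and every use is replaced by `reg`. Operand sizes are ignored.
    """
    imms = [ins[2] for ins in code if len(ins) == 3 and type(ins[2]) is int]
    if not imms:
        return list(code)
    const, uses = Counter(imms).most_common(1)[0]
    if uses < min_uses:
        return list(code)
    # 0 gets the usual xor-zeroing idiom; other constants need a mov.
    init = ("xor", reg, reg) if const == 0 else ("mov", reg, const)
    body = [(ins[0], ins[1], reg)
            if len(ins) == 3 and type(ins[2]) is int and ins[2] == const
            else ins
            for ins in code]
    return [init] + body

FUNC = [("mov", "[ebp+4]", 0), ("mov", "[ebp]", 0),
        ("cmp", "eax", 0), ("push", 123456)]

if __name__ == "__main__":
    for ins in hoist_constant(FUNC):
        print(ins)
```

Note how cmp eax, 0 comes out as cmp eax, ebx: exactly the pattern in the question, left behind unless a later pass rewrites compares against zero into test.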

Your analysis of the data-dependency behaviour is correct: copying from a register that was zeroed some time ago essentially starts a new dependency chain.
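The dependency argument can be made concrete with a small dataflow sketch. In this hypothetical model, instructions are (op, dst, *srcs) tuples, mov reads only its source, other ops also read their destination, xor r,r is treated as having no inputs (the zeroing idiom recent x86 cores recognize), and flags and memory are ignored. The example shows that mov ecx, ebx depends only on the prologue's xor ebx, ebx, not on the earlier ecx chain.

```python
def dependencies(code):
    """For each instruction, return the set of earlier instruction indices
    it transitively depends on through register values."""
    producer = {}            # register -> index of the instruction that last wrote it
    deps = []
    for i, (op, dst, *srcs) in enumerate(code):
        if op == "xor" and srcs == [dst]:
            inputs = []                  # zeroing idiom: no input dependency
        elif op == "mov":
            inputs = srcs                # mov overwrites dst without reading it
        else:
            inputs = [dst, *srcs]        # read-modify-write (add, imul, ...)
        d = set()
        for reg in inputs:
            if reg in producer:
                j = producer[reg]
                d |= {j} | deps[j]
        deps.append(d)
        producer[dst] = i
    return deps

CODE = [("xor", "ebx", "ebx"),    # 0: prologue zeroes ebx
        ("mov", "ecx", "eax"),    # 1: start of an ecx chain
        ("add", "ecx", "edx"),    # 2
        ("imul", "ecx", "ecx"),   # 3
        ("mov", "ecx", "ebx"),    # 4: copy of the long-ready zero
        ("add", "ecx", "esi")]    # 5

if __name__ == "__main__":
    for i, d in enumerate(dependencies(CODE)):
        print(i, sorted(d))
```

The only cost compared with xor ecx, ecx is the extra input ebx, which was ready long ago; whether reading it is really free is the register-read question discussed above.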


When gcc wants two zeroed registers, it often xor-zeroes one and then uses mov or movdqa to copy it to the other.

This is suboptimal on Sandybridge, where xor-zeroing doesn't need an execution port, but a possible win on Bulldozer-family, where mov can run on an AGU or an ALU while xor-zeroing still needs an ALU port.

For vector moves it's a clear win on Bulldozer: they are handled by register renaming, with no execution unit. But xor-zeroing an XMM or YMM register still needs an execution port on Bulldozer-family (or two for ymm, so always use xmm with implicit zero-extension).
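The trade-off in the last three paragraphs boils down to counting execution-port uops. The sketch below hard-codes per-instruction costs as stated in this answer and picks the cheaper way to zero two registers. It is a simplification: real port assignment is dynamic, and the assumption that a Bulldozer integer mov always lands on an AGU (so it costs no ALU slot) is mine.

```python
# Execution-port uops per instruction, per the claims in this answer.
PORT_UOPS = {
    # Sandybridge: xor-zeroing is handled at rename; a reg-reg mov needs a port.
    ("snb", "int", "xor-zero"): 1 - 1,
    ("snb", "int", "mov"): 1,
    # Bulldozer family, integer: xor-zeroing needs an ALU port; mov can run on
    # an AGU instead (assumed here to always do so, costing no ALU slot).
    ("bd", "int", "xor-zero"): 1,
    ("bd", "int", "mov"): 0,
    # Bulldozer family, vector: moves are eliminated at rename; xor-zeroing
    # needs one port for xmm and two for ymm.
    ("bd", "xmm", "xor-zero"): 1,
    ("bd", "xmm", "mov"): 0,
    ("bd", "ymm", "xor-zero"): 2,
    ("bd", "ymm", "mov"): 0,
}

def zero_two(uarch: str, kind: str):
    """Return (cheaper strategy, cost of xor-zeroing both, cost of xor + copy)."""
    xor = PORT_UOPS[(uarch, kind, "xor-zero")]
    mov = PORT_UOPS[(uarch, kind, "mov")]
    both_xor, xor_then_copy = 2 * xor, xor + mov
    best = "xor both" if both_xor <= xor_then_copy else "xor one, copy to the other"
    return best, both_xor, xor_then_copy

if __name__ == "__main__":
    for uarch, kind in [("snb", "int"), ("bd", "int"), ("bd", "xmm"), ("bd", "ymm")]:
        print(uarch, kind, zero_two(uarch, kind))
```

The ymm row is why xor-zeroing the xmm half is preferred: a VEX-encoded write to an xmm register zeroes the upper half of the corresponding ymm register anyway.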

However, I don't think any of this justifies tying up a register for the whole function, especially when it costs an extra save/restore. And certainly not on P6-family CPUs, where register-read stalls are a thing.
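The save/restore cost can be weighed against the size savings with a little arithmetic. This sketch counts code bytes only, ignoring register pressure and read stalls. The byte counts are hand-assembled for [ebp+disp8] operands, and it assumes ebx must be pushed and popped because it is callee-saved in the 32-bit Windows calling conventions, unless the prologue already saves it.

```python
XOR_REG = 2          # xor ebx, ebx
PUSH_POP = 1 + 1     # push ebx / pop ebx
SAVE_DWORD = 7 - 3   # mov dword ptr [ebp+d8], 0  vs  mov dword ptr [ebp+d8], ebx
SAVE_BYTE = 4 - 3    # mov byte ptr [ebp+d8], 0   vs  mov byte ptr [ebp+d8], bl

def net_bytes_saved(dword_stores: int, byte_stores: int,
                    ebx_already_saved: bool = False) -> int:
    """Code bytes saved by pinning zero in ebx instead of storing immediates.

    Compares against cmp reg,ebx gain nothing over test reg,reg (both are
    2 bytes), so they are not counted.
    """
    cost = XOR_REG + (0 if ebx_already_saved else PUSH_POP)
    return dword_stores * SAVE_DWORD + byte_stores * SAVE_BYTE - cost

if __name__ == "__main__":
    # The excerpt zeroes two dwords ([ebp+4], [ebp]) and one byte ([ebp+4] via bl).
    print(net_bytes_saved(2, 1))
    print(net_bytes_saved(2, 1, ebx_already_saved=True))
```

On size alone, a function with a single zero store only breaks even, and the excerpt gains a handful of bytes; that is small change next to a register pinned for the whole function.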

