Everything you commented as odd looks sub-optimal to me. test eax,eax sets all flags (except AF) the same way as cmp against zero, and is preferred for performance and code size.
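To make the flag claim concrete, here is a minimal sketch in C that models the status flags produced by cmp and test. The type flags_t and the functions model_cmp, model_test and same_flags are hypothetical names invented for this illustration; AF is deliberately left out of the model, since test leaves it undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the x86 status flags of interest (AF omitted on purpose). */
typedef struct { bool zf, sf, pf, cf, of; } flags_t;

/* PF is set when the low 8 bits of the result contain an even number of 1 bits. */
static bool even_parity_low8(uint32_t r)
{
    uint32_t b = r & 0xFFu;
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return (b & 1u) == 0;
}

/* cmp a, b: flags are set from the 32-bit subtraction a - b, result discarded. */
flags_t model_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    f.pf = even_parity_low8(r);
    f.cf = (a < b);                            /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;   /* signed overflow */
    return f;
}

/* test a, b: flags are set from a & b, and CF and OF are always cleared. */
flags_t model_test(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    flags_t f;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;
    f.pf = even_parity_low8(r);
    f.cf = false;
    f.of = false;
    return f;
}

/* True when every modelled flag matches. */
bool same_flags(flags_t x, flags_t y)
{
    return x.zf == y.zf && x.sf == y.sf && x.pf == y.pf &&
           x.cf == y.cf && x.of == y.of;
}
```

In this model, model_cmp(v, 0) and model_test(v, v) agree for any v: subtracting zero never borrows or overflows, and v & v is just v.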
On P6 (PPro through Nehalem), reading long-dead registers is bad because it can lead to register-read stalls. P6 cores can only read 2 or 3 not-recently-modified architectural registers from the permanent register file per clock (to fetch operands for the issue stage: the ROB holds operands for uops, unlike in SnB-family, where it only holds references to the physical register file).
Since this is from VS2010, Sandybridge had not been released yet, so the compiler should have put a lot of weight on tuning for Pentium II/III, Pentium-M, Core2, and Nehalem, where reading cold registers is a possible bottleneck.
IDK if anything like this ever made sense for integer regs, but I don't know much about optimizing for CPUs older than P6.
The cmp / setz / cmp / jnz sequence looks especially braindead. Maybe it came from a compiler-internal canned sequence for producing a boolean from something, and the compiler failed to optimize the test of that boolean back into using the flags directly? That still doesn't explain the use of ebx as a zero register, which is completely useless there too.
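As a rough illustration of that guess, here is a C sketch of source that first materializes a boolean and then branches on it, next to the direct form. The functions is_zero, classify_via_bool and classify_direct are hypothetical, not taken from the question, and the asm in the comments is only what a naive, unoptimized translation might look like.

```c
#include <stdbool.h>

/* Hypothetical helper: produces the comparison result as a 0/1 value. */
static bool is_zero(int x)
{
    return x == 0;            /* naive translation: cmp x, 0 ; setz tmp */
}

/* Branches on the materialized boolean instead of on the flags. */
int classify_via_bool(int x)
{
    bool z = is_zero(x);
    if (z != false)           /* naive translation: cmp tmp, 0 ; jnz ... */
        return 1;
    return 2;
}

/* Same logic written directly: one compare (or test) and one branch is enough. */
int classify_direct(int x)
{
    return (x == 0) ? 1 : 2;
}
```

Both functions return the same value for every input, so an optimizer is free to turn the first form into the second.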
Is it possible that some of that came from inline asm that returned a boolean integer (written in a silly way that wanted a zero in a register)?
Or maybe the source code compared two unknown values, and only after inlining and constant propagation did it turn into a compare against zero? MSVC then failed to optimize that fully, so it still kept 0 as a constant in a register instead of using test.
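A sketch of what such source could look like, purely as an assumption about the shape of the code: values_equal and count_zeros are made-up names, and nothing here is claimed to be what the asker actually compiled.

```c
#include <stdbool.h>

/* Hypothetical generic helper: inside this function both operands are unknown. */
static inline bool values_equal(int a, int b)
{
    return a == b;
}

/* Hypothetical caller: once values_equal() is inlined, b is the constant 0,
 * so the comparison is really "v[i] == 0" and could be done with test. */
int count_zeros(const int *v, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (values_equal(v[i], 0))
            count++;
    }
    return count;
}
```

A compiler that treats the propagated 0 like any other constant might load it into a register once and keep using a reg-reg cmp instead.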
(The rest of this was written before the question included the code.)
Sounds weird, or like a case of CSE / constant hoisting run amok, i.e. treating 0 like any other constant that you might want to load once and then reg-reg copy throughout the function.
Your analysis of the data-dependency behaviour is correct: a move from a register that was zeroed a while ago essentially starts a new dependency chain.
When gcc wants two zeroed registers, it often xor-zeroes one and then uses mov or movdqa to copy it to the other.
This is sub-optimal on Sandybridge, where xor-zeroing doesn't need an execution port, but it's a possible win on Bulldozer-family, where mov can run on an AGU or ALU but xor-zeroing still needs an ALU port.
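For context, this is the kind of C source where a compiler needs two zeroed registers at once. The struct sums_t and the function sum_even_odd are hypothetical, and which zeroing strategy a given compiler picks (two xors, or one xor plus a mov copy) depends on the compiler version and tuning options, so the comment is only an assumption about typical output.

```c
#include <stddef.h>

/* Hypothetical result type: sums of the elements at even and odd indices. */
typedef struct { long even; long odd; } sums_t;

sums_t sum_even_odd(const int *a, size_t n)
{
    /* Two accumulators that both start at zero: before the loop the compiler
     * needs two zeroed registers, e.g. "xor eax,eax / xor edx,edx" or
     * "xor eax,eax / mov edx,eax". */
    long even = 0;
    long odd = 0;

    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            even += a[i];
        else
            odd += a[i];
    }

    sums_t result = { even, odd };
    return result;
}
```

Here the zeros are only needed at one point, so there is no reason to keep a dedicated zero register alive for the whole function.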
For vector moves, it's a clear win on Bulldozer: they are handled in register rename with no execution unit. But xor-zeroing an XMM or YMM register still needs an execution port on Bulldozer-family (or two for ymm, so always use xmm with implicit zero-extension).
Still, I don't think that justifies tying up a register for the duration of a whole function, especially not if it costs extra saves/restores. And not for P6-family CPUs, where register-read stalls are a thing.