P6 architecture - register renaming aside, do the limited user registers result in more operations spent on spilling / loading?

I am studying JIT design with regard to implementing VMs for dynamic languages. I have not done much assembly since the 8086/8088 days, just a little here and there, so please be kind if I get something wrong.

As I understand it, the x86 (IA-32) architecture still has the same basic limited register set it has always had. The internal register count has grown significantly, but these internal registers are not generally available to the programmer; they are used for register renaming, which lets the processor pipeline in parallel code that otherwise could not be parallelized. I understand this optimization fairly well, but my feeling is that while it helps overall throughput and parallel algorithms, the limited register set we are still stuck with results in a lot of register spilling overhead. If x86 had double or quadruple the registers available to us, might there be significantly fewer push/pop opcodes in a typical instruction stream? Or are there other processor optimizations that also optimize this away that I don't know about? Basically, if I have a unit of code that has 4 registers for integer work but a dozen variables, I potentially have a push/pop for every two or so instructions.
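To make that last point concrete, here is a minimal sketch of how the spill traffic could be estimated for a stream of variable uses. It is a toy model, not a measurement: the function name `count_spill_traffic`, the trace format, and the furthest-next-use eviction policy are all assumptions chosen for illustration, and every evicted live value is counted as a store even if it was never modified.

```python
def count_spill_traffic(trace, num_regs):
    """Return (spills, reloads) for a trace of variable uses.

    trace    -- list of variable names in the order they are used
    num_regs -- number of registers available (hypothetical machine)

    Toy model: when the registers are full, the variable whose next use
    is furthest away is evicted. Evicting a variable that is used again
    later counts as one spill (store); bringing a previously seen
    variable back counts as one reload (load).
    """
    in_regs = set()
    seen = set()
    spills = 0
    reloads = 0

    for i, var in enumerate(trace):
        if var in in_regs:
            continue  # already in a register, no memory traffic

        def next_use(v):
            # Position of the next use of v after i, or len(trace) if none.
            try:
                return trace.index(v, i + 1)
            except ValueError:
                return len(trace)

        if len(in_regs) == num_regs:
            victim = max(sorted(in_regs), key=next_use)
            in_regs.remove(victim)
            if next_use(victim) < len(trace):
                spills += 1  # value is still live, so it must be stored

        if var in seen:
            reloads += 1  # value was evicted earlier, load it back
        in_regs.add(var)
        seen.add(var)

    return spills, reloads


# A dozen variables used round-robin, as in the scenario above.
dozen = [f"v{i}" for i in range(12)] * 3
for regs in (4, 8, 16):
    print(regs, count_spill_traffic(dozen, regs))
```

In this model the traffic drops as `num_regs` grows and disappears once all twelve variables fit, which is the effect I am asking about; whether real hardware hides that cost is the open question.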

Any links to research or, even better, personal experiences?

EDIT: x86_64 has 16 registers, which is double that of x86-32; thanks for the correction and the info.

+5
2 answers

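Besides register renaming, x86 processors can also map stack pushes and pops onto internal registers. The x86 instruction decoder effectively works like a JIT: it translates the x86 instruction stream into the processor's own internal operations. So x86 code like this (a contrived example, purely for illustration):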

mov eax,[ebp]
mov ebx,[ebp+4]
add eax,[edx+0]
push eax        ; spill the partial sum
mov eax,[ebp+8]
add eax,ebx
pop ebx         ; reload the partial sum
add eax,ebx

is turned into something like this (pretending, for example, that the internal registers are r0..r16):

lw r3, edx
lw r1, ebp
lw r2, ebp+4 ; the constant '4' is usually stored as an immediate operand
add r1,r3 ;; add eax,[edx+0]
or r4,r1,r1 ;; move r1 to r4 (the push)
lw r1, ebp+8
add r1,r2
or r2,r4,r4 ;; move r4 to r2 (the pop)
add r1,r2

The point is that push/pop, and loads/stores to esp+(some small number), are effectively turned into accesses to internal registers.
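The push/pop mapping in the two listings above can be mimicked in software. What follows is only a rough sketch of the idea, not a description of what any real decoder does: the function `rename_stack_ops`, the tuple instruction format, and the shadow register names `s0`, `s1`, ... are all made up for illustration, and the model ignores the fact that a real push must still eventually update memory.

```python
def rename_stack_ops(instrs):
    """Rewrite balanced push/pop pairs as moves via shadow registers.

    instrs is a list of tuples such as ("push", "eax") or
    ("add", "eax", "ebx"). Every push is given a fresh hypothetical
    shadow register; the matching pop reads that register back.
    Assumes pushes and pops are balanced (a pop with nothing pushed
    raises IndexError).
    """
    shadow_stack = []  # shadow registers holding pushed values
    next_shadow = 0    # counter used to name fresh shadow registers
    out = []

    for op, *args in instrs:
        if op == "push":
            reg = f"s{next_shadow}"
            next_shadow += 1
            shadow_stack.append(reg)
            out.append(("mov", reg, args[0]))   # push src -> mov sN, src
        elif op == "pop":
            reg = shadow_stack.pop()
            out.append(("mov", args[0], reg))   # pop dst -> mov dst, sN
        else:
            out.append((op, *args))             # everything else unchanged

    return out


example = [
    ("mov", "eax", "[ebp]"),
    ("mov", "ebx", "[ebp+4]"),
    ("add", "eax", "[edx+0]"),
    ("push", "eax"),
    ("mov", "eax", "[ebp+8]"),
    ("add", "eax", "ebx"),
    ("pop", "ebx"),
    ("add", "eax", "ebx"),
]
for ins in rename_stack_ops(example):
    print(ins)
```

Run on the example sequence, the push becomes a move into `s0` and the pop becomes a move out of it, so no stack instruction remains in the rewritten stream.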

+9

Two notes:

(1) x86-64 doubles the number of registers to 16.

(2) x86, , , L1, , , L1 " "

+4
