Address calculation instruction - leaq

Question

I am trying to understand how address calculation instructions work, especially the leaq instruction. I get confused when I see examples that use leaq to do arithmetic. For example, the following C code,

long m12(long x) { return x*12; }

In assembly

 leaq (%rdi, %rdi, 2), %rax
 salq $2, %rax

If my understanding is correct, leaq should move whatever address (%rdi, %rdi, 2) evaluates to, which should be 2*%rdi + %rdi, into %rax. What confuses me is that the value of x is stored in %rdi, which is just a memory address, so why does multiplying %rdi by 3 and then left-shifting that memory address by 2 equal x times 12? Doesn't multiplying %rdi by 3 just take us to some other memory address that does not hold the value x?

+1

c assembly x86 memory-address

Zhiyuan Ruan Oct 06 '17 at 1:36 on

3 answers

LEA is designed to calculate an address. It does not dereference that address, so no memory is accessed.

This should be much more readable in Intel syntax

 m12(long):
     lea rax, [rdi+rdi*2]
     sal rax, 2
     ret

So the first line is equivalent to rax = rdi*3. The left shift then multiplies rax by 4, which gives rdi*3*4 = rdi*12.
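As a rough illustration of that arithmetic, here is a minimal C sketch that models the two instructions on plain integers. The names lea_model, m12_model and check_m12_model are made up for this example, and unsigned 64-bit values stand in for the registers; it models the register math only, not what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modeling "lea dst, [base + index*scale]":
   pure arithmetic on register values, no memory access. */
static uint64_t lea_model(uint64_t base, uint64_t index, unsigned scale) {
    return base + index * scale;
}

/* Models the body of m12: rdi holds the value of x, not a pointer to x. */
uint64_t m12_model(uint64_t rdi) {
    uint64_t rax = lea_model(rdi, rdi, 2); /* lea rax, [rdi+rdi*2]  => rdi*3 */
    rax <<= 2;                             /* sal rax, 2            => rax*4 */
    return rax;                            /* rdi*12 */
}

/* Small self-check; call it from your own main(). */
void check_m12_model(void) {
    assert(m12_model(5) == 5 * 12);
}
```

Nothing in the model ever dereferences rdi, which is the point: the "address" is just a number.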

+2

Lưu Vĩnh Phúc Oct 06 '17 at 1:45 on

lea is a shift-and-add instruction that uses memory-operand syntax and machine encoding. This explains the name, but address calculation is not the only thing it is good for. It never actually accesses memory, so it is like using & in C.

See, for example, How to multiply a register by 37 using only 2 consecutive leal instructions in x86?
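To make the shift-and-add idea concrete, here is a hedged C sketch of one way two LEA-style steps can multiply by 37. The decomposition (x*9 first, then x + 4*(x*9)) is an assumption chosen for illustration and is not necessarily the one used in the linked question; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each line models one LEA of the form base + index*scale,
   where scale must be 1, 2, 4 or 8. */
uint32_t mul37_model(uint32_t x) {
    uint32_t t = x + x * 8; /* e.g. leal (%rdi,%rdi,8), %eax  => x*9       */
    return x + t * 4;       /* e.g. leal (%rdi,%rax,4), %eax  => x + 36*x  */
}

/* Small self-check; call it from your own main(). */
void check_mul37_model(void) {
    assert(mul37_model(2) == 74);
}
```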

In C, it is like uintptr_t foo = &arr[idx]. Note the &, which gives you the result of arr + idx, including scaling by the size of an arr element. In C this would be an abuse of the language's syntax and types, but in x86 assembly pointers and integers are the same thing.
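A minimal C sketch of that analogy follows. The array demo_arr and the function names are hypothetical, and the equality between the two functions assumes a flat address space such as x86-64, where converting a pointer to uintptr_t yields its numeric address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array, present only so there is something to take the address of. */
static long demo_arr[16];

/* Computes the address of demo_arr[idx] without reading the element,
   much like lea computes base + index*scale without a load. */
uintptr_t addr_of_element(size_t idx) {
    return (uintptr_t)&demo_arr[idx];
}

/* The same computation spelled out as integer math. */
uintptr_t addr_by_hand(size_t idx) {
    return (uintptr_t)demo_arr + idx * sizeof(long);
}

/* Small self-check; call it from your own main(). */
void check_addr_model(void) {
    assert(addr_of_element(3) == addr_by_hand(3));
}
```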

The original 8086 designers may have had pointer math in mind as the main use case, but modern compilers see it as just another option for doing arithmetic on pointers / integers, and that is how you should think of it too.

(Note that 16-bit addressing modes do not include shifts, just [BP|BX] + [SI|DI] + disp8/disp16, so LEA was not as useful for non-pointer math before the 386. See Referencing the contents of a memory location (x86 addressing modes) for more about 32/64-bit addressing modes, although that answer uses Intel syntax, for example [rax + rdi*4].)

Or maybe the original designers just wanted to expose the address-calculation hardware for arbitrary uses, because they could do so without many extra transistors. The decoders already have to be able to decode addressing modes, and other parts of the CPU already have to be able to do address calculations. Putting the result of an address calculation into a register, instead of using it as a memory address, was easy to implement in hardware.

Note that most modern CPUs run LEA on the same ALUs as normal add and shift instructions. They have dedicated AGUs (address-generation units), but use them only for actual memory operands. In-order Atom is one exception; there LEA runs earlier in the pipeline than the ALUs. The internal implementation does not matter, but it is a safe bet that decoding the operands of LEA shares transistors with decoding addressing modes for any other instruction. Any other way to expose a multi-input shift-and-add instruction would have needed a special encoding.

It is just as good for arbitrary arithmetic as it is for pointers, so it is a mistake to think of it as being intended for pointers these days. It is not an "abuse" or a "trick" to use it for non-pointers, because everything is an integer in assembly language. It has lower throughput than add, but it is cheap enough to use almost any time it saves even one instruction. And it can save up to three instructions:

 lea eax, [rdi + rsi*4 - 8]   ; 3 cycle latency on Intel SnB-family
                              ; 2-component LEA is only 1c latency

 ;;; without LEA:
 mov eax, esi                 ; maybe 0 cycle latency, otherwise 1
 shl eax, 2                   ; 1 cycle latency
 add eax, edi                 ; 1 cycle latency
 sub eax, 8                   ; 1 cycle latency

On some AMD CPUs, even a complex LEA has only 2 cycle latency, but the 4-instruction sequence would have 4 cycle latency from esi being ready to the final eax being ready. Either way, LEA saves 3 uops for the front-end to decode and issue, uops that would also take up space in the reorder buffer all the way until retirement.
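The equivalence between the single LEA and the four-instruction sequence can be sketched in C as follows. This only models the values that end up in eax (function names are hypothetical); it says nothing about latency or uop counts.

```c
#include <assert.h>
#include <stdint.h>

/* Models: lea eax, [rdi + rsi*4 - 8]
   The sum is formed at 64-bit address width, then truncated into eax. */
uint32_t with_lea(uint64_t rdi, uint64_t rsi) {
    return (uint32_t)(rdi + rsi * 4 - 8);
}

/* Models the four-instruction sequence on 32-bit registers. */
uint32_t without_lea(uint64_t rdi, uint64_t rsi) {
    uint32_t eax = (uint32_t)rsi; /* mov eax, esi */
    eax <<= 2;                    /* shl eax, 2   */
    eax += (uint32_t)rdi;         /* add eax, edi */
    eax -= 8;                     /* sub eax, 8   */
    return eax;
}

/* Small self-check; call it from your own main(). */
void check_lea_sequence(void) {
    assert(with_lea(100, 3) == without_lea(100, 3));
}
```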

It has several major benefits, especially in 32/64-bit code, where addressing modes can use any register and can shift:

Non-destructive: the output goes in a register that is not one of the inputs. It is sometimes useful as just a copy-and-add, like lea 1(%rdi), %eax or lea (%rdx, %rbp), %ecx.
It can do 3 or 4 operations in one instruction (see above).
It does math without modifying EFLAGS, which can be handy after a test and before a cmovcc, or in an add-with-carry loop on CPUs with partial-flag stalls.
x86-64: position-independent code can use a RIP-relative LEA to get a pointer to static data. (lea foo(%rip), %rdi is slightly larger and slower than mov $foo, %edi, so prefer the latter in position-dependent code.)

Except for RIP-relative LEA in x86-64 mode, all of these apply equally to calculating pointers and to calculating non-pointer integer adds / shifts.

See also the x86 tag wiki for assembly guides / manuals and performance information.

See also Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted? 64-bit address size with 32-bit operand size is the most compact encoding, so prefer lea (%rdx, %rbp), %ecx when possible instead of lea (%rdx, %rbp), %rcx or lea (%edx, %ebp), %ecx.

The address-size prefix in lea (%edx, %ebp), %ecx is always useless, but 64-bit address / operand size is obviously required for doing 64-bit math. (Agner Fog's objconv disassembler even warns about useless address-size prefixes on LEA with 32-bit operand size.)
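The reason the 32-bit address size buys nothing can be checked with a small C model: for add (and left shift), the low 32 bits of the result do not depend on the high bits of the inputs. The function names below are hypothetical, and the model covers only the simple base + index form.

```c
#include <assert.h>
#include <stdint.h>

/* Models: lea (%rdx,%rbp), %ecx
   64-bit address math, result truncated to 32 bits. */
uint32_t lea_addr64_op32(uint64_t rdx, uint64_t rbp) {
    return (uint32_t)(rdx + rbp);
}

/* Models: lea (%edx,%ebp), %ecx
   Inputs are truncated to 32 bits first (costs an address-size prefix). */
uint32_t lea_addr32_op32(uint64_t rdx, uint64_t rbp) {
    return (uint32_t)rdx + (uint32_t)rbp;
}

/* Small self-check; call it from your own main(). */
void check_lea_sizes(void) {
    /* High halves hold arbitrary values; the low 32 bits of the sum agree. */
    uint64_t a = 0xDEADBEEF00000007ULL, b = 0xCAFEBABE00000003ULL;
    assert(lea_addr64_op32(a, b) == lea_addr32_op32(a, b));
}
```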

This question is almost a duplicate of the very highly voted What's the purpose of the LEA instruction?, but most of the answers there explain it in terms of address calculation on actual pointer data. That is only one use.

+2

Peter Cordes Oct 06 '17 at 2:25 on

ShadowRanger · Accepted Answer · 2017-10-06 01:45

leaq does not have to operate on memory addresses. It computes an address but does not actually read from the result, so until a mov or the like tries to use it, it is just an esoteric way to add one number to 1, 2, 4 or 8 times another number (or the same number, in this case). As you can see, it is frequently "abused" for mathematical purposes. 2*%rdi+%rdi is just 3 * %rdi, so it computes x * 3 without involving the multiplier unit in the CPU.

Likewise, a left shift doubles an integer's value for each bit shifted (each zero appended on the right), because of the way binary numbers work (in the same way that, in decimal, appending a zero on the right multiplies by 10).
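A short C sketch of that doubling rule, with hypothetical function names and assuming shift counts below 64 and results that fit in 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Left shift by n bits. */
uint64_t shift_left(uint64_t x, unsigned n) {
    return x << n;
}

/* The same result obtained by doubling n times. */
uint64_t double_n_times(uint64_t x, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        x += x; /* one doubling per shifted bit */
    }
    return x;
}

/* Small self-check; call it from your own main(). */
void check_shift_model(void) {
    assert(shift_left(3, 2) == double_n_times(3, 2)); /* 3 * 4 */
}
```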

So this is abusing the leaq instruction to multiply by 3, then shifting the result to multiply by a further 4, for a final result of multiplying by 12 without ever actually using a multiply instruction (which the compiler presumably believes would run more slowly; for all I know it could be right, and second-guessing the compiler is usually a losing game).
