How to multiply a register by 37 using only two consecutive x86 leal instructions?

Say that %edi contains x, and I want to end up with 37 * x using only two consecutive leal instructions. How would I do it?

For example, to get 45x, you would do

    leal (%edi, %edi, 8), %edi
    leal (%edi, %edi, 4), %eax    # to be returned

I can't for the life of me figure out what numbers to put in place of 8 and 4 so that the result (%eax) is 37x.

+2
assembly x86 x86-64 multiplication strength-reduction
Sep 29 '17 at 1:44 on
1 answer

At -O3, gcc emits:

    int mul37(int a) { return a*37; }

    leal    (%rdi,%rdi,8), %eax
    leal    (%rdi,%rax,4), %eax
    ret

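This uses 37 = 9*4 + 1: the first lea puts 9*a in %eax without destroying the original a in %rdi, so the second lea can combine both as a + 4*(9*a).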
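The decomposition can be checked in C with a small model of lea. This is only a sketch: the helper names `lea` and `mul37_two_lea` are made up for illustration, the model ignores displacements, and it uses unsigned 32-bit arithmetic so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `leal (base, index, scale), dst`:
   dst = base + index * scale, with 32-bit wraparound.
   The only scales the addressing mode allows are 1, 2, 4 and 8. */
static uint32_t lea(uint32_t base, uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

/* Mirrors the two-instruction gcc sequence for x * 37. */
uint32_t mul37_two_lea(uint32_t x)
{
    uint32_t t = lea(x, x, 8);   /* leal (%rdi,%rdi,8), %eax : t = 9x       */
    return lea(x, t, 4);         /* leal (%rdi,%rax,4), %eax : x + 4*9x = 37x */
}
```

The key point is that the second step reads both the original x and the intermediate 9x, which is why the first lea must write to a different register than its source.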

You're in good company in not spotting this one, though: recent clang usually uses 2 lea instructions instead of an imul (for example, for *15), but it misses this case and uses:

    imull   $37, %edi, %eax
    ret

It does handle *21 with the same pattern gcc uses, as 5*4 + 1. (clang 3.6 and earlier always used imul unless there was a single-instruction alternative, shl or lea.)

ICC and MSVC also use imul, but they don't seem to like using 2 lea instructions in general, so there the imul is "on purpose".

See the Godbolt link for a variety of multipliers compiled with gcc 7.2 vs. clang 5.0. It's interesting to try gcc -m32 -mtune=pentium, or even pentium3, to see how many more instructions gcc is willing to use for those targets. P2 / P3 has only 4-cycle latency for imul r, r, i, though, so that seems kind of crazy. Pentium has a 9-cycle imul and no out-of-order execution to hide the latency, so there it makes sense to try hard to avoid it.

-mtune=silvermont should probably only be willing to replace a 32-bit imul with a single instruction, since its multiply has 3-cycle latency and one-per-cycle throughput, while decode is often the bottleneck (according to Agner Fog, http://agner.org/optimize/). You could even consider imul $64, %edi, %eax (or other powers of 2) instead of mov / shl, because imul-immediate is a copy-and-multiply.




Ironically, gcc misses the *45 case and uses imul, while clang uses 2 leas. I guess it's time to file some missed-optimization bug reports. If 2 leas are better than 1 imul, they should be used wherever possible.
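To get a feel for which multipliers a two-lea sequence can reach at all, a brute-force search over a simplified model is enough. The sketch below is an illustration, not what any compiler actually does: the function name `two_lea_reachable` is hypothetical, and the model assumes no displacements, no sub / shl / neg, and that the second lea may read either the original x or the result of the first lea.

```c
/* Returns 1 if the constant multiplier m can be produced by two lea
   instructions under the simplified model described above, else 0.
   All values are coefficients of x, so "1" stands for x itself. */
int two_lea_reachable(unsigned m)
{
    static const unsigned scales[] = {1, 2, 4, 8};

    for (unsigned b1 = 0; b1 <= 1; b1++) {           /* lea #1: base absent (0) or x (1) */
        for (int i = 0; i < 4; i++) {
            unsigned t = b1 + scales[i];             /* coefficient after lea #1 */
            const unsigned operands[] = {1, t};      /* lea #2 can read x or t */

            for (int bi = -1; bi < 2; bi++) {        /* bi == -1: no base register */
                unsigned base = (bi < 0) ? 0 : operands[bi];
                for (int ii = 0; ii < 2; ii++)       /* index register: x or t */
                    for (int j = 0; j < 4; j++)      /* scale: 1, 2, 4 or 8 */
                        if (base + operands[ii] * scales[j] == m)
                            return 1;
            }
        }
    }
    return 0;
}
```

Under this model 15, 21, 37 and 45 are all reachable (for example 45 = 9 + 9*4), while a multiplier such as 23 is not, so it would need a different sequence or an imul.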

+6
Sep 29 '17 at 3:08 on