Assembly code / AVX instructions for multiplying complex numbers (GCC inline assembly)

Question

We have a scientific program and would like to add AVX support to it. The whole program (written in Fortran + C) is going to be vectorized, and at the moment I'm trying to implement complex-number multiplication in GCC inline assembly.

The assembly code takes 4 complex numbers and performs two complex multiplications at once:

    v2complex cmult(v2complex *a, v2complex *b)
    {
        v2complex ret;
        asm (
            "vmovupd %2,%%ymm1;"
            "vmovupd %2, %%ymm2;"
            "vmovddup %%ymm2, %%ymm2;"
            "vshufpd $15,%%ymm1,%%ymm1,%%ymm1;"
            "vmulpd %1, %%ymm2, %%ymm2;"
            "vmulpd %1, %%ymm1, %%ymm1;"
            "vshufpd $5,%%ymm1,%%ymm1, %%ymm1;"
            "vaddsubpd %%ymm1, %%ymm2,%%ymm1;"
            "vmovupd %%ymm1, %0;"
            : "=m"(ret)
            : "m" (*a), "m" (*b)
        );
        return ret;
    }

where a and b are 256-bit vectors, each holding two double-precision complex numbers:

    typedef union v2complex {
        __m256d v;
        complex c[2];
    } v2complex;

The problem is that the code usually gives the correct result, but sometimes it fails.

I am very new to assembly, but I tried to figure it out myself. The C program (compiled with -O3) seems to use the same ymm registers as the assembly code. For example, if I print one of the values (say, a) before doing the multiplication, the program never gives wrong results.

My question is how to tell GCC not to touch the ymm registers. I was not able to put the ymm registers in the clobber list.

+7

c assembly gcc avx complex-numbers

Jean nicolas Apr 2 '13 at 14:35


2 answers

Two comments that do not directly answer your question:

I strongly recommend using compiler intrinsics instead of direct assembly. That way the compiler takes care of register allocation and can do a better job of optimizing your code (inlining, instruction reordering, etc.).
Agner Fog has a C++ vector class library with optimized vectorized operations, including operations on complex numbers. Even if you cannot use his library directly in your C code, its optimized code may be a good starting point; see src/special/complexvec.h in the zipped source code.
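As a rough illustration of the intrinsics route, here is a minimal sketch of the same two-at-once complex multiplication. The function name cmult_intrin, the output-pointer signature, and the use of `double complex` in the union are assumptions made for this sketch, not part of the question's code. The target attribute is only there so the block compiles without -mavx; an AVX-capable x86 CPU is still required at run time.

```c
#include <assert.h>
#include <complex.h>
#include <immintrin.h>

/* Same layout as the question's union: two double-precision complex numbers. */
typedef union v2complex {
    __m256d v;
    double complex c[2];
} v2complex;

/* Hypothetical name and signature; the compiler picks the registers itself. */
__attribute__((target("avx")))
void cmult_intrin(const v2complex *a, const v2complex *b, v2complex *out)
{
    assert(a && b && out);
    __m256d va = a->v;                          /* [a0.re, a0.im, a1.re, a1.im] */
    __m256d vb = b->v;                          /* [b0.re, b0.im, b1.re, b1.im] */
    __m256d b_re = _mm256_movedup_pd(vb);       /* [b0.re, b0.re, b1.re, b1.re] */
    __m256d b_im = _mm256_permute_pd(vb, 0xF);  /* [b0.im, b0.im, b1.im, b1.im] */
    __m256d a_sw = _mm256_permute_pd(va, 0x5);  /* [a0.im, a0.re, a1.im, a1.re] */
    __m256d t1 = _mm256_mul_pd(va, b_re);       /* [a.re*b.re, a.im*b.re] per pair */
    __m256d t2 = _mm256_mul_pd(a_sw, b_im);     /* [a.im*b.im, a.re*b.im] per pair */
    /* addsub subtracts in even lanes and adds in odd lanes:
       [a.re*b.re - a.im*b.im, a.im*b.re + a.re*b.im] */
    out->v = _mm256_addsub_pd(t1, t2);
}
```

Calling it with a.c[0] = 1 + 2i and b.c[0] = 3 + 4i should leave -5 + 10i in out->c[0]. Because no register is named explicitly, there is no clobber list to get wrong.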

+3

Norbert P. Apr 3 '13 at 15:40

Stephen Canon · Accepted Answer · 2013-04-02T14:44:16+0000

As you have guessed, the problem is that you have not told GCC which registers you are clobbering. I would be surprised if it does not yet support putting YMM registers in the clobber list; what version of GCC are you using?

In any case, it is almost certainly sufficient to put the corresponding XMM registers in the clobber list:

 : "=m" (ret) : "m" (*a), "m" (*b) : "%xmm1", "%xmm2");
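Put together, the question's function with that clobber list might look like the sketch below. The name cmult_clobber, the output pointer (used instead of returning the union by value), and `double complex` in the union are assumptions for this example; the instruction sequence is the one from the question. It assumes an x86 target with AVX available at run time.

```c
#include <assert.h>
#include <complex.h>
#include <immintrin.h>

typedef union v2complex {
    __m256d v;
    double complex c[2];
} v2complex;

/* Hypothetical variant of the question's cmult(): same instructions,
   plus a clobber list naming the registers the asm overwrites. */
void cmult_clobber(const v2complex *a, const v2complex *b, v2complex *out)
{
    assert(a && b && out);
    __asm__ (
        "vmovupd   %2, %%ymm1\n\t"                   /* ymm1 = b */
        "vmovupd   %2, %%ymm2\n\t"                   /* ymm2 = b */
        "vmovddup  %%ymm2, %%ymm2\n\t"               /* ymm2 = [b.re, b.re] per pair */
        "vshufpd   $15, %%ymm1, %%ymm1, %%ymm1\n\t"  /* ymm1 = [b.im, b.im] per pair */
        "vmulpd    %1, %%ymm2, %%ymm2\n\t"           /* [a.re*b.re, a.im*b.re] */
        "vmulpd    %1, %%ymm1, %%ymm1\n\t"           /* [a.re*b.im, a.im*b.im] */
        "vshufpd   $5, %%ymm1, %%ymm1, %%ymm1\n\t"   /* swap within each pair */
        "vaddsubpd %%ymm1, %%ymm2, %%ymm1\n\t"       /* subtract even, add odd lanes */
        "vmovupd   %%ymm1, %0"
        : "=m" (*out)
        : "m" (*a), "m" (*b)
        : "%xmm1", "%xmm2"   /* tells the compiler these registers are overwritten */
    );
}
```

With the clobbers in place, the compiler knows not to keep live values in those two registers across the asm statement, which is the failure mode described in the question.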

Some other notes:

You load both inputs twice, which is inefficient. There is no reason for this.
I would use "r" (a), "r" (b) as constraints and write the loads as vmovupd (%2), %%ymm1. There is probably no difference in the generated code, but it seems more idiomatic.
Remember to put a vzeroupper after your AVX code, before any SSE code is executed, to avoid (large) stalls.
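A sketch of how these notes could be combined is shown below, assuming x86-64 and an AVX-capable CPU. The name cmult_once and the output-pointer signature are hypothetical. Each input is loaded once into a register, and vzeroupper is issued at the end. Since vzeroupper zeroes the upper halves of all YMM registers, this sketch lists xmm0 through xmm15 as clobbered. It also keeps the "m" constraints, because with "r" constraints the compiler would additionally have to be told that the asm reads memory (for example with a "memory" clobber).

```c
#include <assert.h>
#include <complex.h>
#include <immintrin.h>

typedef union v2complex {
    __m256d v;
    double complex c[2];
} v2complex;

/* Hypothetical variant: one load per input, vzeroupper before returning. */
void cmult_once(const v2complex *a, const v2complex *b, v2complex *out)
{
    assert(a && b && out);
    __asm__ (
        "vmovupd   %1, %%ymm0\n\t"                   /* ymm0 = a (single load) */
        "vmovupd   %2, %%ymm1\n\t"                   /* ymm1 = b (single load) */
        "vmovddup  %%ymm1, %%ymm2\n\t"               /* ymm2 = [b.re, b.re] per pair */
        "vshufpd   $15, %%ymm1, %%ymm1, %%ymm1\n\t"  /* ymm1 = [b.im, b.im] per pair */
        "vmulpd    %%ymm0, %%ymm2, %%ymm2\n\t"       /* [a.re*b.re, a.im*b.re] */
        "vmulpd    %%ymm0, %%ymm1, %%ymm1\n\t"       /* [a.re*b.im, a.im*b.im] */
        "vshufpd   $5, %%ymm1, %%ymm1, %%ymm1\n\t"   /* [a.im*b.im, a.re*b.im] */
        "vaddsubpd %%ymm1, %%ymm2, %%ymm1\n\t"       /* subtract even, add odd lanes */
        "vmovupd   %%ymm1, %0\n\t"
        "vzeroupper"                                 /* clears upper halves of all YMM regs */
        : "=m" (*out)
        : "m" (*a), "m" (*b)
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"
    );
}
```

Clobbering every vector register forces the compiler to spill anything it holds there, so in a hot loop it may be preferable to issue vzeroupper once at the end of the whole AVX region rather than in every call.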
