We have a scientific program that we would like to speed up with AVX. The whole program (written in Fortran and C) is going to be vectorized, and at the moment I am trying to implement complex-number multiplication in GCC inline assembly.
The assembly code takes 4 complex numbers and performs two complex multiplications at once:
    v2complex cmult(v2complex *a, v2complex *b)
    {
        v2complex ret;
        asm (
            "vmovupd %2,%%ymm1;"
            "vmovupd %2, %%ymm2;"
            "vmovddup %%ymm2, %%ymm2;"
            "vshufpd $15,%%ymm1,%%ymm1,%%ymm1;"
            "vmulpd %1, %%ymm2, %%ymm2;"
            "vmulpd %1, %%ymm1, %%ymm1;"
            "vshufpd $5,%%ymm1,%%ymm1, %%ymm1;"
            "vaddsubpd %%ymm1, %%ymm2,%%ymm1;"
            "vmovupd %%ymm1, %0;"
            : "=m" (ret)
            : "m" (*a), "m" (*b)
        );
        return ret;
    }
where a and b are 256-bit values, each holding two double-precision complex numbers:
    typedef union v2complex {
        __m256d v;
        complex c[2];
    } v2complex;
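For checking the results, the arithmetic the assembly is meant to perform can be written as a plain scalar routine. This is only a sketch for comparison, not code from our program: `cpair_ref` and `cmult_ref` are hypothetical names, and the struct assumes the values are stored as re0, im0, re1, im1, which is the memory layout of `complex c[2]`.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for v2complex: two complex doubles
   stored as re0, im0, re1, im1 (same layout as `complex c[2]`). */
typedef struct {
    double d[4];
} cpair_ref;

/* Multiply the two complex pairs element-wise:
   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ai*br + ar*bi). */
static cpair_ref cmult_ref(const cpair_ref *a, const cpair_ref *b)
{
    cpair_ref ret;

    assert(a && b);
    for (int i = 0; i < 4; i += 2) {
        double ar = a->d[i], ai = a->d[i + 1];
        double br = b->d[i], bi = b->d[i + 1];

        ret.d[i]     = ar * br - ai * bi;  /* real part (the "sub" lane) */
        ret.d[i + 1] = ai * br + ar * bi;  /* imaginary part (the "add" lane) */
    }
    return ret;
}
```

Running both versions on the same inputs and comparing the four doubles should show exactly which calls go wrong.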
The problem is that the code gives the correct result most of the time, but occasionally the result is wrong.
I am very new to assembly, but I tried to figure it out myself. The C code (compiled with -O3) seems to interfere with the ymm registers used in the assembly code. For example, if I print one of the values (say, a) before doing the multiplication, the program never gives wrong results.
My question is how to tell GCC not to touch the ymm registers used by the assembly. I was not able to put the ymm registers in the list of clobbered registers.