Why does GCC prefer the FP version for AVX?

Question

When compiling for processors with AVX (for example with -march=sandybridge), GCC always seems to prefer the AVX versions of simple scalar floating-point instructions over the SSE versions. For example, it uses vmulsd instead of mulsd.

I wonder whether there is a specific performance-related reason for this, or whether it is just a detail of GCC's implementation that makes it easier or more natural to schedule such instructions. From what I can tell from the sources I have (mostly Agner Fog's instruction tables), the AVX and SSE versions of these instructions seem to have identical performance. I understand that the AVX instructions take three operands, but GCC almost always seems to use the destination register as one of the source operands anyway.
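For reference, here is a minimal sketch of the kind of code that shows the behavior. The function name scale and the file name scale.c are hypothetical, chosen only for illustration, and the assembly in the comments is what GCC typically emits for such a function on x86-64; exact output may vary with the compiler version.

```c
/* scale.c: hypothetical reproducer, a single scalar double multiply.
 *
 * Inspect the generated assembly with:
 *   gcc -O2 -S scale.c                      (baseline x86-64, SSE2)
 *   gcc -O2 -march=sandybridge -S scale.c   (AVX enabled)
 *
 * Typical result (AT&T syntax, may differ between GCC versions):
 *   SSE2: mulsd  %xmm1, %xmm0
 *   AVX:  vmulsd %xmm1, %xmm0, %xmm0
 * Note that the AVX form reuses xmm0 as both a source and the destination.
 */
double scale(double a, double b)
{
    return a * b;
}
```

Both builds compute the same result; only the instruction encoding differs, which is what the question is about.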

assembly gcc x86-64

Dolda2000 Jun 01 '16 at 16:11

No one has answered this question yet.
