Why does GCC prefer the AVX versions of scalar FP instructions?

When compiling for processors with AVX (for example with -march=sandybridge), GCC always seems to prefer the AVX versions of simple scalar floating-point instructions over the SSE versions. For example, it uses vmulsd instead of mulsd.

I wonder whether there are specific performance-related reasons for this, or whether it is just a detail of GCC's implementation that makes such instructions easier or more natural to generate. From the sources I have (mostly Agner Fog's instruction tables), the AVX and SSE versions appear to perform identically. I understand that the AVX instructions take three operands, but GCC almost always seems to use the same register for the destination and for one of the source operands.
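
For reference, here is a minimal sketch of the kind of code I mean. The function name scale is only an illustration, and the assembly mentioned in the comments is what I would expect on x86-64 Linux; the exact output may differ between GCC versions and ABIs.

```c
/* Hypothetical test case: a single scalar double multiply.
 *
 * Compile to assembly and compare the two results:
 *   gcc -O2 -S scale.c                      (baseline SSE2)
 *   gcc -O2 -march=sandybridge -S scale.c   (AVX enabled)
 *
 * Expected (not guaranteed) instruction for the multiply:
 *   SSE2: mulsd  %xmm1, %xmm0
 *   AVX:  vmulsd %xmm1, %xmm0, %xmm0
 * Note that the AVX form reuses %xmm0 as both a source and the destination.
 */
double scale(double a, double b)
{
    return a * b;
}
```

Both builds compute the same result; only the instruction encoding differs.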
