Does using AVX intrinsics disable exp() optimization?

I am writing code in VC++ using AVX intrinsics, and I call it from C# through PInvoke. A function that runs a large loop including exp() takes about 1000 ms for 160M iterations. As soon as I call any function that uses AVX intrinsics and then call exp() again, the same operation takes about 8000 ms. Note that the function evaluating exp() is standard C, and the call that uses the AVX intrinsics may be completely unrelated to the data being processed. It looks as though some flag is being changed somewhere at runtime.

In other words,

A(); // 1000ms calculates 160M exp() 
B(); // completely unrelated but contains AVX
A(); // 8000ms

or, curiously,

C(); // contains 128 bit SSE SIMD expressions
A(); // 1000ms

I am at a loss as to what is going on here, or how to track down the cause. I am on an Intel 2500K CPU with Windows 7, using the Express edition of Visual Studio.

Thanks.

1 answer

If you use any 256-bit AVX instruction, the "upper AVX state" becomes "dirty", which causes a large stall if you subsequently use SSE instructions (including scalar floating-point code executed in the xmm registers). This is documented in the Intel Optimization Manual, which you can download for free (and which is required reading if you are doing this sort of work):

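AVX instructions always modify the upper bits of the YMM registers, while SSE instructions do not modify the upper bits. From the hardware's perspective, the upper bits of the YMM registers can be in one of three states:

• Clean: all upper bits of YMM are zero. This is the state when the processor starts from RESET.

• Modified and saved to the XSAVE region: the contents of the upper bits of the YMM registers match the data saved in the XSAVE region. This is the state after XSAVE/XRSTOR executes.

• Modified and unsaved: executing a single AVX instruction (either 256-bit or 128-bit) modifies the upper bits of the destination YMM register.

The AVX/SSE transition penalty applies whenever the processor state is "Modified and unsaved". Executing VZEROUPPER moves the processor state to "Clean" and avoids the transition penalty.

Your routine B() dirties the YMM state, so the SSE code in A() stalls. Insert a VZEROUPPER instruction between B and A to avoid the problem.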
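As a rough sketch of that fix, VZEROUPPER is available in C++ through the _mm256_zeroupper() intrinsic from <immintrin.h>. The bodies of B and A below are hypothetical stand-ins (the question does not show the real ones), and the AVX_FN macro is only an assumption to let GCC or Clang compile the AVX routine without a global -mavx switch:

```cpp
#include <cmath>
#include <cstddef>
#include <immintrin.h>  // AVX intrinsics, including _mm256_zeroupper

// Assumption: GCC/Clang need AVX enabled per function unless the whole
// file is built with -mavx; MSVC accepts the intrinsics directly.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Hypothetical stand-in for B(): any routine that uses 256-bit YMM registers.
// Here it simply doubles each element of `in` into `out`.
AVX_FN void B(const float* in, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(v, v));
    }
    for (; i < n; ++i) {
        out[i] = in[i] + in[i];  // scalar tail for the leftover elements
    }

    // Zero the upper halves of all YMM registers so that SSE code
    // executed after this function does not pay the transition penalty.
    _mm256_zeroupper();
}

// Hypothetical stand-in for A(): plain scalar code that calls exp().
double A(std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::exp(-static_cast<double>(i));
    }
    return sum;
}
```

With this in place, calling B(...) followed by A(...) should leave the YMM state clean before the exp() loop runs. Many current compilers insert VZEROUPPER automatically at the end of functions compiled with AVX enabled, so the explicit call matters most with older toolchains or hand-written assembly; checking the generated disassembly is the reliable way to confirm.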
