SSE2, Visual Studio 2010, and Debug Build

Can the compiler automatically use SSE2 if optimization is disabled?

When optimization is turned off, does the /arch:SSE2 flag have any effect?

I have been given the task of squeezing more performance out of our software. Unfortunately, release builds are done with debug settings, and my attempts to argue the case for optimization have been unsuccessful so far.

I am compiling for x86 with the compiler flags /ZI /Od /arch:SSE2 /FAs. The generated assembly shows that the compiler is not using SSE2. Is that because optimization is disabled?

There are several situations in the code like this:

    char* begin = &bufferObject;
    char* end = begin + sizeof(bufferObject);
    char result;
    while ( begin != end ) {
        result ^= *begin++;
    }

I would like the compiler to vectorize this operation for me, but it does not; I suspect that optimization has to be enabled.

I have hand-coded two solutions: one using an inline __asm block, and the other using the SSE2 intrinsics defined in <emmintrin.h>. I would rather not rely on these.

Update

In addition to the questions above, I would like library functions such as memcpy to use the provided vectorized versions when appropriate. Looking at the assembly code for memcpy, I see that there is a routine called _VEC_memcpy that uses SSE2 for faster copying. The block that decides whether to branch to this routine is as follows:

    ; First, see if we can use a "fast" copy SSE2 routine
    ; block size greater than min threshold?
    cmp     ecx,080h
    jb      Dword_align
    ; SSE2 supported?
    cmp     DWORD PTR __sse2_available,0
    je      Dword_align
    ; alignments equal?
    push    edi
    push    esi
    and     edi,15
    and     esi,15
    cmp     edi,esi
    pop     esi
    pop     edi
    jne     Dword_align
    ; do fast SSE2 copy, params already set
    jmp     _VEC_memcpy
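Read as C, the three checks in that listing amount to roughly the following. This is only a paraphrase of the assembly shown: the function name would_use_vec_memcpy, the constant name FAST_COPY_MIN_BYTES, and passing the flag in as a parameter are made up for illustration, whereas the listing reads __sse2_available directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 080h threshold in the listing. */
enum { FAST_COPY_MIN_BYTES = 0x80 };

/* Hypothetical helper: returns 1 when the listing would jump to
   _VEC_memcpy, 0 when it would fall back to Dword_align. */
int would_use_vec_memcpy(const void *dst, const void *src,
                         size_t count, int sse2_available)
{
    assert(dst != NULL && src != NULL);

    /* cmp ecx,080h / jb Dword_align */
    if (count < FAST_COPY_MIN_BYTES)
        return 0;

    /* cmp DWORD PTR __sse2_available,0 / je Dword_align */
    if (!sse2_available)
        return 0;

    /* and edi,15 / and esi,15 / cmp edi,esi / jne Dword_align */
    if (((uintptr_t)dst & 15) != ((uintptr_t)src & 15))
        return 0;

    return 1; /* jmp _VEC_memcpy */
}
```

So a copy of fewer than 128 bytes, or one whose source and destination differ in their offset within a 16-byte block, takes the Dword_align path regardless of the flag.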

I don't think _VEC_memcpy is ever called.

Does the /arch:SSE2 flag define this __sse2_available symbol?

+4
2 answers

Visual Studio 2010 and earlier do not have support for automatic vectorization at all.

The goal of /arch:SSE2 is to allow the compiler to use scalar SSE for floating point operations instead of the x87 FPU.

So you may get some speedup from /arch:SSE2, as it gives the compiler access to more registers on x64. But keep in mind that this gain does not come from vectorization.

If you want to vectorize on VS2010, you pretty much have to do it manually using intrinsics.
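As a rough sketch of what doing it by hand could look like for the XOR loop in the question, using only SSE2 intrinsics from <emmintrin.h>: the function name xor_reduce_sse2 is hypothetical, the accumulator is started at zero (the question's loop leaves result uninitialized), and unaligned loads are used so that no alignment of the buffer is assumed.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: XOR together all bytes in [data, data + len). */
unsigned char xor_reduce_sse2(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    __m128i acc = _mm_setzero_si128();
    unsigned char lanes[16];
    unsigned char result = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Vertical step: XOR 16 bytes at a time into 16 independent lanes. */
    for (; len >= 16; p += 16, len -= 16)
        acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)p));

    /* Horizontal step: fold the 16 lanes down to a single byte. */
    _mm_storeu_si128((__m128i *)lanes, acc);
    for (i = 0; i < 16; ++i)
        result ^= lanes[i];

    /* Scalar tail for the remaining 0..15 bytes. */
    for (; len > 0; ++p, --len)
        result ^= *p;

    return result;
}
```

A call such as xor_reduce_sse2(&bufferObject, sizeof(bufferObject)) would stand in for the original loop. Whether it is actually faster in a /Od build would have to be measured.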


Visual Studio 2012 supports auto-vectorization:

http://msdn.microsoft.com/en-us/library/hh872235%28v=vs.110%29.aspx

+9

Trying to optimize code generated with MSVC's debug settings is kind of a tall order, since the compiler effectively works to make your code slow, for example by shuffling data on and off the stack (which causes load-hit-store stalls) and other things like that.

In any case, MSVC does not vectorize this block, whether in Release or Debug. You will need to use intrinsics to force it to emit the right machine code. This is the output with /O2 /Ot /Oi /arch:SSE2:

    PUBLIC  ?VectorTest@@YADPAD0@Z          ; VectorTest
    ; Function compile flags: /Ogtp
    ; COMDAT ?VectorTest@@YADPAD0@Z
    _TEXT   SEGMENT
    _begin$ = 8                             ; size = 4
    _result$ = 11                           ; size = 1
    _end$ = 12                              ; size = 4
    ?VectorTest@@YADPAD0@Z PROC             ; VectorTest, COMDAT
    ; 143  : {
        push    ebp
        mov     ebp, esp
    ; 144  :     char result;
    ; 145  :     while ( begin != end ) {
        mov     ecx, DWORD PTR _begin$[ebp]
        mov     edx, DWORD PTR _end$[ebp]
        mov     al, BYTE PTR _result$[ebp]
        cmp     ecx, edx
        je      SHORT $LN1@VectorTest
    $LL2@VectorTest:
    ; 146  :         result ^= *begin++;
        xor     al, BYTE PTR [ecx]
        inc     ecx
        cmp     ecx, edx
        jne     SHORT $LL2@VectorTest
    $LN1@VectorTest:
    ; 147  :     }
    ; 148  :     return result;
    ; 149  : }
        pop     ebp
        ret     0
    ?VectorTest@@YADPAD0@Z ENDP             ; VectorTest
    _TEXT   ENDS

Modern compilers are really lousy at vectorization, so we rely on SSE intrinsics in our application. I doubt that any compiler would vectorize this particular operation, since it is essentially a "reduce" rather than a "map", and I have not yet seen a compiler that performs horizontal (non-orthogonal) vectorization.
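To make the distinction concrete, here is a minimal scalar sketch contrasting an element-wise operation with a reduction. The names xor_map and xor_reduce are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise ("map"): out[i] depends only on a[i] and b[i], so the
   iterations are independent of one another. */
void xor_map(unsigned char *out, const unsigned char *a,
             const unsigned char *b, size_t n)
{
    size_t i;
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));
    for (i = 0; i < n; ++i)
        out[i] = (unsigned char)(a[i] ^ b[i]);
}

/* Reduction ("reduce"): every iteration reads and writes acc, so each
   step depends on the one before it. */
unsigned char xor_reduce(const unsigned char *a, size_t n)
{
    unsigned char acc = 0;
    size_t i;
    assert(n == 0 || a != NULL);
    for (i = 0; i < n; ++i)
        acc ^= a[i];
    return acc;
}
```

Vectorizing xor_reduce means regrouping the XORs into several partial accumulators and combining them at the end. That is valid here because XOR is associative and commutative, but the compiler has to recognize the pattern before it can do so.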

+4
