Can the compiler automatically use SSE2 if optimization is disabled?
When optimization is turned off, does the /arch:SSE2 flag mean anything?
I was given the task of squeezing more performance out of our software. Unfortunately, release builds are performed using debug settings, and attempts to argue the case for optimization have been unsuccessful so far.
We compile for x86 with the compiler flags /ZI /Od /arch:SSE2 /FAs. The generated assembly shows that the compiler does not use SSE2. Is that because optimization is disabled?
There are several situations in the code like this:
    char* begin = reinterpret_cast<char*>(&bufferObject);
    char* end = begin + sizeof(bufferObject);
    char result = 0;  // must start at zero for the XOR fold
    while ( begin != end ) {
        result ^= *begin++;
    }
I would like the compiler to vectorize this operation for me, but it does not; I suspect that optimization needs to be enabled.
I have hand-coded two replacements: one using an inline __asm block, and the other using the SSE2 intrinsics defined in <emmintrin.h>. I would rather not rely on this.
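For reference, the intrinsics variant looks roughly like the sketch below. It is not our production code: the function names xor_bytes_scalar and xor_bytes_sse2 are made up for this post, and it assumes an x86 target where SSE2 is available. It uses unaligned loads so the buffer does not need 16-byte alignment, and the scalar version is only there for comparison.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Scalar reference: XOR of every byte in the buffer (same as the loop above).
inline char xor_bytes_scalar(const void* data, std::size_t size) {
    const char* p = static_cast<const char*>(data);
    char result = 0;
    for (std::size_t i = 0; i < size; ++i) {
        result ^= p[i];
    }
    return result;
}

// Hypothetical SSE2 version: XOR 16 bytes per iteration, then fold.
inline char xor_bytes_sse2(const void* data, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    __m128i acc = _mm_setzero_si128();
    std::size_t i = 0;

    // Main loop: unaligned 16-byte loads, XORed into the accumulator.
    for (; i + 16 <= size; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        acc = _mm_xor_si128(acc, chunk);
    }

    // Fold 16 bytes -> 8 bytes -> 4 bytes inside the register.
    acc = _mm_xor_si128(acc, _mm_srli_si128(acc, 8));
    acc = _mm_xor_si128(acc, _mm_srli_si128(acc, 4));

    // Fold the remaining 4 bytes down to 1 in a general-purpose register.
    unsigned int w = static_cast<unsigned int>(_mm_cvtsi128_si32(acc));
    w ^= w >> 16;
    w ^= w >> 8;
    unsigned char result = static_cast<unsigned char>(w & 0xFFu);

    // Tail: any bytes left over when size is not a multiple of 16.
    for (; i < size; ++i) {
        result ^= p[i];
    }
    return static_cast<char>(result);
}
```

A call site would then be something like xor_bytes_sse2(&bufferObject, sizeof(bufferObject)). Because the intrinsics are explicit, I would expect SSE2 instructions to be emitted even under /Od, although the surrounding code stays unoptimized.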
Update
In addition to the questions above, I would like library functions such as memcpy to use the provided vectorized versions when appropriate. Looking at the assembly code for memcpy, I see that there is a routine called _VEC_memcpy that uses SSE2 for faster copying. The block that decides whether to branch to this routine is as follows:
    ; First, see if we can use a "fast" copy SSE2 routine
    ; block size greater than min threshold?
            cmp     ecx,080h
            jb      Dword_align
    ; SSE2 supported?
            cmp     DWORD PTR __sse2_available,0
            je      Dword_align
    ; alignments equal?
            push    edi
            push    esi
            and     edi,15
            and     esi,15
            cmp     edi,esi
            pop     esi
            pop     edi
            jne     Dword_align
    ; do fast SSE2 copy, params already set
            jmp     _VEC_memcpy
I don't think _VEC_memcpy is being called ... ever.
Does the /arch:SSE2 flag define this __sse2_available symbol?
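To make sure I am reading the dispatch block correctly, here is how I would restate its three tests in C++. This is only my interpretation of the assembly: would_take_fast_path is a hypothetical helper, and its sse2_available parameter merely stands in for the CRT's __sse2_available variable, whose initialization is exactly what I am unsure about.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical restatement of the memcpy dispatch checks quoted above.
// Returns true when the code would jump to _VEC_memcpy.
inline bool would_take_fast_path(const void* dst, const void* src,
                                 std::size_t count, bool sse2_available) {
    // cmp ecx,080h / jb Dword_align: blocks under 128 bytes use the slow path.
    if (count < 0x80) {
        return false;
    }
    // cmp DWORD PTR __sse2_available,0 / je Dword_align
    if (!sse2_available) {
        return false;
    }
    // and edi,15 / and esi,15 / cmp edi,esi: both pointers must have the
    // same offset within a 16-byte line (not necessarily offset zero).
    const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst) & 15u;
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src) & 15u;
    return d == s;
}
```

If that reading is right, a copy only reaches the SSE2 routine when it is at least 128 bytes long, the flag is nonzero, and source and destination are equally misaligned. So even with the flag set, many of our copies would fall through to Dword_align.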