Here are my results using DevStudio 2005:
Debug:
- Without block: 25.109
- With block: 19.703
Release:
- Without block: 0
- With block: 6.046
It is very important to run this from the command line and not from within DevStudio: DevStudio does something that affects the performance of the application.
The only way to find out what is really going on is to look at the assembler code. Here is the assembler generated in the release build:
    FindWithoutBlock:
    00401000  xor         eax,eax
    00401002  cmp         dword ptr [ecx+eax*4],0F4240h
    00401009  je          FindWithoutBlock+1Ah (40101Ah)
    0040100B  add         eax,1
    0040100E  cmp         eax,186A0h
    00401013  jl          FindWithoutBlock+2 (401002h)
    00401015  mov         eax,186A0h
    0040101A  ret
Note that the compiler removed the ArrLen parameter and replaced it with a constant (186A0h = 100000)! It also kept FindWithoutBlock as a separate function instead of inlining it.
Here is what the compiler did with the other function (FindWithBlock):
    004010E0  mov         dword ptr [esp+38h],186A0h
    004010E8  mov         ebx,0F4240h
    004010ED  mov         dword ptr [esi+61A80h],ebx
    004010F3  xor         eax,eax
    004010F5  cmp         dword ptr [esi],ebx
    004010F7  je          main+0EFh (40110Fh)
    004010F9  lea         esp,[esp]
    00401100  add         eax,1
    00401103  cmp         dword ptr [esi+eax*4],ebx
    00401106  jne         main+0E0h (401100h)
    00401108  cmp         eax,186A0h
    0040110D  je          main+0F5h (401115h)
    0040110F  call        dword ptr [__imp__getchar (4020D0h)]
    00401115  sub         dword ptr [esp+38h],1
    0040111A  jne         main+0CDh (4010EDh)
Here the function has been inlined. The lea esp,[esp] is just a 7-byte no-op that aligns the next instruction. The code checks index 0 separately from all the other indices, but the main loop is definitely tighter than the FindWithoutBlock version.
Hmmm. Here is the code that calls FindWithoutBlock:
    0040106F  mov         ecx,edi
    00401071  mov         ebx,eax
    00401073  call        FindWithoutBlock (401000h)
    00401078  mov         ebp,eax
    0040107A  mov         edi,186A0h
    0040107F  cmp         ebp,186A0h
    00401085  je          main+6Dh (40108Dh)
    00401087  call        dword ptr [__imp__getchar (4020D0h)]
    0040108D  sub         edi,1
    00401090  jne         main+5Fh (40107Fh)
Aha! The FindWithoutBlock function is called only once! The compiler noticed that the function returns the same value every time and hoisted it out of the loop into a single call. With FindWithBlock, the compiler cannot make the same assumption, because you write to the array before the search, so the array is (potentially) different for each call.
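As a rough sketch of the situation described above (not the original test program, which is not shown here), the two search functions without volatile look like this. CountMissesWithoutBlock and its Trials parameter are hypothetical names standing in for the benchmark loop in main. Because FindWithoutBlock only reads memory and its arguments never change inside that loop, an optimizer is free to evaluate the call once and reuse the result, while FindWithBlock stores into the array on every call.

```c
/* Linear search with an explicit bounds check on every iteration. */
int FindWithoutBlock(int *Arr, int ArrLen, int Val)
{
    for (int i = 0; i < ArrLen; i++)
        if (Arr[i] == Val)
            return i;
    return ArrLen;
}

/* Sentinel search: stores Val in the last cell so the loop needs no
   bounds check. Arr must have at least LastCellIndex + 1 elements. */
int FindWithBlock(int *Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex] = Val;   /* this store modifies the array */
    int i;
    for (i = 0; Arr[i] != Val; i++)
        ;
    return i;
}

/* Hypothetical stand-in for the benchmark loop (name and signature are
   assumptions). Nothing inside the loop changes Arr, ArrLen or Val, so
   the compiler may compute FindWithoutBlock once and reuse the value. */
int CountMissesWithoutBlock(int *Arr, int ArrLen, int Val, int Trials)
{
    int misses = 0;
    for (int t = 0; t < Trials; t++)
        if (FindWithoutBlock(Arr, ArrLen, Val) == ArrLen)
            misses++;
    return misses;
}
```

Whether a given compiler actually performs this hoisting depends on the compiler version and optimization settings, so the generated assembler is the only reliable check.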
To verify this, add the volatile keyword as follows: -
    int FindWithoutBlock(volatile int * Arr, int ArrLen, int Val)
    {
        for ( int i = 0; i < ArrLen; i++ )
            if ( Arr[i] == Val )
                return i;
        return ArrLen;
    }

    int FindWithBlock(volatile int * Arr, int LastCellIndex, int Val)
    {
        Arr[LastCellIndex] = Val;
        int i;
        for ( i = 0 ; Arr[i] != Val; i++ );
        return i;
    }
With this change, both versions run in the same time (6.040). Since memory access is the main bottleneck, the more complex loop test in FindWithoutBlock does not affect the overall speed.
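For anyone who wants to repeat the comparison, here is a minimal timing sketch. It assumes the volatile versions of both functions; TimeFind, FindFn and the Checksum parameter are hypothetical helpers, not part of the original test. It uses the standard clock() function, which measures CPU time with limited resolution, so it only gives meaningful numbers with a large trial count.

```c
#include <time.h>

/* Common signature of both search functions (the typedef is an assumption). */
typedef int (*FindFn)(volatile int *Arr, int N, int Val);

int FindWithoutBlock(volatile int *Arr, int ArrLen, int Val)
{
    for (int i = 0; i < ArrLen; i++)
        if (Arr[i] == Val)
            return i;
    return ArrLen;
}

int FindWithBlock(volatile int *Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex] = Val;   /* sentinel: Arr needs LastCellIndex + 1 cells */
    int i;
    for (i = 0; Arr[i] != Val; i++)
        ;
    return i;
}

/* Hypothetical helper: calls Find Trials times and returns the elapsed
   CPU time in seconds. The results are summed into *Checksum so that
   they are actually used by the caller. */
double TimeFind(FindFn Find, volatile int *Arr, int N, int Val,
                int Trials, long *Checksum)
{
    long sum = 0;
    clock_t start = clock();
    for (int t = 0; t < Trials; t++)
        sum += Find(Arr, N, Val);
    clock_t end = clock();
    *Checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would allocate N + 1 elements, fill the first N with values other than Val, and call TimeFind once per function with the same N and Trials. As noted above, run the resulting program from the command line rather than from inside the IDE.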