Strange runtime difference between two code segments

So, I wanted to see whether a program could run faster by comparing a value directly, without first copying it into another variable (the examples below make this clearer), and I noticed something strange. I had these two code segments:

```cpp
string a = "";
for (int i = 0; i < 1000000; i++)
    a += 'a';
for (int i = 0; i < 1000000; i++) {
    if ('b' == a.at(i)); // compare the two chars directly
}
```

and

```cpp
string a = "";
for (int i = 0; i < 100000000; i++)
    a += 'a';
for (int i = 0; i < 100000000; i++) {
    char c = a.at(i); // declare a new variable
    if ('b' == c);    // compare the char with the newly created variable,
                      // instead of comparing it to the other char directly
}
```

I thought that the second segment would take longer, since it declares an extra variable that the first does not. When I actually timed them, however, the second took less time than the first. I timed it several times, and the second one consistently finishes about 0.13 seconds sooner. Here is the complete code:

```cpp
#include <string>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    clock_t timer;
    string a = "";
    string b;
    for (int i = 0; i < 100000000; i++)
        a += "a";

    timer = clock();
    for (int i = 0; i < 100000000; i++) {
        if ('b' == a.at(i))
            b += "a";
    }
    cout << (clock() - timer) / (float)CLOCKS_PER_SEC << "sec" << endl;

    timer = clock();
    for (int i = 0; i < 100000000; i++) {
        char c = a.at(i);
        if ('b' == c)
            b += "a";
    }
    cout << (clock() - timer) / (float)CLOCKS_PER_SEC << "sec" << endl;
    return 0;
}
```

Why is this happening?

EDIT: Following NathanOliver's suggestion, I gave each loop its own separate strings, so now the code looks like this:

```cpp
#include <string>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    clock_t timer;
    string compare_string_1 = "";
    string compare_string_2 = "";
    string segment_1 = "";
    string segment_2 = "";
    for (int i = 0; i < 100000000; i++)
        compare_string_1 += "a";
    for (int i = 0; i < 100000000; i++)
        compare_string_2 += "a";

    timer = clock();
    for (int i = 0; i < 100000000; i++) {
        if ('b' == compare_string_1.at(i))
            segment_1 += "a";
    }
    cout << (clock() - timer) / (float)CLOCKS_PER_SEC << "sec" << endl;

    timer = clock();
    for (int i = 0; i < 100000000; i++) {
        char c = compare_string_2.at(i);
        if ('b' == c)
            segment_2 += "a";
    }
    cout << (clock() - timer) / (float)CLOCKS_PER_SEC << "sec" << endl;
    return 0;
}
```
1 answer

Using Visual C++ 2010, I got the same timing results reported in the comments above: on average, the second loop takes about 80% of the first loop's execution time. Once or twice the first loop was slightly faster, but that could be due to a scheduling hiccup in the OS. Inspecting the disassembly showed the following:

First loop:

```
01231120  cmp   dword ptr [ebp-38h],esi
01231123  jbe   main+1CBh (123120Bh)
01231129  cmp   dword ptr [ebp-34h],10h
0123112D  mov   eax,dword ptr [ebp-48h]
01231130  jae   main+0F5h (1231135h)
01231132  lea   eax,[ebp-48h]
01231135  cmp   byte ptr [eax+esi],62h
01231139  jne   main+108h (1231148h)
0123113B  mov   ebx,1
01231140  lea   eax,[ebp-80h]
01231143  call  std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)
01231148  inc   esi
01231149  cmp   esi,5F5E100h
0123114F  jl    main+0E0h (1231120h)
```

Second loop:

```
01231155  cmp   dword ptr [ebp-1Ch],esi
01231158  jbe   main+1CBh (123120Bh)
0123115E  cmp   dword ptr [ebp-18h],10h
01231162  mov   eax,dword ptr [ebp-2Ch]
01231165  jae   main+12Ah (123116Ah)
01231167  lea   eax,[ebp-2Ch]
0123116A  cmp   byte ptr [eax+esi],62h
0123116E  jne   main+13Dh (123117Dh)
01231170  mov   ebx,1
01231175  lea   eax,[ebp-64h]
01231178  call  std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)
0123117D  inc   esi
0123117E  cmp   esi,5F5E100h
01231184  jl    main+115h (1231155h)
```

Since the generated assembly looks essentially identical for both loops, I started thinking about throttling mechanisms in the OS or CPU, and guess what? Adding Sleep(5000); between the two loops made the second loop (almost) always slower than the first. Over 20 runs, the second loop then took on average about 150% of the first loop's run time.

EDIT: Increasing the spin count by a factor of five gives the same results. I assume that run times of around 0.5 s are more or less reliably measurable. :-)

With the original code, I think the OS may need a few time slices to detect the high CPU utilization, after which it gives the thread a higher scheduling priority, and the CPU clock may ramp up as well, leaving part of the first loop running before the boost kicks in. By the time the second loop starts, the OS/CPU are already primed for a heavy workload, so it runs a bit faster. Something similar could happen with the MMU or the OS's internal memory page handling. With a Sleep between the loops, the opposite may occur: the OS deprioritizes the thread for a while until it detects the new workload, making the second loop slower.

What results do you get? Does anyone have a suitable profiler, such as Intel VTune Amplifier, to measure CPI and CPU clock rates across the two loops?
