Memory Access Comparison

Which of the two is faster (C++)?

    for (i = 0; i < n; i++) {
        sum_a = sum_a + a[i];
        sum_b = sum_b + b[i];
    }

or

    for (i = 0; i < n; i++) {
        sum_a = sum_a + a[i];
    }
    for (i = 0; i < n; i++) {
        sum_b = sum_b + b[i];
    }

I am new to this, so I don't know if this makes sense, but in the first version array "a" and then array "b" are accessed in every iteration, which could lead to a lot of switching between memory regions, since the arrays "a" and "b" are stored in different memory locations. In the second version, all of array "a" is accessed first and then all of array "b", which means contiguous memory locations are accessed instead of alternating between the two arrays.

Is there really a difference in runtime between the two versions (even a very small one)?

2 answers

I do not think there is a single correct answer to this question. In general, the second version executes twice as many loop iterations (CPU overhead), while the first version has the worse memory access pattern (memory access overhead). Now imagine running this code on a PC with a slow clock but an insanely good cache. The memory overhead is reduced, but because the clock is slow, running the loop twice makes execution much longer. The other way around, with a fast clock but poor memory, running two loops is not a problem, so it is better to optimize memory access.

Here is a great example of how you can profile your application: Link
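
Since the answer depends on the machine, the simplest check is to time both versions yourself. The following is only a minimal sketch, not a rigorous benchmark: the names Sums, sum_fused, sum_split, time_ms and compare_loops are made up for illustration, the arrays are assumed to be std::vector<int> of equal length, and a single timed run ignores warm-up, noise and the possibility that an optimizing compiler rewrites either loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Both sums, returned together so the two versions can be compared.
struct Sums {
    long long a = 0;
    long long b = 0;
};

// Version 1: one loop that touches both arrays in every iteration.
Sums sum_fused(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // assumption: equal lengths
    Sums s;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s.a += a[i];
        s.b += b[i];
    }
    return s;
}

// Version 2: two loops, each walking a single array front to back.
Sums sum_split(const std::vector<int>& a, const std::vector<int>& b) {
    Sums s;
    for (std::size_t i = 0; i < a.size(); ++i) s.a += a[i];
    for (std::size_t i = 0; i < b.size(); ++i) s.b += b[i];
    return s;
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F f, Sums& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times both versions on arrays of n elements and prints the results.
void compare_loops(std::size_t n) {
    const std::vector<int> a(n, 1), b(n, 2);
    Sums fused, split;
    const double t_fused = time_ms([&] { return sum_fused(a, b); }, fused);
    const double t_split = time_ms([&] { return sum_split(a, b); }, split);
    std::cout << "fused: " << t_fused << " ms (sums " << fused.a << ", "
              << fused.b << ")\n"
              << "split: " << t_split << " ms (sums " << split.a << ", "
              << split.b << ")\n";
}
```

Calling compare_loops from main with several sizes, some that fit in cache and some that do not, and repeating each measurement a few times gives a rough picture. Printing the sums keeps the compiler from discarding the loops as dead code.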


Which of the two is faster (C++)?

Either. It depends on:

  • Implementation of operator+ and operator[] (if overloaded)
  • Location of the arrays in memory (adjacent or not)
  • Array size
  • CPU cache size
  • Cache associativity
  • Cache speed relative to memory speed
  • Possibly other factors

As Revolver_Ocelot mentions in a comment, some compilers may even transform the loop as written into the other form.

Is there really a difference in runtime between the two versions (even a very small one)?

It can make a difference. The difference may be significant or insignificant.

Your analysis is sound. Main memory access is usually much slower than cache access, and alternating between two memory locations can lead to cache thrashing † in some situations. I would recommend using the separate-loop approach by default and only combining the loops if you have measured that to be faster on your target CPU.

† As MSalters points out, thrashing should not be a problem on modern desktop processors (modern as in roughly x86).
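
To see why associativity matters here, it helps to work out which cache set an address falls into. The sketch below uses a deliberately simplified model: the CacheGeometry values are hypothetical rather than taken from any real CPU, the function names are made up for illustration, and replacement policy, prefetching and virtual-to-physical address translation are all ignored. In this model, two streams walked in lockstep evict each other only when their current lines map to the same set and that set has room for just one line.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cache geometry, for illustration only.
struct CacheGeometry {
    std::size_t line_bytes;  // bytes per cache line
    std::size_t num_sets;    // number of sets in the cache
    std::size_t ways;        // lines per set (associativity)
};

// Set an address maps to: drop the offset within the line, then wrap
// the line number around the number of sets.
constexpr std::size_t set_index(std::uintptr_t addr, const CacheGeometry& c) {
    return (addr / c.line_bytes) % c.num_sets;
}

// Simplified conflict test for two streams accessed alternately: they
// evict each other if they are on different lines, those lines share a
// set, and the set cannot hold two lines at once.
constexpr bool streams_evict_each_other(std::uintptr_t a, std::uintptr_t b,
                                        const CacheGeometry& c) {
    return a / c.line_bytes != b / c.line_bytes &&
           set_index(a, c) == set_index(b, c) && c.ways < 2;
}

// Direct-mapped, 64 sets of 64-byte lines: arrays that start 0x10000
// bytes apart land in the same set and keep evicting each other.
static_assert(streams_evict_each_other(0x10000, 0x20000,
                                       CacheGeometry{64, 64, 1}),
              "direct-mapped cache: conflict expected");

// Same set count, but 8-way associative: both lines fit in the set.
static_assert(!streams_evict_each_other(0x10000, 0x20000,
                                        CacheGeometry{64, 64, 8}),
              "8-way cache: no conflict expected");
```

The static_assert lines are checked at compile time, so the block verifies its own arithmetic. With any associativity of two or more, the two-array loop from the question does not thrash in this model, which matches the footnote's point about modern desktop processors.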
