Memory Access Comparison

Question

Which of the two is faster (C++)?

    for (i = 0; i < n; i++) {
        sum_a = sum_a + a[i];
        sum_b = sum_b + b[i];
    }

or

    for (i = 0; i < n; i++) {
        sum_a = sum_a + a[i];
    }
    for (i = 0; i < n; i++) {
        sum_b = sum_b + b[i];
    }

I am new to this, so I don't know if this makes sense, but in the first version the arrays "a" and "b" are accessed alternately, which could cause a lot of switching between memory regions, since "a" and "b" are stored in different memory locations. In the second version, the whole of array "a" is accessed first and then the whole of array "b", which means accessing contiguous memory locations instead of alternating between the two arrays.

Is there any real difference in runtime between the two versions (even a very small one)?

c++ arrays memory

Utkarsh Jul 12 '16 at 11:42

2 answers

Maciekgrynda · Answer 1 · 2016-07-12T11:51:27+0000

I do not think there is a single correct answer to this question. In general, the second version runs twice as many loop iterations (CPU overhead), while the first version may have worse memory access (memory access overhead). Now imagine that you are running this code on a PC with a slow clock but an insanely good cache. The memory overhead is reduced, but since the clock is slow, running the loop twice makes execution noticeably longer. The other way round: with a fast clock but poor memory, running two loops is not a problem, so it is better to optimize memory access.

Here is a great example of how you can profile your application: Link

user2079303 · Answer 2 · 2016-07-12T12:06:47+0000

Which of the two is faster (C++)?

Either. It depends on:

The implementation of operator+ and operator[] (if overloaded)
The location of the arrays in memory (adjacent or not)
The size of the arrays
The CPU cache size
The cache associativity
Cache speed relative to memory speed
Possibly other factors

As Revolver_Ocelot mentions in a comment, some compilers may even transform the loop as written into the other form.

Is there any real difference in runtime between the two versions (even a very small one)?

It can make a difference. The difference may be significant or insignificant.

Your analysis is sound. Main memory access is usually much slower than cache access, and alternating between two memory locations can lead to cache thrashing ^† in some situations. I would recommend using the separate loops by default and only combining them if you have measured that to be faster on your target CPU.

^† As MSalters points out, this should not be a problem on modern desktop processors (modern as in ~x86).
