C pointer dereference speed

Question

I have a question regarding pointer dereferencing speed. I have this structure:

    typedef struct _TD_RECT TD_RECT;
    struct _TD_RECT {
        double left;
        double top;
        double right;
        double bottom;
    };

My question is: which one will be faster and why?

CASE 1:

    TD_RECT *pRect;
    ...
    for(i = 0; i < m; i++)
    {
        if(p[i].x < pRect->left) ...
        if(p[i].x > pRect->right) ...
        if(p[i].y < pRect->top) ...
        if(p[i].y > pRect->bottom) ...
    }

CASE 2:

    TD_RECT *pRect;
    double left = pRect->left;
    double top = pRect->top;
    double right = pRect->right;
    double bottom = pRect->bottom;
    ...
    for(i = 0; i < m; i++)
    {
        if(p[i].x < left) ...
        if(p[i].x > right) ...
        if(p[i].y < top) ...
        if(p[i].y > bottom) ...
    }

So, in case 1, the loop dereferences the pRect pointer each time it needs a comparison value. In case 2, new variables are created in the function's local space (on the stack) and the values are copied from pRect into them. The loop will perform many comparisons.

My guess is that they will be equally slow, because a local variable is also a memory reference (to the stack), but I'm not sure...

Also, would it be better to keep indexing p[], or to increment p by one element each time and dereference it directly without the index?

Any ideas? Thanks:)

c++ c pointers dereference local

oldSkool Oct 21 '10 at 11:18

5 answers

paxdiablo · Answer 1 · 2010-10-21T11:25:27+0000

You will probably find that it makes no difference with modern compilers. Most of them are likely to perform common subexpression elimination on the expressions that do not change within the loop. It is unwise to assume that there is a simple one-to-one mapping between your C statements and the assembly code. I have seen gcc pump out code that would put my assembler skills to shame.

In any case, this is not really a C or C++ question, since the ISO standards do not mandate how any of this is implemented. The best way to check is to generate the assembler code with something like gcc -S and examine the two cases in detail.

You will also get a better return on your investment if you steer away from this sort of micro-optimization and concentrate on the macro level, such as the choice of algorithm.
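To make that comparison concrete, here is a minimal sketch of the two cases as standalone functions. The TD_POINT type, the function names and the "count the points outside the rectangle" loop body are assumptions for illustration; the question does not show how p[] is declared or what the elided bodies do. Saved to a file (say cases.c, a hypothetical name), it can be compiled with gcc -O2 -S cases.c and the two function bodies compared in the resulting cases.s.

```c
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left; double top; double right; double bottom; };

/* Hypothetical point type: the question never shows the declaration of p[]. */
typedef struct { double x; double y; } TD_POINT;

/* CASE 1: read the rectangle through the pointer inside the loop. */
size_t count_outside_deref(const TD_POINT *p, size_t m, const TD_RECT *pRect)
{
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left || p[i].x > pRect->right ||
            p[i].y < pRect->top  || p[i].y > pRect->bottom)
            n++;
    }
    return n;
}

/* CASE 2: copy the four fields into locals before the loop. */
size_t count_outside_locals(const TD_POINT *p, size_t m, const TD_RECT *pRect)
{
    const double left = pRect->left, top = pRect->top;
    const double right = pRect->right, bottom = pRect->bottom;
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < left || p[i].x > right ||
            p[i].y < top  || p[i].y > bottom)
            n++;
    }
    return n;
}
```

Whether the two functions compile to the same instructions depends on the compiler, its version and the optimization level, so the generated assembler is the thing to look at rather than the C source.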

And, as with all optimization questions, measure, don't guess! There are too many variables that can affect it, so you should benchmark the different approaches in the target environment with realistic data.
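A rough way to measure rather than guess is a small timing helper built on the standard clock() function. This is only a sketch: TD_POINT, count_fn, time_variant and count_outside_deref are names made up here, and clock() reports CPU time at a coarse resolution, so the repeat count needs to be large and the data realistic for the numbers to mean anything.

```c
#include <stddef.h>
#include <time.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left; double top; double right; double bottom; };

/* Hypothetical point type, not shown in the question. */
typedef struct { double x; double y; } TD_POINT;

/* Signature shared by every variant to be timed (an assumption). */
typedef size_t (*count_fn)(const TD_POINT *, size_t, const TD_RECT *);

/* One variant to time: the pointer is dereferenced inside the loop. */
size_t count_outside_deref(const TD_POINT *p, size_t m, const TD_RECT *pRect)
{
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left || p[i].x > pRect->right ||
            p[i].y < pRect->top  || p[i].y > pRect->bottom)
            n++;
    }
    return n;
}

/* Calls fn `repeats` times and returns the elapsed CPU time in seconds.
 * The accumulated result is stored in *total so the calls have a visible
 * effect and cannot simply be optimized away. */
double time_variant(count_fn fn, const TD_POINT *p, size_t m,
                    const TD_RECT *pRect, int repeats, size_t *total)
{
    volatile size_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink += fn(p, m, pRect);
    clock_t end = clock();
    *total = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Each candidate implementation would be passed to time_variant with the same data and repeat count, with compiler optimizations turned on, and the timings compared over several runs.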

Cashcow · Answer 2 · 2010-10-21T11:29:23+0000

This is unlikely to be a hugely performance-critical difference. You could profile each option several times and see. Make sure compiler optimizations are enabled in the test.

As for storing the doubles in locals, you might gain some performance by making them const. How big is your array?

As for using pointer arithmetic, it could be faster, yes.
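For reference, the two traversal styles look like this. It is a sketch only: TD_POINT and both function names are invented here, and whether the pointer form is actually faster is something to measure, since an optimizing compiler may well emit the same code for both.

```c
#include <stddef.h>

/* Hypothetical point type, not shown in the question. */
typedef struct { double x; double y; } TD_POINT;

/* Indexed form: p[i] is recomputed from the base pointer and i. */
size_t count_left_of_index(const TD_POINT *p, size_t m, double left)
{
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < left)
            n++;
    }
    return n;
}

/* Pointer form: advance p by one element and dereference it directly. */
size_t count_left_of_ptr(const TD_POINT *p, size_t m, double left)
{
    size_t n = 0;
    const TD_POINT *end = p + m;   /* one past the last element */
    for (; p != end; p++) {
        if (p->x < left)
            n++;
    }
    return n;
}
```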

One immediate optimization is available if you know that left < right in your rect (which surely must be the case): if x < left, it cannot also be > right, so you can put in an "else".

Your big optimization, if there is one, would come from not having to loop through all the items in your array and not having to perform 4 checks on every one of them.

For example, if you indexed or sorted your array on x and y, you could use a binary search to find all the values that have x < left and loop through just those.
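A sketch of the "else" idea follows. It assumes left <= right and top <= bottom, and it assumes the elided loop bodies simply tally which side a point falls outside on; TD_POINT, count_sides and the counts layout are made up for the example. The x tests and the y tests stay independent, because a point can be outside on both axes at once.

```c
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left; double top; double right; double bottom; };

/* Hypothetical point type, not shown in the question. */
typedef struct { double x; double y; } TD_POINT;

/* counts[0..3] receive the number of points beyond the left, right,
 * top and bottom edges respectively (an assumed loop body). */
void count_sides(const TD_POINT *p, size_t m, const TD_RECT *pRect,
                 size_t counts[4])
{
    counts[0] = counts[1] = counts[2] = counts[3] = 0;
    for (size_t i = 0; i < m; i++) {
        /* x < left rules out x > right, so the second test is skipped. */
        if (p[i].x < pRect->left)
            counts[0]++;
        else if (p[i].x > pRect->right)
            counts[1]++;

        /* Same reasoning for the vertical pair. */
        if (p[i].y < pRect->top)
            counts[2]++;
        else if (p[i].y > pRect->bottom)
            counts[3]++;
    }
}
```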
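The binary search could look roughly like the lower-bound search below, assuming the array is already sorted by ascending x. TD_POINT and first_not_left_of are hypothetical names. The function returns the index of the first point whose x is not less than left, so the points with x < left are exactly those at indices 0 up to (but not including) the returned value.

```c
#include <stddef.h>

/* Hypothetical point type, not shown in the question. */
typedef struct { double x; double y; } TD_POINT;

/* Precondition (assumed): p[0..m-1] is sorted by ascending x.
 * Returns the first index i with p[i].x >= left, or m if there is none. */
size_t first_not_left_of(const TD_POINT *p, size_t m, double left)
{
    size_t lo = 0, hi = m;              /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p[mid].x < left)
            lo = mid + 1;               /* answer lies to the right of mid */
        else
            hi = mid;                   /* mid itself may be the answer */
    }
    return lo;
}
```

This takes O(log m) comparisons instead of testing every element, but it only pays off if the cost of sorting or indexing is amortized over many queries.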

andrewmu · Answer 3 · 2010-10-21T11:23:20+0000

I think the second case is likely to be faster, because you are not dereferencing the pointer to pRect on every iteration of the loop.

In practice, an optimizing compiler may notice this and generate identical code for both, but the possibility that pRect aliases an element of p[] could prevent that.
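The aliasing concern only bites when the loop writes to memory, which the elided bodies in the question may or may not do. The sketch below assumes a loop that clamps x into the rectangle; TD_POINT and both function names are invented. In the first version a store to p[i].x could, as far as the compiler may be able to prove, change pRect->left or pRect->right, so it might reload them on each iteration. The C99 restrict qualifier is one way to promise that the two pointers do not overlap; copying the fields into locals, as in CASE 2, is another.

```c
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left; double top; double right; double bottom; };

/* Hypothetical point type, not shown in the question. */
typedef struct { double x; double y; } TD_POINT;

/* Without restrict: the stores to p[i].x might alias *pRect, so the
 * compiler may have to re-read pRect->left and pRect->right each time. */
void clamp_x(TD_POINT *p, size_t m, const TD_RECT *pRect)
{
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left)
            p[i].x = pRect->left;
        if (p[i].x > pRect->right)
            p[i].x = pRect->right;
    }
}

/* With restrict (C99): the caller promises p and pRect do not overlap,
 * which permits the loads to be hoisted out of the loop. */
void clamp_x_restrict(TD_POINT *restrict p, size_t m,
                      const TD_RECT *restrict pRect)
{
    for (size_t i = 0; i < m; i++) {
        if (p[i].x < pRect->left)
            p[i].x = pRect->left;
        if (p[i].x > pRect->right)
            p[i].x = pRect->right;
    }
}
```

Calling the restrict version with pointers that do overlap is undefined behavior, so the promise has to be true.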

codaddict · Answer 4 · 2010-10-21T11:29:01+0000

An optimizing compiler will see that the accesses to the structure are loop-invariant and will apply loop-invariant code motion, which makes your two cases look the same.

doron · Answer 5 · 2010-10-21T13:43:26+0000

I would be surprised if even a completely non-optimizing compiler (-O0) produced different code for the two cases presented. To perform any operation on a modern processor, the data has to be loaded into registers. So even when you declare automatic variables, these variables will not exist in main memory but in one of the processor's floating-point registers. This will be true even if you do not declare the variables yourself, and therefore I expect no difference in the generated machine code even when you declare temporary variables in your C++ code.

But, as others have said, compile the code to assembler and see for yourself.
