What is the difference between using array offsets and pointer increments?

Given these two functions, which should be faster, if there is any difference at all? Assume the input is very large.

 void iterate1(const char* pIn, int Size)
 {
     for (int offset = 0; offset < Size; ++offset) {
         doSomething(pIn[offset]);
     }
 }

vs

 void iterate2(const char* pIn, int Size)
 {
     const char* pEnd = pIn + Size;
     while (pIn != pEnd) {
         doSomething(*pIn++);
     }
 }

Are there other issues that need to be considered in any approach?

+4
11 answers

Boojum is correct, IF your compiler has a good optimizer and you have it turned on. If that's not the case, or if your array accesses aren't regular enough to be optimized, using array offsets can be much slower.

Here is an example. Back in 1988, we implemented a window with a simple teletype interface on a Mac II. It consisted of 24 lines of 80 characters each. When a new line came in from the ticker, you scrolled the top 23 lines up and drew the new one at the bottom. Ticker data didn't even arrive all the time, and when it did, it came in at 300 baud, which with the serial-protocol overhead worked out to about 30 characters per second. So we are not talking about a workload that should have taxed a 16 MHz 68020 at all!

But whoever wrote it declared the screen like this:

 char screen[24][80]; 

and used 2-dimensional array offsets to scroll characters as follows:

 int i, j;
 for (i = 0; i < 23; i++)
     for (j = 0; j < 80; j++)
         screen[i][j] = screen[i+1][j];

Six of these windows brought the machine to its knees!

Why? Because compilers were dumb in those days, so in machine language every instance of the inner-loop assignment screen[i][j] = screen[i+1][j] looked something like this (Ax and Dx are 68020 address and data registers):

 Fetch the base address of screen from memory into the A1 register
 Fetch i from stack memory into the D1 register
 Multiply D1 by a constant 80
 Fetch j from stack memory and add it to D1
 Add D1 to A1
 Fetch the base address of screen from memory into the A2 register
 Fetch i from stack memory into the D1 register
 Add 1 to D1
 Multiply D1 by a constant 80
 Fetch j from stack memory and add it to D1
 Add D1 to A2
 Fetch the value from the memory address pointed to by A2 into D1
 Store the value in D1 into the memory address pointed to by A1

So we are talking about 13 machine instructions for each of the 23 × 80 = 1,840 inner-loop iterations, 23,920 instructions in total, including 3,680 CPU-intensive multiplies.

We made a few changes to the C source code, so it looked like this:

 int i, j;
 register char *a, *b;
 for (i = 0; i < 23; i++) {
     a = screen[i];
     b = screen[i+1];
     for (j = 0; j < 80; j++)
         *a++ = *b++;
 }

There are still two machine-language multiplies, but they are in the outer loop, so there are only 46 integer multiplies instead of 3,680. And the inner-loop statement *a++ = *b++ compiled to just two machine instructions:

 Fetch the value from the memory address pointed to by A2 into D1, and post-increment A2
 Store the value in D1 into the memory address pointed to by A1, and post-increment A1

With 1,840 inner-loop iterations, that comes to a total of 3,680 CPU-cheap instructions, 6.5 times fewer, and NO integer multiplies at all. After that, instead of dying at six teletype windows, we never managed to bog the machine down no matter how hard we tried; we ran out of teletype data sources first. And there were ways to optimize it much, much further.

Now, modern compilers will do that kind of optimization for you, IF you ask them to, and IF your code is structured to permit it.

But there are still situations where compilers can't do it for you, for instance, if you are doing non-sequential accesses within an array.

So I've found it helps to use pointers instead of array references whenever possible. Performance is never worse, and is often much, much better.

+6

With a modern compiler there should be no difference in performance between the two, especially in such simplistic, easily recognizable examples. Moreover, even if the compiler did not recognize their equivalence and translated each version literally, there should still be no noticeable performance difference on a typical modern hardware platform. (Of course, there may be more specialized platforms where the difference is noticeable.)

As for other considerations: conceptually, when you implement an algorithm using index access, you impose a random-access requirement on the underlying data structure. When you use pointer access (an "iterator"), you impose only a sequential-access requirement. Random access is a stronger requirement than sequential access. For this reason, in my own code I prefer to stick to pointer access whenever possible and use index access only when necessary.

More generally, if an algorithm can be implemented efficiently through sequential access, it is best to do it that way, without imposing an unnecessarily stronger random-access requirement. That may prove useful later if you need to reorganize the code or change the algorithm.

+3

They are almost identical. Both solutions involve a loop variable, a word-sized increment (int or pointer), and a comparison, each of which should take one assembly instruction.

The only difference I see is that the array lookup

 arr[idx]

may require pointer arithmetic and then a fetch on the dereference, while

 *ptr

requires only the fetch.

My advice: if it really matters, implement both options and see if there are any savings.

+2

To be sure, you need to profile in your target environment.

That said, my guess is that any modern compiler will optimize both into very similar (if not identical) code.

If you don't have an optimizer, the second has a chance of being faster, because you don't recompute the pointer on each iteration. But unless Size is a VERY large number (or the routine is called very often), the difference won't matter to your overall program's execution speed.

+2

A few years ago I asked this exact question. An interviewer had failed a candidate for choosing array notation, because it was supposedly obviously slower. At that point I compiled both versions and looked at the disassembly. There was one extra opcode in the array version. This was with Visual C++ (.NET?). Based on what I saw, I concluded that there is no appreciable difference.

Performing this again, here is what I found:

 iterate1(arr, 400); // array notation
 011C1027 mov edi,dword ptr [__imp__printf (11C20A0h)]
 011C102D add esp,0Ch
 011C1030 xor esi,esi
 011C1032 movsx ecx,byte ptr [esp+esi+8] <-- Loop starts here
 011C1037 push ecx
 011C1038 push offset string "%c" (11C20F4h)
 011C103D call edi
 011C103F inc esi
 011C1040 add esp,8
 011C1043 cmp esi,190h
 011C1049 jl main+32h (11C1032h)

 iterate2(arr, 400); // pointer offset notation
 011C104B lea esi,[esp+8]
 011C104F nop
 011C1050 movsx edx,byte ptr [esi] <-- Loop starts here
 011C1053 push edx
 011C1054 push offset string "%c" (11C20F4h)
 011C1059 call edi
 011C105B inc esi
 011C105C lea eax,[esp+1A0h]
 011C1063 add esp,8
 011C1066 cmp esi,eax
 011C1068 jne main+50h (11C1050h)
+2

Pointer operations used to be much faster. Today they are slightly faster at best, and the compiler can do the optimization for you.

Historically, iterating via *p++ was much faster than p[i]; that was part of the motivation for having pointers in the language at all.

In addition, p[i] often requires a slow multiply, or at least a shift, so the optimization of replacing multiplies in a loop with pointer additions was important enough to have a specific name: strength reduction. Indexing also tended to produce larger code.

However, two things have changed. One is that compilers are much more sophisticated and are generally able to perform this optimization for you.

The other is that the gap between computation and memory access has widened. When *p++ was invented, memory and CPU cycle times were comparable. Today, an ordinary desktop machine can perform around 3 billion integer operations per second, but only about 10 or 20 million random DRAM accesses. Cache accesses are faster, and the system will prefetch and stream sequential accesses as you walk through an array, but hitting memory still costs a lot, and a little index arithmetic is no longer a big deal.

+2

Why don't you just try both and time them? I suspect the compiler will optimize them into the same code. Just remember to enable optimization when comparing (e.g. -O3).

+1

In the "other considerations" column, I would say that one approach is clearer than the other. That's just my opinion.

+1

You're asking the wrong question: should a developer focus first on readability, or on performance?

The first version is idiomatic for processing arrays, and your intent will be clear to anyone who has ever worked with them, while the second leans heavily on the equivalence between array names and pointers, forcing the reader to switch between metaphors several times.

Cue the comments saying that the second version is crystal clear to any developer worth his keyboard.

If you've written your program, it runs slowly, and you've profiled it to the point of identifying this loop as the bottleneck, then by all means pop the hood and see which is faster. But first get something clear and working, using well-known, idiomatic language constructs.

+1

Performance issues aside, it seems to me that the while-loop version has a potential maintainability problem, since a programmer who comes along to add some new bells and whistles has to remember to keep the pointer increment in the right place, whereas the for-loop version keeps it safely out of the loop body.

+1
