Boojum is correct - IF your compiler has a good optimizer, and you have it turned on. If that is not the case, or your use of arrays is not sequential and amenable to optimization, using array offsets can be much slower.
Here is an example. Back in 1988, we were implementing a window with a simple teletype interface on a Mac II. It consisted of 24 lines of 80 characters each. When a new line came in from the ticker, you scrolled up the top 23 lines and displayed the new one at the bottom. When there was something on the teletype, which wasn't all the time, it came in at 300 baud, which with the overhead of the serial protocol was about 30 characters per second. So we are not talking about something that should have taxed a 16 MHz 68020 at all!
But the person who wrote it did it like this:
char screen[24][80];
and used 2-dimensional array offsets to scroll characters as follows:
int i, j;
for (i = 0; i < 23; i++)
    for (j = 0; j < 80; j++)
        screen[i][j] = screen[i+1][j];
Six of these windows brought the machine to its knees!
Why? Because compilers were stupid in those days, so in machine language every instance of the inner-loop assignment, screen[i][j] = screen[i+1][j], looked something like this (Ax and Dx are processor registers):
Fetch the base address of screen from memory into the A1 register
Fetch i from stack memory into the D1 register
Multiply D1 by a constant 80
Fetch j from stack memory and add it to D1
Add D1 to A1
Fetch the base address of screen from memory into the A2 register
Fetch i from stack memory into the D1 register
Add 1 to D1
Multiply D1 by a constant 80
Fetch j from stack memory and add it to D1
Add D1 to A2
Fetch the value from the memory address pointed to by A2 into D1
Store the value in D1 into the memory address pointed to by A1
So we are talking about 13 machine-language instructions for each of the 23 x 80 = 1840 inner-loop iterations, for a total of 23920 instructions, including 3680 CPU-intensive integer multiplies.
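To make the hidden cost concrete, here is a rough sketch of the address arithmetic that an unoptimized screen[i][j] implies. The names cell_address, scroll_explicit, ROWS and COLS are made up for illustration and are not from the original program; the point is only that each access works out to base + i * 80 + j, so the loop body pays for two multiplies.

```c
#include <stddef.h>

#define ROWS 24
#define COLS 80

char screen[ROWS][COLS];

/* Hypothetical helper: computes the address of screen[i][j] by hand,
   the way a non-optimizing compiler would on every single access:
   base + i * 80 + j. The multiply is the expensive part. */
char *cell_address(int i, int j)
{
    char *base = (char *)screen;
    return base + (size_t)i * COLS + (size_t)j;
}

/* The indexed scroll loop with the hidden arithmetic spelled out.
   Each iteration calls cell_address twice, i.e. two multiplies. */
void scroll_explicit(void)
{
    int i, j;
    for (i = 0; i < ROWS - 1; i++)
        for (j = 0; j < COLS; j++)
            *cell_address(i, j) = *cell_address(i + 1, j);
}
```

An optimizing compiler would normally be expected to fold this arithmetic away; the sketch only mirrors what the instruction listing above describes.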
We made a few changes to the C source code, so it looked like this:
int i, j;
register char *a, *b;
for (i = 0; i < 23; i++)   /* 23 row copies, same as the indexed version */
{
    a = screen[i];
    b = screen[i+1];
    for (j = 0; j < 80; j++)
        *a++ = *b++;
}
There are still two machine-language multiplies, but they are in the outer loop, so there are only 46 integer multiplies instead of 3680. And the inner-loop statement *a++ = *b++ consisted of only two machine-language operations:
Fetch the value from the memory address pointed to by A2 into D1, and post-increment A2
Store the value in D1 into the memory address pointed to by A1, and post-increment A1
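As a sanity check, the two approaches can be put side by side and compared. This is a minimal sketch, assuming a screen passed in as a parameter rather than a global; the function names scroll_indexed, scroll_pointers and fill_rows are hypothetical. Both loops run over all 23 row copies, so they should leave the screen in the same state.

```c
#define ROWS 24
#define COLS 80

/* Indexed version: every access recomputes the row offset. */
void scroll_indexed(char s[ROWS][COLS])
{
    int i, j;
    for (i = 0; i < ROWS - 1; i++)
        for (j = 0; j < COLS; j++)
            s[i][j] = s[i + 1][j];
}

/* Pointer version: row addresses are computed once per row,
   and the inner loop only copies and post-increments. */
void scroll_pointers(char s[ROWS][COLS])
{
    int i, j;
    char *a, *b;
    for (i = 0; i < ROWS - 1; i++) {
        a = s[i];
        b = s[i + 1];
        for (j = 0; j < COLS; j++)
            *a++ = *b++;
    }
}

/* Hypothetical helper: fills the screen so each row is distinguishable
   (row 0 is all 'A', row 1 all 'B', and so on). */
void fill_rows(char s[ROWS][COLS])
{
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            s[i][j] = (char)('A' + i);
}
```

Filling two screens with fill_rows, scrolling one with each function, and comparing them with memcmp is enough to confirm the rewrite changes only the cost, not the result.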
Given that there are 1840 inner-loop iterations, that is a total of 3680 CPU-cheap instructions, 6.5 times fewer than before, and NO integer multiplies in the inner loop. After this, instead of dying at six teletype windows, we were never able to open enough of them to bog the machine down; we ran out of teletype data sources first. And there are ways to optimize this much, much further.
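One possible further step, offered here as an assumption rather than as what was actually done back then, is to drop the loops entirely. The rows of a 2-D C array are contiguous in memory, so the whole scroll is a single overlapping block move. The function name scroll_memmove is hypothetical.

```c
#include <string.h>

#define ROWS 24
#define COLS 80

/* Scrolls the screen up one line with a single library call.
   Rows 1..23 are moved onto rows 0..22 as one block of 23 * 80 bytes.
   memmove is used rather than memcpy because the source and
   destination regions overlap. */
void scroll_memmove(char s[ROWS][COLS])
{
    memmove(s, s + 1, (ROWS - 1) * sizeof s[0]);
}
```

The bottom row is left unchanged, just as in the loop versions, ready to be overwritten with the new line.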
Now, modern compilers will do that kind of optimization for you - IF you ask them to, and IF your code is structured in a way that permits it.
But there are still situations where compilers cannot do this for you - for example, if you perform non-sequential operations on an array.
So I have found that it serves me well to use pointers instead of array references whenever possible. In my experience the performance is never worse, and often much, much better.