Pointer vs. array efficiency (fewer assembly instructions don't take less time)

Some people say: "Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster."
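For concreteness, the two styles that claim contrasts can be sketched as a pair of loops. This is only an illustration with hypothetical function names, not code from the post; both functions compute the same sum, and a modern optimizing compiler often emits similar code for both, so the sketch shows the idiom rather than any speed difference.

```c
#include <stddef.h>

/* Hypothetical helper: subscript style, conceptually *(a + i) on each access. */
static long sum_indexed(const int a[], size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical helper: pointer style, walking p from the first element
   to one past the last element. */
static long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Called on `{10, 20, 30}` with `n = 3`, both return 60.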

I doubted this claim, so I ran the following test:

In the test below, compiler optimization is disabled. For how compiler optimization affects the relative efficiency of pointers and arrays, see: Efficiency: arrays versus pointers

(Visual Studio 2010, Debug mode, no optimization)

```c
#include <windows.h>
#include <stdio.h>

int main()
{
    int a[] = {10, 20, 30};
    int* ap = a;
    long counter;
    int start_time, end_time;   /* GetTickCount() reports milliseconds */

    /* 1 billion stores through the pointer */
    start_time = GetTickCount();
    for (counter = 1000000000L; counter > 0; counter--) {
        *(ap+1) = 100;
    }
    end_time = GetTickCount();
    printf("1 billion times of *(ap+1) = %d\n", end_time - start_time);

    /* 1 billion stores through the array subscript */
    start_time = GetTickCount();
    for (counter = 1000000000L; counter > 0; counter--) {
        a[1] = 101;
    }
    end_time = GetTickCount();
    printf("1 billion times of a[1] = %d\n", end_time - start_time);

    return 0;
}
```

Result:

```
1 billion times of *(ap+1) = 3276
1 billion times of a[1] = 3541
```

The pointer version seems slightly faster. But after comparing the disassembly, I became even more confused.

(Visual Studio 2010, Debug mode, no optimization)

```asm
; 17 : *(ap+1) = 100;
mov eax, DWORD PTR _ap$[ebp]
mov DWORD PTR [eax+4], 100 ; 00000064H

; 25 : a[1] = 101;
mov DWORD PTR _a$[ebp+4], 101 ; 00000065H
```

According to the assembly listing, the store through the pointer takes 2 instructions, while the store through the array takes only 1.
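Writing each instruction sequence out as C may make the difference clearer. This is an illustrative model with hypothetical function names, not compiler output: in the Debug build `ap` lives in its own stack slot, so its value has to be loaded before it can serve as an address, whereas the address of `a[1]` is a fixed offset from the frame pointer and is folded directly into the store.

```c
/* Hypothetical model of:
     mov eax, DWORD PTR _ap$[ebp]
     mov DWORD PTR [eax+4], 100                                  */
static void pointer_store_model(int **ap_slot)
{
    int *eax = *ap_slot;   /* 1st instruction: load the pointer variable itself */
    eax[1] = 100;          /* 2nd instruction: store one int (4 bytes on that target) past it */
}

/* Hypothetical model of:
     mov DWORD PTR _a$[ebp+4], 101
   frame_a stands in for the ebp-relative address of a, which the CPU
   can form without any extra load.                              */
static void array_store_model(int frame_a[3])
{
    frame_a[1] = 101;      /* single store */
}
```

The extra step in the pointer path is the load of `ap`; whether that load actually costs time depends on the CPU, which is exactly the question being asked.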

Why does the array version execute fewer instructions yet take more time than the pointer version?

Is this related to the processor cache? How can I modify my test code to prove this?
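One way to start probing this is to control more of the test's confounding factors. The sketch below is an assumption-laden variant, not a definitive benchmark, and its function names are hypothetical: it uses the standard `clock()` instead of `GetTickCount`, marks the array `volatile` so the stores survive even in an optimized build, and runs the two loops in both orders. If the slower time follows the loop's position rather than its form, cache or scheduling effects become the more likely explanation. Note that `clock()` is specified as processor time, but Microsoft's CRT returns elapsed wall-clock time instead.

```c
#include <time.h>

/* volatile: every store must be performed, so the loops cannot be removed. */
static volatile int a[3] = {10, 20, 30};

/* Hypothetical helper: seconds (per clock()) spent storing through a pointer. */
static double time_pointer_store(long iterations)
{
    volatile int *ap = a;
    clock_t start = clock();
    for (long counter = iterations; counter > 0; counter--)
        *(ap + 1) = 100;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: same measurement, storing through the subscript. */
static double time_array_store(long iterations)
{
    clock_t start = clock();
    for (long counter = iterations; counter > 0; counter--)
        a[1] = 101;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: runs pointer-then-array, then array-then-pointer.
   results[0], results[1]: pointer timings (first, last position)
   results[2], results[3]: array timings (second, third position)      */
static void time_both_orders(long iterations, double results[4])
{
    results[0] = time_pointer_store(iterations);
    results[2] = time_array_store(iterations);
    results[3] = time_array_store(iterations);
    results[1] = time_pointer_store(iterations);
}
```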

1 answer

First and foremost, the C language has no speed. Speed is an attribute introduced by an implementation of C. For example, GCC may generate code that runs at a different speed from the code generated by Clang, and both are likely to produce code that outperforms the Cint or Ch interpreters. All of these are implementations of C. Some of them are slower than others, but the speed cannot be attributed to C either way!

Section 6.3.2.1 of the C standard says:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.

This means that both *(ap+1) and a[1] in your code are pointer operations. The conversion happens at compile time in Visual Studio, so it should not affect run time.
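The conversion rule and its exceptions can be checked directly. In this small sketch (the helper names are hypothetical), `a` decays to `&a[0]` when assigned to a pointer, but as the operand of `sizeof` or unary `&` it still denotes the whole array.

```c
#include <stddef.h>

static int a[] = {10, 20, 30};

/* Hypothetical helper: in ordinary use, a becomes a pointer to its first element. */
static int decays_to_first_element(void)
{
    int *ap = a;
    return ap == &a[0];
}

/* Hypothetical helpers: sizeof is an exception, so it sees the whole array,
   while a real pointer variable is only pointer-sized. */
static size_t array_bytes(void)   { return sizeof a; }
static size_t pointer_bytes(void) { int *ap = a; return sizeof ap; }

/* Hypothetical helper: unary & is an exception too. &a has type int (*)[3],
   a pointer to the whole array, starting at the same address as &a[0]. */
static int whole_array_address_matches(void)
{
    int (*whole)[3] = &a;
    return (void *)whole == (void *)&a[0];
}
```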

Section 6.5.2.1, on array subscripting, says:

One of the expressions shall have type "pointer to complete object type", the other expression shall have integer type, and the result has type "type".

This indicates that the array subscript operator is actually a pointer operator...

This confirms that a[1] is indeed a pointer operation, as postulated earlier. Moreover, by run time the array has already been converted to a pointer, so the performance should be identical.
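Because 6.5.2.1 goes on to define `E1[E2]` as `(*((E1)+(E2)))`, every spelling in this sketch designates the same element, including the unusual but legal `1[a]`. The helper names are hypothetical.

```c
/* Hypothetical helper: returns 1 if the subscript and pointer spellings
   all name the same int object. */
static int all_forms_match(void)
{
    int a[] = {10, 20, 30};
    int *ap = a;
    return &a[1] == a + 1      /* a[1]  is *(a + 1)  */
        && &ap[1] == ap + 1    /* ap[1] is *(ap + 1) */
        && &a[1] == &ap[1]     /* same element either way */
        && &1[a] == &a[1];     /* 1[a]  is *(1 + a)  */
}

/* Hypothetical helper: writes through the pointer form,
   reads back through the subscript form. */
static int store_pointer_read_subscript(void)
{
    int a[] = {10, 20, 30};
    int *ap = a;
    *(ap + 1) = 100;
    return a[1];
}
```

Both helpers agree at the language level; any timing difference therefore has to come from how a particular compiler and CPU handle the generated instructions.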

... so why are they not identical?

What OS are you using? Isn't it a multitasking, multi-user OS? Suppose the OS let the first loop complete without interruption, but then interrupted the second loop and switched control to another process. Wouldn't this interruption invalidate your experiment? How do you measure the frequency and duration of interruptions caused by task switching? Note that these will differ between operating systems, and the OS is part of the implementation.
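Portable C cannot count task switches directly, but a rough symptom can be observed: if the same loop is timed many times, preemption tends to show up as occasional slow runs. This is a minimal sketch under the assumption that `clock()` has fine enough resolution for the chosen iteration count; the helper names are hypothetical.

```c
#include <time.h>

/* Hypothetical helper: times one batch of volatile array stores. */
static double time_batch(long iterations)
{
    static volatile int a[3] = {10, 20, 30};
    clock_t start = clock();
    for (long counter = iterations; counter > 0; counter--)
        a[1] = 101;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: repeats the batch `trials` times (trials >= 1) and
   records the fastest and slowest run. A large gap between them suggests
   that something outside the loop, such as another process, interfered
   with some of the runs. */
static void trial_spread(int trials, long iterations, double *fastest, double *slowest)
{
    *fastest = *slowest = time_batch(iterations);
    for (int t = 1; t < trials; t++) {
        double s = time_batch(iterations);
        if (s < *fastest) *fastest = s;
        if (s > *slowest) *slowest = s;
    }
}
```

When comparing two variants this way, comparing their fastest runs is one common choice, since the fastest run is the one least likely to have been disturbed.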

What processor are you using? Does it have its own fast internal cache for machine code? Suppose your entire first loop, including the timing mechanism, fits neatly in the code cache, but the second loop is cut off. Wouldn't that lead to a cache miss and a long wait while your processor fetches the rest of the code from RAM? How do you measure the delays caused by cache misses? Note that these will differ between CPUs, and the CPU is part of the implementation.
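A common mitigation for cold caches is to run the code once, untimed, before measuring it, so the timed run is more likely to start with the code and data already cached. This hedged sketch reduces cache effects but cannot rule them out; the function names are hypothetical.

```c
#include <time.h>

static volatile int a[3] = {10, 20, 30};

/* Hypothetical helper: the code under test, a loop of stores through a pointer. */
static void pointer_store_loop(long iterations)
{
    volatile int *ap = a;
    for (long counter = iterations; counter > 0; counter--)
        *(ap + 1) = 100;
}

/* Hypothetical helper: one untimed warm-up run, then one timed run. */
static double time_after_warmup(long iterations)
{
    pointer_store_loop(iterations);   /* untimed: likely brings code and data into cache */
    clock_t start = clock();
    pointer_store_loop(iterations);   /* timed run */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```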

These questions should lead to others, such as "Does this micro-optimization benchmark address a crucial or important problem?" The success of an optimization depends on the size and complexity of the problem. Find an important problem, solve it, profile the solution, optimize it, and profile again. That way you can give meaningful information about how much faster the optimized version is. Your boss will be much happier with you, as long as you don't mention that the optimization probably matters only for your implementation, as I mentioned earlier. I am sure you will find that array subscripting versus pointer dereferencing is the least of your worries.
