How binary file size affects execution speed

How does the size of the binary affect the speed of execution? Specifically, I am talking about code written in ANSI C and translated into machine code with the GNU or Intel compiler. The target platform is a modern machine with an Intel or AMD multi-core processor running Linux. The code performs numerical calculations, possibly in parallel with OpenMP, and the binary can be several megabytes in size.

Note that the runtime will in any case be much longer than the time needed to load the code and libraries. I am thinking of very specific codes used to solve large systems of ordinary differential equations arising from kinetic models; these are usually CPU-bound for moderate system sizes, but can also become memory-bound.

My question is whether small binary size should be a design criterion for high-performance code, or whether I can always prefer explicit code (which ends up repeating blocks that could instead be factored into functions) together with compiler optimizations such as loop unrolling, etc.
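To make the trade-off concrete, here is a minimal sketch of what I mean; the function names, the 3x3 system size and the values are made up for illustration, and I rely on the compiler (e.g. gcc -O3 or icc -O3) to inline and unroll the function variant:

```c
#include <stdio.h>

/* Variant B factors the repeated block into a function and leaves
 * inlining/unrolling to the compiler; variant A writes it out by hand,
 * which repeated many times makes the binary larger. */
static void axpy3(double y[3], const double a, const double x[3])
{
    for (int i = 0; i < 3; ++i)
        y[i] += a * x[i];
}

void rhs_explicit(double y[3], const double a, const double x[3])
{
    /* Variant A: the block written out explicitly. */
    y[0] += a * x[0];
    y[1] += a * x[1];
    y[2] += a * x[2];
}

void rhs_function(double y[3], const double a, const double x[3])
{
    /* Variant B: the same work through a (likely inlined) function call. */
    axpy3(y, a, x);
}

int main(void)
{
    double y1[3] = {0}, y2[3] = {0};
    const double x[3] = {1.0, 2.0, 3.0};
    rhs_explicit(y1, 0.5, x);
    rhs_function(y2, 0.5, x);
    printf("%f %f\n", y1[0], y2[0]);
    return 0;
}
```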

I know about profiling techniques and how to apply them to specific problems, but I wonder to what extent general statements can be made.

+6
2 answers

At any moment the CPU is only executing one small part of the code, so it is the content of that part, and how much the CPU has to move around within the code, that determines the speed.

If you have 10 MB of code and the first 9 MB is executed only once at startup, it hardly matters whether that part runs slowly, or whether it is 90 MB or 90 KB instead. If the processor spends 99.99% of its time in a small tight loop doing very efficient computation, it will be fast; if it keeps running again and again through 100,000 lines of code, it will probably be much slower.

Optimization means finding where the processor spends most of its time and making that code as efficient as possible in terms of the number of CPU cycles it needs. Sometimes this can mean adding a load of extra "preparatory" code outside the hot part to make the main part easier and faster to execute.
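A hedged sketch of such "preparatory" code: precompute a table of expensive values once, so the hot loop only does cheap lookups. The table size, the sin() example and the iteration count are assumptions made for this sketch, not something taken from the question (compile e.g. with gcc -O2 sketch.c -lm):

```c
#include <math.h>
#include <stdio.h>

#define TABLE_SIZE 1024
#define STEPS      100000000UL

int main(void)
{
    static double table[TABLE_SIZE];

    /* Preparatory code: runs once at startup, its size and cost barely matter. */
    for (int i = 0; i < TABLE_SIZE; ++i)
        table[i] = sin(6.283185307179586 * i / TABLE_SIZE);

    /* Hot loop: this is where ~99.99% of the CPU cycles go. */
    double sum = 0.0;
    for (unsigned long k = 0; k < STEPS; ++k)
        sum += table[k % TABLE_SIZE];   /* cheap lookup instead of calling sin() */

    printf("%f\n", sum);
    return 0;
}
```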

On some systems binary size is a big concern (e.g. embedded devices), but on others it is almost completely irrelevant.

See also: http://www.codeproject.com/Articles/6154/Writing-Efficient-C-and-C-Code-Optimization

+3

Processors have caches.

Compared to the processor's speed, access to system memory is slow. That is why processors have caches (made from very fast memory).

Each cache level has a different size and speed.

Therefore, to achieve the highest possible speed, it is extremely important to avoid refilling the lowest-level caches (unfortunately, these are also the smallest ones).

Both code and data force cache refills, so in both cases size matters.

For example: code can generate cache misses on a jump or call; data can generate cache misses when loading a variable from a distant address.
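A small, hedged illustration of the data side: the same number of loads touches memory either sequentially or with a large stride, and the strided version tends to miss the cache on every access. The array size and stride are arbitrary choices for this sketch (compile e.g. with gcc -O2 locality.c):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)   /* 16 Mi doubles (128 MB): far larger than any cache */

static double elapsed(clock_t t0, clock_t t1)
{
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; ++i) a[i] = 1.0;

    /* Sequential walk: nearly every access hits the cache line just fetched. */
    clock_t t0 = clock();
    double s1 = 0.0;
    for (size_t i = 0; i < N; ++i) s1 += a[i];
    clock_t t1 = clock();

    /* Strided walk: same number of loads, but each lands on a "remote"
       address and tends to miss the cache. */
    const size_t stride = 4096;
    double s2 = 0.0;
    for (size_t j = 0; j < stride; ++j)
        for (size_t i = j; i < N; i += stride) s2 += a[i];
    clock_t t2 = clock();

    printf("sequential: %.2fs  strided: %.2fs  (sums %f %f)\n",
           elapsed(t0, t1), elapsed(t1, t2), s1, s2);
    free(a);
    return 0;
}
```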

There are other issues, such as alignment, that can strongly affect speed, but nothing costs more than a cache miss (refilling the processor cache means the core has to stall and wait for slower memory, and that is not a small matter: it can take something like 250 CPU cycles!).
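For the alignment point, one possible way to control it on Linux is to request cache-line aligned storage so a small, hot data structure does not straddle two cache lines. This is only a sketch: the 64-byte figure assumes a typical x86 cache line, and aligned_alloc needs C11 (gcc -std=c11):

```c
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    size_t n = 1024;
    /* aligned_alloc requires the total size to be a multiple of the alignment. */
    double *state = aligned_alloc(64, n * sizeof *state);
    if (!state) return 1;

    printf("address %% 64 = %zu\n", (size_t)state % 64);

    free(state);
    return 0;
}
```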

Without going into platform-specific details, this is what can be said.

Conclusion: keep it small, and keep it simple.

+1

Source: https://habr.com/ru/post/927952/
