Where to learn about low-level, hardcore performance optimization?

This is actually a two-part question:

For people who want to squeeze out every last cycle, there is talk of pipelines, cache locality, etc.

  • I've seen these low-level techniques mentioned here and there, but I haven't found a good introduction to the topic that goes from start to finish. Any resource recommendations? (Google gave me definitions and academic papers, where I'd really appreciate worked examples and hands-on training material.)

  • How do you actually measure such things? As in, with a profiler? I know we can always change the code, see the improvement, and theorize in retrospect; I'm just wondering whether there are established tools for the job.

(I know that algorithmic optimization is where the orders of magnitude are. I'm interested in the metal here.)

+4
5 answers

The chorus of answers will be "Don't optimize prematurely." As you mention, you will get much more performance from a better design than from a better loop, and your maintainers will appreciate it, too.

However, to answer your question: study the assembly. Lots and lots of assembly. Don't do a MUL by two when you can shift. Learn the quirky uses of xor to clear (and compare) registers. For specific references, see http://www.mark.masmcode.com/ and http://www.agner.org/optimize/
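
A quick way to act on this is to read what your compiler actually emits. Here is a minimal sketch (file and function names are my own, purely illustrative): compile it with "gcc -O2 -S scale.c" and read the generated scale.s. On x86-64, mainstream compilers typically turn the multiply by two into an add, shift, or LEA rather than a MUL, and turn the zeroing into "xor eax, eax", the same idioms mentioned above.

    /* scale.c -- illustrative only; compile with: gcc -O2 -S scale.c */
    unsigned scale_by_two(unsigned x)
    {
        return x * 2;   /* usually compiled to an add, shift, or LEA, not MUL */
    }

    unsigned zero(void)
    {
        return 0;       /* usually compiled to xor eax, eax */
    }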

Yes, you need to time your code. On *nix it can be as simple as time { commands ; }, but you will probably want to use a full-featured profiler. GNU gprof is open source: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
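
For the "established tools" part of the question, a bare-bones approach looks like the sketch below, assuming POSIX clock_gettime() is available; the work() body is a placeholder for whatever code you actually care about. The same file built with -pg and run once produces a gmon.out that gprof can analyze.

    /* timing.c -- a minimal sketch, assuming POSIX clock_gettime().
     * Plain timing:  gcc -O2 timing.c -o timing && ./timing
     * With gprof:    gcc -O2 -pg timing.c -o timing && ./timing && gprof ./timing gmon.out
     * (older glibc may also need -lrt)
     */
    #include <stdio.h>
    #include <time.h>

    static volatile unsigned long sink;   /* volatile: keeps the loop from being optimized away */

    static void work(void)                /* placeholder for the code under test */
    {
        unsigned long acc = 0;
        for (unsigned long i = 0; i < 100000000UL; ++i)
            acc += sink + i;              /* the volatile read forces every iteration to run */
        sink = acc;
    }

    int main(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        work();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("work() took %.3f s\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        return 0;
    }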

If this is really your thing, go for it, have fun, and remember lots and lots of bit-level math. And your co-workers will hate you ;)

+3

EDIT / REWRITE:

If you want books, Michael Abrash did a good job in this area: Zen of Assembly Language, a number of magazine articles, the big black book of graphics programming, etc. Much of what he tuned for is no longer an issue; the problems have changed. What you will get from it is a feel for the kinds of things that can become the bottleneck and ways to attack them. Most importantly: time everything, and understand how your time measurements work so that you do not fool yourself by measuring incorrectly. Time different solutions, including crazy, unusual ones; you may find an optimization you did not know about and would not have understood until you exposed it.
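
One concrete way to avoid fooling yourself (my own sketch, not anything from Abrash) is to first measure what the timer itself costs, and then to repeat each real measurement several times and look at the spread rather than trusting a single number.

    /* calibrate.c -- a sketch of sanity-checking your measurements.
     * Assumes the same POSIX clock_gettime() as in the earlier example.
     */
    #include <stdio.h>
    #include <time.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        /* 1. How much does reading the clock itself cost? */
        double t0 = now();
        for (int i = 0; i < 1000000; ++i)
            now();
        printf("timer overhead: ~%.1f ns per call\n",
               (now() - t0) * 1e9 / 1000000);

        /* 2. Repeat the real measurement; report the minimum as the
         *    least-noisy figure, but look at the spread too. */
        double best = 1e9;
        for (int run = 0; run < 5; ++run) {
            double t = now();
            /* ... code under test goes here ... */
            double dt = now() - t;
            printf("run %d: %.6f s\n", run, dt);
            if (dt < best) best = dt;
        }
        printf("best of 5: %.6f s\n", best);
        return 0;
    }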

I have only just started reading it, but See MIPS Run (early/first edition) looks pretty good so far (note that ARM has since overtaken MIPS as the processor market leader, so the MIPS-and-RISC marketing is a bit dated). There are several textbooks on MIPS, old and new. MIPS was designed for performance (at the expense of the software developer, in some ways).

Today, the bottlenecks fall into two categories: the processor itself, and the I/O around it and everything attached to that I/O. The internals of the chips themselves (for the higher-end systems, anyway) run much faster than the I/O can keep up with, so you can only tune so much before you have to go off-chip and wait forever. Shaving half a minute off the walk from the train to your destination is not the thing to optimize when the train ride itself took three hours.

It is all about learning the hardware. You can stay within the world of ones and zeros and never get into the actual electronics, but without knowing the interfaces and the internals you really cannot do much performance tuning. You can re-arrange or change a few instructions and gain a little, but to make something run hundreds of times faster you need more than that. Learning many different instruction sets (assembly languages) helps you get inside the processors. I would recommend simulating HDL, for example the processors on opencores, to see how some folks do their designs and to get a solid handle on how to really squeeze clocks out of a task. Knowing the processor is great, but memory interfaces are a huge deal and you need to learn them too, along with the storage media (flash, hard disks, etc.), displays and graphics, network interfaces, and all the buses and interfaces between these things. And understanding things at the clock level, or as close to it as you can get, is what it takes.

+2

Intel and AMD provide optimization guidelines for x86 and x86-64.

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html/

http://developer.amd.com/documentation/guides/pages/default.aspx

Another great resource is Agner Fog:

http://www.agner.org/optimize/

Some key points (in no particular order):

  • Alignment: of memory, of labels / loop addresses / functions
  • Caching: non-temporal hints, avoiding page and cache misses
  • Branches: branch prediction, and avoiding branches with compare-and-move op codes (see the sketch after this list)
  • Vectorization: using the SSE and AVX instructions
  • Op codes: avoiding slow op codes, taking advantage of op-code fusion
  • Throughput / pipelining: re-ordering or interleaving op codes so that independent tasks overlap, avoiding partial stalls, and saturating the ALU and FPU
  • Loop unrolling: performing several iterations per compare/branch of the loop (also shown in the sketch below)
  • Synchronization: using atomic op codes (or the LOCK prefix) to avoid high-level synchronization constructs
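
As a concrete illustration of two of the items above (branch avoidance and loop unrolling), here is my own sketch, not something taken from the guides: a clamping loop written so that the ternary expression can compile to a compare plus conditional move (cmov) on x86, with the body unrolled to handle four elements per compare/branch of the loop counter.

    /* unroll_branchless.c -- illustrative only.
     * Clamp every element of a[] to at most LIMIT.
     */
    #include <stddef.h>

    #define LIMIT 255

    void clamp(int *a, size_t n)
    {
        size_t i = 0;

        /* main unrolled loop: 4 elements per loop test */
        for (; i + 4 <= n; i += 4) {
            a[i]     = a[i]     > LIMIT ? LIMIT : a[i];
            a[i + 1] = a[i + 1] > LIMIT ? LIMIT : a[i + 1];
            a[i + 2] = a[i + 2] > LIMIT ? LIMIT : a[i + 2];
            a[i + 3] = a[i + 3] > LIMIT ? LIMIT : a[i + 3];
        }

        /* remainder */
        for (; i < n; ++i)
            a[i] = a[i] > LIMIT ? LIMIT : a[i];
    }

Whether the compiler actually emits cmov here, or vectorizes the whole thing with SSE/AVX (which would cover the vectorization bullet as well), depends on the compiler and flags, which is exactly why the guides above tell you to check the generated assembly and to measure.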
+1

Yes, measure, and yes, learn all of those techniques.

Experienced people will tell you "don't optimize prematurely", and I don't know about that.

They will also say "use a profiler to find the bottleneck", but I have a problem with that. I hear a lot of stories about people using profilers and either really loving them or being puzzled by their output. SO is full of them.

What I don't hear much of are success stories, with the speedup factors actually achieved.

The method that I use is very simple, and I have tried to give many examples, including this case.

+1
