Optimization, compilers and their effects

(i) If a program is compiled with optimizations targeting one class of CPU (for example, a multi-core Intel Core i7, the same machine the code is compiled on), will its performance be suboptimal on processors from older generations (for example, a Pentium 4)? In other words, can optimizing for one processor be detrimental to performance on other processors?

(ii) For optimization, compilers can use x86 extensions (for example, SSE4) that are not available on older processors. So is there a fallback to some non-extension code path when running on those older processors?

(iii) Is the Intel C++ Compiler better at optimization than the Visual C++ compiler or GCC?

(iv) Will a genuinely multithreaded, multi-core application run efficiently on older processors (e.g. Pentium III or Pentium 4)?

4 answers
  • It is probably true that optimizing code for execution on CPU X will make that code less optimal on CPU Y than the same code optimized for execution on CPU Y. Probably.

  • Probably not.

  • Impossible to generalize. You have to test your own code and draw your own conclusions.

  • Probably not.

For every argument about why X should be faster than Y under some set of conditions (choice of compiler, choice of processor, choice of optimization flags), some clever SOer will find a counter-argument, and for every example a counter-example. When the rubber meets the road, the only remedy you have is to test and measure. If you want to know whether compiler X is “better” than compiler Y, first define what you mean by “better”, then run a lot of experiments, and then analyze the results.
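
To make “test and measure” concrete, the usual starting point is a small repeated-run timing harness built with each compiler (and run on each CPU) you care about. The following is only a sketch: work() is a hypothetical stand-in for your own code, and a serious comparison would also repeat whole runs and look at the spread rather than a single number.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Hypothetical stand-in for the routine you actually want to compare.
    static long work(const std::vector<int>& v) {
        long s = 0;
        for (int x : v) s += x * x;
        return s;
    }

    int main() {
        std::vector<int> v(1000000, 3);
        long sink = 0;  // keep the result live so the loop is not optimized away

        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep)
            sink += work(v);
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double, std::milli> ms = t1 - t0;
        std::printf("100 reps: %.2f ms (sink=%ld)\n", ms.count(), sink);
    }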


Compiling on a platform does not mean optimizing for that platform. (Maybe that is just unclear wording in your question.)

In all the compilers I have used, optimizing for platform X does not affect the instruction set, only how it is used; for example, optimizing for an i7 does not by itself pull in SSE2 instructions.
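
As a concrete (GCC/Clang-flavoured) illustration of that distinction: tuning options such as -mtune=corei7 only change instruction selection and scheduling among instructions that are already allowed, while options such as -msse4.1 or -march=nehalem actually enable extra instructions, and the difference is visible through the compiler's predefined macros. This is a sketch of GCC/Clang behaviour only; MSVC uses different switches (/arch, /favor) and macros.

    #include <cstdio>

    int main() {
        // GCC/Clang define __SSE4_1__ only when the compiler is allowed to
        // *emit* SSE4.1 instructions (e.g. -msse4.1 or -march=nehalem).
        // A tuning flag alone (e.g. -mtune=corei7) does not define it.
    #if defined(__SSE4_1__)
        std::puts("SSE4.1 code generation is enabled for this build");
    #else
        std::puts("SSE4.1 code generation is NOT enabled; tuning alone does not change the instruction set");
    #endif
    }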

In addition, optimizers in most cases avoid "pessimizing" the platforms you are not optimizing for; e.g. when optimizing for an i7, a small improvement on the i7 will as a rule not be chosen if it means a big hit on another common platform.

It also depends on how much instruction performance differs between processors. My impression is that those differences have become much smaller over the last decade (but I haven't checked recently, so I may be wrong about the latest generations). Also consider that optimization makes a noticeable difference in only a few places.

To illustrate the choices an optimizer faces, consider the following ways of implementing a switch statement:

  • a sequence of if (x==c) goto label comparisons
  • a range check and a jump table
  • a binary search
  • a combination of the above

The “best” algorithm depends on the relative cost of comparisons, jumps to a fixed offset, and jumps to an address loaded from memory. These costs do not differ much on modern platforms, but even small differences can create a preference for one implementation or another.
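
As a sketch of how that plays out in source terms (the actual lowering is up to each compiler and target): dense case values typically become a range check plus a jump table, i.e. an indirect jump through an address loaded from memory, while sparse case values more often become a compare chain or a binary search.

    // Dense labels: a range check plus a jump table is the typical lowering.
    int dense(int x) {
        switch (x) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            default: return -1;
        }
    }

    // Sparse labels: a jump table would be almost empty, so a compare
    // sequence or a binary search over the labels is more likely.
    int sparse(int x) {
        switch (x) {
            case 2:       return 10;
            case 97:      return 20;
            case 4096:    return 30;
            case 1048576: return 40;
            default:      return -1;
        }
    }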


I) If you do not tell the compiler which type of CPU to target, the result will likely be slightly suboptimal for all processors. On the other hand, if you let the compiler optimize for your particular type of CPU, it may well be distinctly suboptimal for other processor types.

II) No (for the Intel and MS compilers, at least). If you tell the compiler to compile with SSE4, it will feel free to use SSE4 anywhere in the code without any run-time check. It is your responsibility to ensure that your platform can execute SSE4 instructions; otherwise your program will crash. You might want to compile two versions of a library and load the appropriate one at run time. An alternative to compiling for SSE4 (or any other instruction set) is to use intrinsic functions; these check internally for the best available instruction set (at the cost of a small overhead). Note that I am not talking about instruction intrinsics here (the ones tied to a specific instruction set), but intrinsic functions.
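
A minimal sketch of the "build two versions and pick at run time" idea, using GCC/Clang extensions (the target function attribute and __builtin_cpu_supports; MSVC and the Intel compiler have their own mechanisms such as __cpuid and automatic CPU dispatch). The function names here are made up for the example.

    #include <cstddef>
    #include <cstdio>

    // Baseline version: safe on any x86-64 CPU.
    static long sum_plain(const int* data, std::size_t n) {
        long s = 0;
        for (std::size_t i = 0; i < n; ++i) s += data[i];
        return s;
    }

    // Same source, but the compiler may use SSE4.1 when generating
    // this one function (GCC/Clang "target" attribute).
    __attribute__((target("sse4.1")))
    static long sum_sse41(const int* data, std::size_t n) {
        long s = 0;
        for (std::size_t i = 0; i < n; ++i) s += data[i];
        return s;
    }

    // Decide once at run time which version the CPU can actually execute.
    static long sum(const int* data, std::size_t n) {
        if (__builtin_cpu_supports("sse4.1"))
            return sum_sse41(data, n);
        return sum_plain(data, n);
    }

    int main() {
        int v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        std::printf("%ld\n", sum(v, 8));
    }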

III) That is a whole discussion in itself. It changes with each compiler version and can differ from program to program, so the only solution here is to test. One note, though: Intel's compilers are known to generate code that does not run particularly well on anything other than Intel CPUs (for example, the intrinsics may not recognize the instruction-set features of AMD or VIA CPUs).

IV) If we ignore the raw performance of newer processors and the obvious architectural differences, then yes, it may well also run fine on older processors. Multi-core processing as such does not depend on the type of processor. But performance depends VERY strongly on the machine's architecture (e.g. memory bandwidth, NUMA, chip-to-chip bus) and on differences in multi-core communication (e.g. cache coherency, bus locking mechanism, shared cache). All of that makes it hard to compare the MP performance of newer and older processors, but that is not what you are asking, I believe. So, in general, an MP program designed for newer processors should not use the MP aspects of older processors any less effectively. Put another way, tuning the MP aspects of a program specifically for an older processor will not buy you much. Obviously you can rewrite your algorithm to use a specific CPU more efficiently (for example, a shared cache may let you use an algorithm that exchanges more data between worker threads, but that algorithm will die on a system without a shared cache, with full bus locking and low latency/bandwidth), but that involves much more than MP-specific tuning.
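
One practical consequence of "multi-core processing as such does not depend on the type of processor": do not hard-code a worker count tuned for one CPU generation; size the pool at run time so the same binary degrades gracefully on a single-core machine and scales up on a multi-core one. A minimal standard-C++ sketch:

    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        // Ask the runtime how many hardware threads exist instead of
        // hard-coding a count tuned for one particular CPU generation.
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1;  // the call may return 0 if the value is unknown

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([i] { std::printf("worker %u running\n", i); });
        for (auto& t : workers) t.join();
    }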


(1) This is not only possible, it is documented for almost every generation of x86 processor. Go back to the 8088 and work your way forward, generation by generation: clock for clock, the newer processor was slower for the then-current mix of applications and operating systems (including Linux). The 32-to-64-bit transition does not help, and more cores at lower clock rates make it even worse. And this stays true for the same reason.

(2) Bank on your binaries either not working or crashing. Sometimes you get lucky; most of the time you don't. Yes, there are new instructions, and supporting them on an old CPU would probably mean trapping on an undefined-instruction fault and falling back to software emulation of that instruction, which would be terribly slow; the lack of demand for this means it is probably not well done, or simply not there. Optimization may use new instructions, but more than that, the bulk of the optimization I think you are talking about is reordering instructions so that the various pipelines do not stall. Arrange them to be fast on one generation of processor and they will be slower on another, because within the x86 family the cores change too much. AMD had a good run there for a while, since they would make the same code simply run faster instead of trying to invent new processors that would eventually be faster once the software caught up. Nowadays both AMD and Intel are struggling just to keep their chips running without crashing.

(3) Generally, yes. For example, gcc is a horrible compiler; one size fits all fits no one well, and it never has been and never will be good at optimizing. For example, gcc 4.x code is slower than gcc 3.x code for the same processor (yes, all of this is subjective; it depends on the specific application being compiled). The in-house compilers I have used were leaps and bounds ahead of the cheap or free ones (and I am not limiting myself to x86 here). Are they worth the price, though? That is the question.
All in all, as a result of the horrible new programming languages and the abundance of memory, storage, and layers of caching, software engineering skills are at an all-time low. Which means the pool of engineers capable of making a good compiler, let alone a good optimizing compiler, shrinks over time; this has been going on for at least 10 years. So even the in-house compilers degrade over time, or the companies simply have their employees work on and contribute to the open-source tools instead of maintaining an in-house tool. The tools the hardware engineers use are degrading for the same reason, so now we have processors that we merely hope will run without crashing rather than ones we seriously try to optimize for. There are so many bugs and chip variations that most of the compiler work goes into avoiding the bugs. Bottom line: gcc has single-handedly destroyed the compiler world.

(4) See (2) above. Don't count on it. The operating system you want to run it on will most likely not even install on the older processor, sparing you the pain. It is the same reason that binaries optimized for your Pentium III ran slower on your Pentium 4, and vice versa. Code written to exploit multi-core processors will run slower on single-core processors than the same application optimized for a single-core processor.

The root of the problem is that the x86 instruction set is dreadful. Far better instruction sets have come along that do not require hardware tricks to make them faster every generation, but the result was two monopolies, and others could not break into the market. My friends keep reminding me that these x86 machines are microcoded, so you never really see the instruction set inside; that angers me even more, because the awful ISA is just an interpretation layer. It is a bit like using Java. The problems you outline in your questions will continue as long as Intel stays on top; and if its replacement simply becomes the next monopoly, we will be stuck forever in the Java model, where you sit on one side or the other of a common denominator: either you emulate the common platform on your particular hardware, or you write applications and compile them for the common platform.

