The state of programming and compilation for multi-core systems

I am doing research on multi-core processors; specifically, I am looking at writing code for multi-core processors and at compiling code for multi-core processors.

I am interested in the main problems in this area that currently prevent the widespread adoption of programming techniques that fully utilize the capabilities of multi-core architectures.

I am aware of the following efforts (some of them do not seem directly related to multi-core architectures, but rather to parallel programming models, multi-threading, and concurrency):

  • Erlang (I know that Erlang includes constructs to facilitate concurrency, but I am not sure how it is used on multi-core architectures)
  • OpenMP (this seems to be mostly about multiprocessing and harnessing the power of clusters)
  • Unified Parallel C
  • Cilk
  • Intel Threading Building Blocks (this seems to be directly related to multi-core systems, which makes sense given that it comes from Intel. In addition to defining certain programming constructs, it also seems to have features that tell the compiler to optimize the code for multi-core architectures)

In general, from the little experience I have with multi-threaded programming, I know that programming with concurrency and parallelism in mind is definitely a complicated matter. I also know that multi-threaded programming and multi-core programming are two different things. In multi-threaded programming you ensure that the CPU does not remain idle (on a single-processor system; as James noted, the OS can schedule different threads to run on different cores, but I am more interested in expressing parallel operations in the language itself or through the compiler). As far as I know, you cannot perform truly parallel operations there. On multi-core systems, you can perform truly parallel operations.

So it seems to me that these are the current problems of multi-core programming:

  • Multi-core programming is a complex concept that requires considerable skill.
  • Modern programming languages have no built-in constructs that give programmers a good abstraction for targeting a multi-core environment.
  • Apart from the Intel TBB library, I have not found efforts in other programming languages to exploit the capabilities of multi-core architectures during compilation (for example, I do not know whether the Java or C# compiler optimizes bytecode for multi-core systems, or even whether the JIT compiler does so)

I am interested to know what other problems there may be, and whether any of these efforts offer solutions to them. Links to research papers (and the like) would be useful. Thank you.

EDIT

If I had to condense my question into one sentence, it would be this: what are the problems that multi-core programming faces today, and what research is being done in this area to solve these problems?

UPDATE

It also seems to me that there are three levels at which multi-core needs to be addressed:

  • Language level. Constructs/concepts/frameworks that abstract parallelization and concurrency and make it easier for programmers to express them.
  • Compiler level. If the compiler knows which architecture it is compiling for, it can optimize the compiled code for that architecture.
  • OS level. The OS optimizes the running process and possibly schedules different threads/processes to run on different cores.
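To make the "language level" concrete, here is a minimal sketch (Python used purely as a stand-in, since the question spans many languages): a high-level construct such as an executor lets the programmer state *what* may run in parallel, while the runtime decides how to map the work onto threads and cores.

```python
# Sketch: expressing parallelism through a high-level construct instead of
# managing threads by hand. Python's concurrent.futures stands in for the
# kind of language/framework-level abstraction discussed above.
from concurrent.futures import ThreadPoolExecutor

def word_lengths(words):
    """Compute len() of each word; the executor decides the thread mapping."""
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so the result is deterministic
        # even though the individual calls may run on different threads.
        return list(pool.map(len, words))

if __name__ == "__main__":
    print(word_lengths(["multi", "core", "programming"]))  # [5, 4, 11]
```

The point is not this particular API but the shape of it: no thread creation, no joining, no locks visible to the programmer.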

I searched ACM and IEEE and found some papers. Most of them talk about how difficult it is to think concurrently, and how current languages lack a suitable way to express concurrency. Some go so far as to argue that the existing concurrency model we have (threads) is not a good way to handle concurrency (even on multiple cores). I am interested to hear other opinions.

+6
multicore
5 answers

The main problems of multi-core programming are the same as those of writing any other concurrent application, but whereas it used to be unusual to have multiple processors in one computer, it is now hard to find any modern computer with only one core in it, so new problems arise in making use of multi-core, multi-processor architectures.

But this is an old problem: whenever computer architectures get ahead of compilers, the fallback seems to be a move toward functional programming, since that paradigm, if strictly followed, yields highly parallelizable programs, because, for example, you have no global mutable variables.

But not all problems can be solved easily with FP, so the goal becomes how to make the other paradigms of multi-core programming just as easy to use.
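A minimal sketch of the FP point (Python as a stand-in; the answer mentions Haskell and Erlang, but the idea is language-independent): a pure function touches no shared mutable state, so evaluating it on many inputs in parallel needs no locks and always matches the sequential result.

```python
# Sketch: pure functions parallelize trivially because there is no shared
# mutable state for threads to fight over.
from concurrent.futures import ThreadPoolExecutor

def collatz_steps(n):
    """Pure function: the result depends only on n and touches no shared state."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

def parallel_map(fn, items, workers=4):
    """Because fn is pure, parallel evaluation is safe without any locking."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

if __name__ == "__main__":
    items = range(1, 20)
    # Parallel and sequential evaluation agree, by construction.
    assert parallel_map(collatz_steps, items) == [collatz_steps(i) for i in items]
```

Contrast this with imperative code mutating shared variables, where the same parallel split would require careful synchronization.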

First, many programmers have avoided writing good multi-threaded applications, so there is no large pool of well-trained developers, and many have learned habits that will make this kind of coding difficult for them.

But, as with most changes in the CPU, you can look at how to change the compiler, and for that you can look at Scala, Haskell, Erlang, and F#.

For libraries, you can look at Microsoft's Parallel Extensions framework as a way to make parallel programming easier.

I don't recall which, but recently either IEEE Spectrum or IEEE Computer ran articles on the problems of multi-core programming, so check which IEEE and ACM articles have been written on these problems to get more ideas about what is being looked at.

I think the biggest obstacle will be the difficulty of getting programmers to change their language, since FP is very different from OOP.

One area of research, besides designing languages that will work well, is how to handle multiple threads accessing memory; Haskell seems to be at the forefront of testing ideas here (for example, software transactional memory), so watch what happens with Haskell.

Eventually new languages will appear, and perhaps we will have DSLs to help abstract more away from the developer, but training programmers in all this will be a daunting task.

UPDATE:

You can read chapter 24, Concurrent and multicore programming, at http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html

+3

I am interested in the main problems in this area that currently prevent the widespread adoption of programming techniques that fully utilize the capabilities of multi-core architectures.

Inertia. (By the way, this is largely the answer to every question of the form "what prevents widespread adoption of X", whether X is parallel programming models, garbage collection, type safety, or fuel-efficient cars.)

We have known since the 1960s that the threads-plus-locks model is fundamentally broken. By 1980, we had about a dozen better models. And yet the vast majority of languages in use today (including languages created from scratch after 1980) offer only threads plus locks.
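A tiny sketch of the kind of breakage meant here (Python used as a neutral stand-in): an unsynchronized read-modify-write on shared state can silently lose updates depending on thread timing, while the locked version is correct but brings its own contention and deadlock hazards; the fragility is built into the model.

```python
# Sketch: the classic "lost update" failure of the threads+locks model.
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write with no synchronization: two threads can read
        # the same value and one increment is silently lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        with self.lock:
            self.value += 1

def run(method, n_threads=4, n_iters=5000):
    c = Counter()
    threads = [threading.Thread(target=lambda: [method(c) for _ in range(n_iters)])
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c.value

if __name__ == "__main__":
    expected = 4 * 5000
    # The unsafe version may or may not lose updates on any given run
    # (that nondeterminism is exactly the problem); it can never exceed
    # the expected total, while the locked version is always exact.
    print(run(Counter.unsafe_increment), run(Counter.safe_increment))
```

That the buggy version often passes casual testing is precisely why tools and better models are needed.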

+5

One answer mentioned the Parallel Extensions for the .NET Framework, and since you mentioned C#, that is definitely something I would investigate. Microsoft has done something interesting there, although to me many of their efforts seem better suited as language improvements in C# than as a separate library for parallel programming. But I think their efforts deserve applause and respect, given how early in the game we are. (Disclaimer: I was the Visual Studio marketing director until about 3 years ago.)

Intel Threading Building Blocks is also quite interesting (Intel recently released a new version, and I am excited to be heading to the Intel Developer Forum next week to learn more about how to use it properly).

Finally, I work for Corensic, a software startup in Seattle. We have a tool called Jinx that is designed to detect concurrency errors in your code. A 30-day trial is available for Windows and Linux, so you can check it out. (www.corensic.com)

In short, Jinx is a very thin hypervisor that, when activated, slips in between the processor and the operating system. Jinx then intelligently takes slices of execution and simulates different thread timings to look for bugs. When we find a specific thread timing that will cause a bug, we make that timing a reality on your machine (for example, if you are using Visual Studio, the debugger will stop at that point). We then point to the area in your code where the bug was caused. There are no false positives with Jinx: when it detects a bug, it is definitely a bug.

Jinx runs on Linux and Windows, with both native and managed code. It is language- and application-platform-agnostic and can work with all your existing tools.

If you check it out, send us feedback on what works and what doesn't. We have been running Jinx on some large open-source projects and are already seeing situations where Jinx can find bugs 50-100 times faster than plain stress testing.

+3

The bottleneck of any high-performance application (written in C or C++) designed to make efficient use of more than one processor/core is the memory system (caches and RAM). A single core usually saturates the memory system with its reads and writes, so it is easy to see why adding extra cores and threads can make an application run slower. If a queue of people can pass through a door one at a time, adding extra queues will not only clog the doorway but also make any individual's passage through the door less efficient.

The key for any multi-core application is to optimize and economize on memory accesses. This means structuring data and code so that each core works as much as possible inside its own caches, where it does not interfere with the other cores' accesses to the shared cache (L3) or RAM. Occasionally a core does need to venture out there, but the trick is to minimize those situations. In particular, data should be structured around cache lines and their size (currently 64 bytes), and code should be compact, not calling and jumping all over the place, which also trashes the pipelines.

In my experience, effective solutions are unique to the application in question. The general guidelines above are a foundation on which to build code, but the tuning changes that follow from profiling results will not be obvious to those who have not done optimization work before.

+2

Take a look at fork/join frameworks and work-stealing runtimes: two names for the same, or at least related, approaches, which recursively subdivide large tasks into lightweight units so that all available parallelism is exploited without knowing in advance how much parallelism there is. The idea is that it should run at sequential speed on a uniprocessor system but get a linear speedup with multiple cores.

Sort of a horizontal counterpart of cache-oblivious algorithms, if you look at it the right way.
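A minimal fork/join sketch (Python as a stand-in; real frameworks such as Java's ForkJoinPool add work stealing and a shared worker pool, which this toy version omits): recursively split until a sequential cutoff, run the halves in parallel, and join.

```python
# Sketch: recursive fork/join decomposition of a sum. The cutoff gives the
# sequential base case; above it, one half is forked onto another thread.
import threading

CUTOFF = 10_000  # below this size, just compute sequentially

def fork_join_sum(lo, hi):
    """Sum of the integers in [lo, hi) by recursive fork/join."""
    if hi - lo <= CUTOFF:
        return sum(range(lo, hi))          # sequential base case
    mid = (lo + hi) // 2
    result = {}
    # "Fork": evaluate the left half on a new thread...
    t = threading.Thread(target=lambda: result.update(left=fork_join_sum(lo, mid)))
    t.start()
    right = fork_join_sum(mid, hi)         # ...while this thread takes the right half
    t.join()                               # "Join": wait for the forked half
    return result["left"] + right

if __name__ == "__main__":
    n = 100_000
    assert fork_join_sum(0, n) == n * (n - 1) // 2
```

With one core this degrades to roughly sequential execution; with more cores, independent subtrees can proceed in parallel, which is exactly the property the answer describes.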

But I would say that the main problem facing multi-core programming is that the great majority of computations remain stubbornly sequential. There is simply no way to throw multiple cores at those computations and make them stick.

+1
