Multithreading: what's the point of having more threads than cores?

I thought the point of a multi-core computer is that it can run multiple threads simultaneously. In that case, if you have a quad-core machine, what's the point of having more than 4 threads running at once? Don't they just steal time from each other?

+75
multithreading hardware cpu-cores
Jun 27 '10 at 2:18
17 answers

Just because a thread exists does not mean it is actively running. Many threaded applications have some of their threads sleep until it is time for them to act — for example, a thread can be woken by user input, do some processing, and go back to sleep.

In essence, threads are separate tasks that can run independently of one another, with no need to be aware of the progress of another task. It is possible to have more of them than your hardware can run simultaneously; they are still useful for convenience, even if they sometimes have to wait in line for one another.
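
To make the "sleeping until there is work" idea concrete, here is a minimal Java sketch (class and queue names are made up for illustration): the worker blocks, consuming no CPU, until input arrives:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SleepyWorker {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        // take() blocks: the thread sleeps, using no CPU,
                        // until there is input for it to process.
                        String input = inbox.take();
                        System.out.println("processed: " + input.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // asked to shut down
                }
            });
            worker.start();

            inbox.put("hello");   // wakes the worker
            inbox.put("world");
            Thread.sleep(100);    // let it drain, then stop it
            worker.interrupt();
        }
    }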

+39
Jun 27 '10 at 2:25

The answer revolves around the purpose of threads, which is parallelism: running several separate streams of execution at once. In an "ideal" system you would have one thread executing per core: no interruptions. In reality this doesn't happen. Even if you have four cores and four working threads, your process and its threads will constantly be switched out for other processes and threads. If you are running any modern OS, every process has at least one thread and many have more, and all of those processes are running at once. Right now your machine probably has several hundred threads. You will never get a situation where a thread runs without having time "stolen" from it. (Well, you might if it runs in real time — on a real-time OS or, even on Windows, with real-time thread priority — but that's rare.)

With that as background, the answer: yes, more than four threads on a true quad-core machine can give you a situation where they "steal time from each other", but only if each individual thread needs 100% of a CPU. If a thread does not need 100% of a CPU (as a user-interface thread may not, or a thread doing a small amount of work or waiting on something else), then having another thread to schedule is actually a good situation.

It's actually more complicated than that:

  • What if you have five bits of work that all need doing right away? It makes more sense to start them all at once than to start four of them and bolt the fifth on later.

  • It's rare for a thread to truly need 100% of a CPU. The moment it touches disk or network I/O, for example, it may be spending time waiting and doing nothing useful. This is a very common situation.

  • If you have work to run, one common mechanism is to use a thread pool. It might seem to make sense to have the same number of threads as cores, yet the .Net thread pool makes up to 250 threads available per processor. I'm not sure why they do that, but my guess is that it has to do with the size of the tasks given to the threads. (A minimal pool sketch follows this list.)
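
As an illustration of the thread-pool idea, here's a minimal Java sketch (names are made up) that submits far more tasks than there are cores and lets the queued ones wait their turn:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolDemo {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            // A fixed pool sized to the core count; extra tasks queue up.
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            for (int i = 0; i < 20; i++) {           // 20 tasks, perhaps 4 cores
                final int id = i;
                pool.submit(() -> System.out.println("task " + id + " ran on "
                        + Thread.currentThread().getName()));
            }
            pool.shutdown();   // finish the queued work, then exit
        }
    }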

So: stolen time isn't bad (and it isn't really theft, either: it's how the system is supposed to work). Write your multithreaded programs based on the work the threads will do, which may not be CPU-bound. Determine how many threads you need from profiling and measurement. You may find it more useful to think in terms of tasks or jobs rather than threads: write work objects and hand them to a pool to be run. Finally, if your program is not performance-critical, don't worry too much :)

+39
Jun 27 '10 at 4:37

The point is that, despite there being no real speedup when the thread count exceeds the core count, you can use threads to decouple pieces of logic that shouldn't have to be interdependent.

Even in a moderately complex application, a single thread trying to do everything quickly makes a hash of the "flow" of your code. The single thread spends most of its time polling this, checking that, conditionally calling routines as needed, and it becomes hard to see anything beyond a clutter of minutiae.

Contrast this with the case where you can dedicate threads to tasks, so that by looking at any individual thread you can see what that thread is doing. For instance, one thread might block waiting for input from a socket, parse the stream into messages, filter the messages, and, when a valid message arrives, pass it to a worker thread. The worker thread can handle inputs from a number of other sources as well. The code for each of these exhibits a clean, focused flow, with no explicit checks that there isn't something else to do.

Partitioning the work this way lets your application rely on the operating system to schedule what gets the CPU next, so you don't have to make explicit conditional checks throughout the application about what might block and what is ready to run.
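
A sketch of that decoupling in Java (names are made up, and the socket input is simulated with a fixed list so the example is self-contained): one thread parses and filters, the other only ever sees valid messages:

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class Pipeline {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> valid = new LinkedBlockingQueue<>();

            // Reader thread: parse and filter, forwarding only valid messages.
            Thread reader = new Thread(() -> {
                for (String raw : List.of("MSG ok", "junk", "MSG fine")) {
                    if (raw.startsWith("MSG ")) {      // crude validity check
                        valid.offer(raw.substring(4));
                    }
                }
            });

            // Worker thread: handles messages, knowing nothing about parsing.
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < 2; i++) {      // two valid messages above
                        System.out.println("handled: " + valid.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            reader.start();
            worker.start();
            reader.join();
            worker.join();
        }
    }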

+22
Jun 27 '10

If a thread is waiting for a resource (for example, loading a value from RAM into a register, doing disk I/O, accessing the network, launching a new process, querying a database, or waiting for user input), the processor can work on a different thread and return to the first thread once the resource is available. This reduces CPU idle time, since the CPU can execute millions of operations instead of sitting idle.

Consider a thread that needs to read data from a hard drive. In 2014, a typical processor core runs at 2.5 GHz and can execute 4 instructions per cycle. With a cycle time of 0.4 ns, that processor can execute 10 instructions per nanosecond. With typical hard-drive seek times around 10 milliseconds, the processor can execute 100 million instructions in the time it takes to read one value from the drive. There can be significant performance improvements with hard drives that have a small cache (a 4 MB buffer) and with hybrid drives that include a few GB of solid-state storage, since the data latency for sequential reads, or reads from the hybrid section, can be several orders of magnitude faster.

The processor core can switch between threads (the cost of pausing and resuming a thread is around 100 clock cycles) while the first thread waits for high-latency input (anything more expensive than registers (1 clock) or RAM (5 nanoseconds)). That includes disk I/O, network access (250 ms latency), reading data from a CD, a slow bus, or a database call. Having more threads than cores means useful work can be done while the high-latency tasks are outstanding.

The thread scheduler assigns a priority to each thread, lets a thread sleep, and resumes it after a set time. It is the scheduler's job to reduce the thrashing that would occur if every thread executed a mere 100 instructions before being paused again; the overhead of switching threads would reduce the core's overall usable throughput.

For this reason you may want to break your problem into a reasonable number of threads. If you were writing matrix-multiplication code, creating one thread per cell of the output matrix would be excessive, whereas one thread per row, or per n rows, of the output matrix can cut the overhead of creating, pausing, and resuming threads.
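
Here's a rough Java sketch of that partitioning (names and sizes are made up): one thread per band of rows of the output matrix rather than one per cell:

    public class BlockedMatMul {
        // Multiply a by b into c, using one thread per band of rows.
        static void multiply(double[][] a, double[][] b, double[][] c,
                             int nThreads) throws InterruptedException {
            int n = a.length;
            Thread[] workers = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * n / nThreads;        // this thread's rows
                final int hi = (t + 1) * n / nThreads;
                workers[t] = new Thread(() -> {
                    for (int i = lo; i < hi; i++)
                        for (int j = 0; j < n; j++) {
                            double sum = 0;
                            for (int k = 0; k < n; k++)
                                sum += a[i][k] * b[k][j];
                            c[i][j] = sum;
                        }
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();          // wait for every band
        }

        public static void main(String[] args) throws InterruptedException {
            int n = 512;
            double[][] a = new double[n][n], b = new double[n][n],
                       c = new double[n][n];
            multiply(a, b, c, Runtime.getRuntime().availableProcessors());
            System.out.println("done");
        }
    }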

This also explains why branch prediction matters. If you have an if statement whose condition requires loading a value from RAM, but the bodies of the if and else use only values already loaded into registers, the processor may execute one or both branches before the condition has been evaluated. Once the condition resolves, the processor keeps the result of the correct branch and discards the other. Doing potentially useless work here is still likely better than switching to another thread, which could lead to thrashing.

As we have moved from high-performance single-core processors to multi-core processors, chip design has focused on packing more cores onto each die, improving the sharing of on-chip resources between cores, better branch-prediction algorithms, lower thread-switching overhead, and better thread scheduling.

+8
Jul 12 '14 at 19:01

Although you can certainly use threads to speed up calculations, depending on your hardware, one of their main uses is doing more than one thing at a time for the user's convenience.

For example, if you have to do some processing in the background while also staying responsive to user-interface input, you can use threads. Without threads, the user interface would hang every time you tried any heavy processing. (A minimal sketch follows.)
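
For instance, a minimal Swing sketch (purely illustrative — the sleep stands in for real work): the heavy job runs off the UI thread so the window stays live:

    import javax.swing.JButton;
    import javax.swing.JFrame;
    import javax.swing.SwingUtilities;

    public class ResponsiveUi {
        public static void main(String[] args) {
            SwingUtilities.invokeLater(() -> {
                JFrame frame = new JFrame("demo");
                JButton button = new JButton("Do heavy work");
                button.addActionListener(e -> {
                    button.setEnabled(false);
                    // Run the slow job off the UI thread so the window stays live.
                    new Thread(() -> {
                        try {
                            Thread.sleep(3000);          // stand-in for real work
                        } catch (InterruptedException ignored) {}
                        // Hop back onto the UI thread to touch Swing components.
                        SwingUtilities.invokeLater(() -> button.setEnabled(true));
                    }).start();
                });
                frame.add(button);
                frame.setSize(200, 100);
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.setVisible(true);
            });
        }
    }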

Also see this related question: Practical use of threads

+6
Jun 27 '10 at 2:20

I strongly disagree with @kyoryu's statement that the ideal number is one thread per processor.

Think of it this way: why do we have multiprocessing operating systems at all? For most of computing history, nearly all computers had a single CPU. Yet from the 1960s on, all "real" computers have had multiprocessing (a.k.a. multitasking) operating systems.

You run multiple programs so that one can run while the others are blocked on things like IO.

Let's set aside arguments about whether versions of Windows before NT were multitasking. Since then, every real OS has had multitasking. Some don't expose it to users, but it's there all the same — doing things like listening to the cell phone's radio, talking to the GPS chip, accepting mouse input, and so on.

Threads are just tasks that are a bit more efficient. There is no fundamental difference between a task, a process, and a thread.

A processor is a terrible thing to waste, so find plenty of things for it to do whenever you can.

I agree that with most procedural languages — C, C++, Java, and so on — writing properly thread-safe code is a lot of work. With 6-core and 16-core CPUs on the market today, I expect people to move away from these old languages as multithreading becomes an ever more critical requirement.

The disagreement with @kyoryu is just IMHO; the rest is fact.

+6
Jun 27 '10 at 4:53

Imagine a web server that has to serve an arbitrary number of requests. You have to serve requests in parallel because otherwise each new request would have to wait for every other request to complete (including sending its response over the Internet). And most web servers have far fewer cores than the number of requests they typically serve.

It also simplifies server development: you only have to write the single-threaded program that serves one request; you don't have to think about tracking multiple requests, the order you serve them in, and so on. (A bare-bones sketch follows.)
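
A bare-bones thread-per-request sketch in Java (illustrative only — the port is arbitrary, and real servers use pools and proper HTTP parsing):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class TinyServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept();    // one thread per request
                    new Thread(() -> handle(client)).start();
                }
            }
        }

        static void handle(Socket client) {
            try (client; OutputStream out = client.getOutputStream()) {
                String body = "hello\n";
                out.write(("HTTP/1.1 200 OK\r\nContent-Length: " + body.length()
                        + "\r\n\r\n" + body).getBytes(StandardCharsets.US_ASCII));
            } catch (IOException ignored) {}
        }
    }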

+5
Jun 27 '10 at 2:25

Most of the answers above talk about performance and concurrent execution. I'll approach this from a different angle.

Take the case of, say, a simplified terminal-emulation program. It has to do the following:

  • monitor incoming characters from the remote system and display them
  • monitor things coming from the keyboard and send them to the remote system.

(Real terminal emulators do more, including potentially echoing what you type onto the display, but we'll leave that aside for now.)

Now, the loop for reading from the remote end is simple, per the following pseudo-code:

    while get-character-from-remote:
        print-to-screen character

The loop for watching the keyboard and sending is also simple:

    while get-character-from-keyboard:
        send-to-remote character

The problem is that you have to do both at the same time. Without threading, the code now has to look more like this:

    loop:
        check-for-remote-character
        if remote-character-is-ready:
            print-to-screen character
        check-for-keyboard-entry
        if keyboard-is-ready:
            send-to-remote character

The logic, even in this deliberately simplified example that ignores the complexity of real-world communications, is rather muddled. With threading, though, even on a single core, the two pseudo-code loops can exist independently of each other, without interleaving their logic. Since both threads are mostly I/O-bound, they don't put a heavy load on the CPU, even though they are, strictly speaking, more wasteful of CPU resources than the combined loop would be. (A rough threaded version appears below.)
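
Here's roughly what the two loops might look like as real Java threads (illustrative only — it assumes some line-oriented "remote" is listening on localhost:7000):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class TwoLoopTerminal {
        public static void main(String[] args) throws Exception {
            Socket remote = new Socket("localhost", 7000);   // assumed "remote"
            BufferedReader fromRemote = new BufferedReader(
                    new InputStreamReader(remote.getInputStream()));
            PrintWriter toRemote = new PrintWriter(remote.getOutputStream(), true);
            BufferedReader keyboard = new BufferedReader(
                    new InputStreamReader(System.in));

            // Loop 1: remote -> screen, the first pseudo-code loop, on its own thread.
            new Thread(() -> {
                try {
                    String line;
                    while ((line = fromRemote.readLine()) != null)
                        System.out.println(line);
                } catch (Exception ignored) {}
            }).start();

            // Loop 2: keyboard -> remote, the second loop, on the main thread.
            String line;
            while ((line = keyboard.readLine()) != null)
                toRemote.println(line);
        }
    }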

Of course, real-world use is more complicated than the above. But the complexity of the combined loop grows exponentially as you add more concerns to the application. The logic becomes ever more fragmented, and you have to start using techniques like state machines, coroutines, etc. to keep things manageable. Manageable, but not readable. Threading keeps the code more readable.

So why would you ever not use threads?

Well, if your tasks are CPU-bound instead of I/O-bound, threading actually slows your system down. Performance will suffer — in many cases, a lot. ("Thrashing" is a common problem: if you spawn too many CPU-bound threads, you spend more time switching the active threads than running the content of the threads themselves.) Also, one reason the logic above is so simple is that I very deliberately chose a simplistic (and unrealistic) example. If you wanted to echo what was typed to the screen, you'd be in a new world of hurt, since you'd be introducing locking of shared resources. With only one shared resource that's not much of a problem, but it becomes a bigger and bigger problem as you have more resources to share. (A tiny locking sketch follows.)
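
A tiny sketch of that locking in Java (illustrative — in fact System.out.println synchronizes internally, so the explicit lock here only demonstrates the pattern of guarding a shared display):

    public class SharedScreen {
        private static final Object screenLock = new Object();

        // Both threads must take the lock before touching the shared display,
        // or their output could interleave.
        static void printLine(String s) {
            synchronized (screenLock) {
                System.out.println(s);
            }
        }

        public static void main(String[] args) {
            new Thread(() -> printLine("from the remote-reader thread")).start();
            new Thread(() -> printLine("echoed from the keyboard thread")).start();
        }
    }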

So, in the end, multithreading is about many things. It's about making I/O-bound processes more responsive (even if less efficient overall), as some have already said. It's about making logic easier to follow (but only if you minimize shared state). It's about a lot of things, and you have to decide whether the advantages outweigh the disadvantages on a case-by-case basis.

+4
Jun 27 '10 at 5:10

Threads can help with responsiveness in UI applications. Additionally, you can use threads to get more work out of your cores. For example, on a single core, you can have one thread doing I/O and another doing computation. If it were single-threaded, the core could sit idle waiting for the I/O to complete. That's a fairly high-level example, but threads can definitely be used to squeeze more work out of your CPU.

+2
Jun 27 '10 at 2:26

A processor, or CPU, is the physical chip plugged into the system. A processor can have multiple cores (a core is the part of the chip capable of executing instructions). A core can appear to the operating system as several virtual processors if it is able to execute several threads simultaneously (a thread being a single sequence of instructions).

A process is another name for an application. Generally, processes are independent of each other: if one dies, it does not take another down with it. Processes can communicate, or share resources such as memory or I/O.

Each process has a separate address space and stack. A process can contain several threads, each able to execute instructions simultaneously. All of a process's threads share the same address space, but each thread has its own stack. (A small demonstration follows.)
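
A small Java demonstration of that distinction (names are illustrative): the array lives in the address space both threads share, while each local variable lives on its own thread's private stack:

    public class SharedHeap {
        static int[] shared = new int[2];    // heap: visible to every thread

        public static void main(String[] args) throws InterruptedException {
            // Each 'local' lives on its own thread's private stack;
            // 'shared' lives in the address space both threads see.
            Thread t1 = new Thread(() -> { int local = 1; shared[0] = local; });
            Thread t2 = new Thread(() -> { int local = 2; shared[1] = local; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(shared[0] + ", " + shared[1]);   // prints: 1, 2
        }
    }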

Hopefully these definitions — and further research building on these basics — will help your understanding.

+2
Jun 27 '10 at 2:50

Many threads will sleep, waiting for user input, I / O, and other events.

+2
Jun 27 '10

The ideal use of threads is, indeed, one per core.

However, unless you use exclusively asynchronous/non-blocking IO, there's a good chance that at some point you will have threads blocked on IO, which will not be using your CPU.

Also, typical programming languages make it hard to stick to one thread per CPU. Languages built around concurrency (such as Erlang) can make it easier to avoid spinning up extra threads.

+1
Jun 27 '10 at 2:25

Some APIs are designed such that you have no choice but to run them in a separate thread (anything with blocking calls). An example is Python's HTTP libraries (AFAIK).

Usually this isn't a big problem anyway (if it is, the OS or API should ship an alternative asynchronous mode of operation, i.e. select(2)), because it probably means the thread will be asleep while waiting for the I/O to complete. On the other hand, if something does heavy computation, you have to put it in a separate thread from, say, the GUI thread (unless you enjoy manual multiplexing).

+1
Jun 27 '10

In response to your first hypothesis: multi-core machines can run multiple processes at once, and not just multiple threads of a single process.

In answer to your first question: the point of multiple threads is usually to perform several tasks simultaneously within one application. The classic examples are an email program sending and receiving mail and a web server receiving and sending page requests. (Note that it's essentially impossible to reduce a system such as Windows to running only one thread, or even only one process. Run the Windows Task Manager and you will typically see a long list of active processes, many of them running multiple threads.)

In answer to your second question: most processes/threads are not CPU-bound (i.e., they do not run continuously and uninterruptedly) but stop and wait frequently for I/O to finish. During that wait, other processes/threads can run without "stealing" from the waiting code (even on a single-core machine).

0
Jun 27 '10 at 5:21

I know this is a very old question with lots of good answers, but I'm here to point out something that matters in the current environment:

If you want to design an application for multithreading, you should not be designing for a specific hardware configuration. CPU technology has been advancing rapidly for years, and core counts keep increasing. If you deliberately design your application to use only 4 threads, you are potentially restricting yourself on, say, an octa-core system. Even 20-core systems are now commercially available, so such a design definitely does more harm than good.

0
Nov 03 '17 at 10:07

A thread is an abstraction that lets you write code as simply as a sequence of operations, blissfully unaware that the code is executed interleaved with other code.

-2
Aug 26 '11

The point is that the vast majority of programmers do not understand how to design a state machine. Being able to put everything in its own thread frees the programmer from having to think about how to efficiently represent the state of an in-progress computation so that it can be interrupted and later resumed.

As an example, consider video compression, a very CPU-intensive task. If you're using a GUI tool, you probably want the interface to remain responsive (show progress, respond to cancel requests, allow window resizing, etc.). So you design the encoder to process a large chunk (one or more frames) at a time and run it in its own thread, separate from the UI.

Of course, once you realize it would have been nice to be able to save the in-progress encoding state — so you could close the program to reboot, or to play a resource-hungry game — you realize you should have learned how to design state machines from the very beginning. Either that, or you decide to engineer a whole new hibernation feature for your OS so you can pause and resume individual applications to disk...

-8
Jun 27 '10 at 9:37


