Which is more efficient: more cores or more processors?

I understand this is more of a hardware question, but it matters a great deal for software too, especially when programming for multi-threaded, multi-core / multi-processor environments.

Which is better, and why? In terms of efficiency, speed, performance, usability, etc.

1.) A computer / server with 4 quad-core processors?

or

2.) A computer / server with 16 single-core processors?

Assume that all other factors (speed, cache, bus speed, bandwidth, etc.) are equal.

Edit

I'm interested in performance in general. If one option is particularly good in one respect and terrible (or just not preferable) in another, I'd like to know that as well.

And if I have to choose, I'm especially interested in which is better for I/O-bound applications and for compute-bound applications.

+7
performance multithreading multicore
4 answers

This is not an easy question to answer. Computer architecture is, unsurprisingly, complex. Below are some guidelines, but even these are simplifications. A lot of this will come down to your application and the constraints you're working within (both business and technical).

Processors have several levels of on-chip caching (2-3 levels is common). Some modern processors also have the memory controller on-die. This can significantly speed up the exchange of memory between cores. Memory I/O between processors has to go over an external bus, which tends to be slower.

AMD/ATI chips use HyperTransport, which is a point-to-point protocol.

Complicating all of this is the bus architecture. Intel's Core 2 Duo/Quad uses a shared bus. Think of it like Ethernet or cable internet, where there is only so much bandwidth and every new participant just takes another share of the whole. Core i7 and the newer Xeons use QuickPath, which is very similar to HyperTransport.

More cores will take up less space, use less power, and cost less (unless you're using really low-powered processors), both in terms of the CPUs themselves and in the cost of the other hardware (e.g. motherboards).

Generally speaking, a single processor will be the cheapest (both in terms of hardware and software), because you can use commodity hardware for it. Once you move to a second socket, you usually have to use different chipsets, more expensive motherboards, and often more expensive RAM (e.g. ECC fully-buffered RAM), so you take a massive cost hit going from one CPU to two. This is one reason many large sites (including Flickr, Google, and others) use thousands of commodity servers (and although Google's servers are somewhat customized to include things like a 9V battery, the principle is the same).

Your edit doesn't change much. "Performance" is a very subjective concept. Performance at what? Keep in mind that if your application isn't sufficiently multi-threaded (or multi-process) to take advantage of extra cores, adding more cores can actually decrease performance.
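
To illustrate that point, here's a minimal C++ sketch (my illustration, not part of the answer above) of how an application might size its worker pool to the available hardware threads. If the work can't be divided up like this, the extra cores simply sit idle:

    // Sketch: split an embarrassingly parallel sum across exactly as many
    // threads as the hardware offers. If the workload can't be divided
    // like this, more cores buy you nothing.
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        // Number of hardware threads (cores * SMT); 0 means "unknown".
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1;

        std::vector<int> data(1'000'000, 1);
        std::vector<long long> partial(n, 0);
        std::vector<std::thread> workers;

        // Each worker sums its own contiguous slice of the data.
        for (unsigned t = 0; t < n; ++t) {
            workers.emplace_back([&, t] {
                size_t begin = data.size() * t / n;
                size_t end   = data.size() * (t + 1) / n;
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& w : workers) w.join();

        long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
        std::cout << "threads: " << n << ", sum: " << total << "\n";
    }

On a box with 16 hardware threads this splits the work 16 ways; for a single-threaded workload none of that applies and the extra cores do nothing for you.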

I/O-bound applications probably won't favor one over the other. They are, after all, bound by I/O, not by the CPU.

For compute-bound applications, it depends on the nature of the computation. If you're doing a lot of floating point work, you may gain far more by offloading the calculation to a GPU (e.g. using Nvidia CUDA). You can get a tremendous performance win from that. Take a look at the GPU client for Folding@Home for an example.

In short, your question doesn't lend itself to a definite answer, because the subject is complicated and there simply isn't enough information. Technical architecture is something that has to be designed for the specific application.

+12

Well, the fact is that all other factors cannot really be equal.

The main problem with multiple processors is the latency and bandwidth of the link over which the two CPU sockets have to communicate with each other. And this communication has to happen all the time to make sure their local caches don't get out of sync. It adds latency and can sometimes become the bottleneck in your code. (Not always, of course.)
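
As an illustration (my sketch, not part of the answer above), this coherence cost is easy to provoke with "false sharing": two threads update independent counters, but because the counters sit on the same cache line, the coherence protocol has to bounce that line between caches on every write. The 64-byte line size below is an assumption that holds for most current x86 parts:

    // Sketch: false sharing. Two threads write to independent counters;
    // when the counters share a cache line, every write forces the line
    // to ping-pong between caches. Padding them apart avoids that.
    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <thread>

    constexpr int kIters = 20'000'000;

    struct Shared {                        // both counters on one cache line
        std::atomic<std::uint64_t> a{0};
        std::atomic<std::uint64_t> b{0};
    };

    struct Padded {                        // one counter per (assumed 64-byte) line
        alignas(64) std::atomic<std::uint64_t> a{0};
        alignas(64) std::atomic<std::uint64_t> b{0};
    };

    template <typename T>
    double run() {
        T c;
        auto start = std::chrono::steady_clock::now();
        std::thread t1([&] { for (int i = 0; i < kIters; ++i)
                                 c.a.fetch_add(1, std::memory_order_relaxed); });
        std::thread t2([&] { for (int i = 0; i < kIters; ++i)
                                 c.b.fetch_add(1, std::memory_order_relaxed); });
        t1.join();
        t2.join();
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        std::cout << "same cache line:      " << run<Shared>() << " s\n";
        std::cout << "separate cache lines: " << run<Padded>() << " s\n";
    }

On a multi-socket machine the gap between the two runs tends to be even larger than on a single multi-core chip, because the line has to travel across the inter-socket link.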

+3

More cores on fewer processors is definitely faster, as SPWorley writes. His answer is close to three years old now, but the trends are there, and I believe his answer needs a little clarification. First, some history.

In the early 80s, the 80286 became the first microprocessor where virtual memory was feasible. It's not that it hadn't been tried before, but Intel integrated the virtual memory management onto the chip (on-die) rather than using an off-die solution. The result was that their memory management was much faster than their competitors', because all of it (in particular the translation of virtual to physical addresses) was designed into and performed by the processor itself.

Remember those big, clumsy P2 and P3 processors from Intel, and the early AMD Athlons and Durons, that were mounted on edge and housed in a big plastic cartridge? The reason for that design was to be able to place a cache chip next to the processor chip, because the manufacturing processes of the time made it impractical to fit the cache on the processor die. Voilà: an off-die cache solution. Because of timing constraints, these cache chips ran at a fraction (50% or so) of the CPU clock frequency. As soon as the manufacturing processes caught up, the caches were moved on-die and began running at the internal clock frequency.

A few years ago, AMD moved the RAM memory controller from the northbridge (off-die) onto the processor (on-die). Why? Because it makes memory operations more efficient (faster), by cutting the external addressing wiring in half and by eliminating the trip through the northbridge (CPU-wiring-northbridge-wiring-RAM becomes CPU-wiring-RAM). This change also made it possible to have several independent memory controllers, each with its own bank of RAM, operating simultaneously on the same die, which increases the processor's memory bandwidth.
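
A practical consequence for the multi-socket case in the question (my addition, not part of the answer above): each socket's controller fronts its own bank of RAM, so a thread runs faster when its data lives behind the local controller rather than across the inter-socket link. Below is a rough Linux-only sketch using libnuma (link with -lnuma); the node number is an assumption about the machine it runs on:

    // Sketch (Linux + libnuma): pin the current thread to a NUMA node and
    // allocate its working memory from that same node, so accesses stay
    // behind the local memory controller instead of crossing sockets.
    #include <numa.h>
    #include <cstring>
    #include <iostream>

    int main() {
        if (numa_available() < 0) {
            std::cerr << "NUMA not supported on this system\n";
            return 1;
        }

        const int node = 0;                       // assumed: node 0 exists
        const size_t bytes = 256 * 1024 * 1024;   // 256 MiB working set

        numa_run_on_node(node);                   // run this thread on node 0's cores
        void* buf = numa_alloc_onnode(bytes, node);  // RAM behind node 0's controller
        if (!buf) {
            std::cerr << "allocation failed\n";
            return 1;
        }

        std::memset(buf, 0, bytes);               // touch pages: local accesses only
        std::cout << "nodes available: " << numa_max_node() + 1 << "\n";

        numa_free(buf, bytes);
        return 0;
    }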

To get back to the point: there is a long-term trend of moving performance-critical functionality from the motherboard onto the processor die. Besides the items mentioned above, we have seen multiple cores integrated onto the same die, on-die L1 with off-die L2 caches become on-die L1 and L2 with off-die L3, and now L1, L2 and L3 all sit on the die. The caches have grown larger and larger, to the point where they take up more die area than the cores themselves.

So, to summarize: any time you need to go off-die, things slow down dramatically. The answer: stay on-die as much as possible, and streamline the design of anything that has to go off-die.

+2

It depends to some extent on the architecture; BUT a quad-core processor is pretty much the same as (or better than) 4 physically separate processors, because of the reduced communication (i.e. not having to go off-die or travel very far, which is a factor) and the shared resources.

+1
