Well, of course, it all depends on your problem. If you can easily partition it into subtasks that don't communicate much, scaling out gives you trivial speedups. For instance, searching 1B web pages for a word can be done by one machine searching all 1B pages, or by 1M machines each handling 1,000 pages, without a significant loss of efficiency (so with a speedup of roughly 1,000,000x). This is called "embarrassingly parallel".
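(To make the embarrassingly parallel case concrete, here's a minimal sketch; the file names, search term, and the use of Python's multiprocessing on a single box are my own illustration, standing in for pages sharded across many machines.)

```python
from multiprocessing import Pool

def count_matches(path, term="foo"):
    """Count occurrences of a search term in one page (here: one local file)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read().count(term)

if __name__ == "__main__":
    # Hypothetical list of page files; in reality the pages would be
    # spread over many machines rather than processes on one box.
    pages = ["page_%d.html" % i for i in range(1000)]
    with Pool(processes=8) as pool:
        # Each worker searches its own pages independently; the only
        # "communication" is summing the per-page counts at the end.
        total = sum(pool.map(count_matches, pages))
    print("total matches:", total)
```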
Other algorithms, however, require much more intensive communication between the subtasks. Your example requiring cross-analysis is exactly the kind of case where communication can drown out the performance gains of adding more boxes. In those cases you want to keep the communication inside a (bigger) box, going over high-speed interconnects, rather than over something as "common" as (10-)Gig-E.
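(A toy back-of-envelope model of that effect, with made-up numbers: compute time shrinks as you add boxes, but the data exchange over the network grows, so the speedup peaks and then falls.)

```python
def speedup(n_boxes, compute_s=3600.0, exchange_per_box_s=10.0):
    """Toy model: the compute work splits evenly across boxes, but each
    extra box adds a fixed amount of exchange time over the network."""
    parallel_time = compute_s / n_boxes + exchange_per_box_s * (n_boxes - 1)
    return compute_s / parallel_time

for n in (1, 4, 16, 64, 256):
    print(f"{n:4d} boxes -> speedup {speedup(n):6.1f}x")
```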
Of course, this is a rather theoretical standpoint. In practice, other factors such as I/O, reliability, and ease of programming (one big shared-memory machine usually causes far fewer headaches than a cluster) can also have a big influence.
Finally, because of the (often extreme) cost advantages of scaling out on cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. This has produced new ways of parallelizing that minimize communication and therefore do much better on a cluster, whereas conventional wisdom used to dictate that these kinds of algorithms could only run effectively on big iron machines...
Wim