Well, of course, it all depends on your problem. If you can easily partition it into subtasks that don't communicate much, scaling out gives you trivial speedups. For instance, searching 1B web pages for a word can be done by one machine searching all 1B pages, or by 1M machines each handling 1,000 pages, without a significant loss of efficiency (so with a speedup of roughly 1,000,000x). This is called "embarrassingly parallel".
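(To make the embarrassingly parallel case concrete, here's a minimal sketch; the file names, search term, and the use of Python's multiprocessing on a single box are my own illustration, standing in for pages sharded across many machines.)

```python
from multiprocessing import Pool

def count_matches(path, term="foo"):
    """Count occurrences of a search term in one page (here: one local file)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read().count(term)

if __name__ == "__main__":
    # Hypothetical list of page files; in reality the pages would be
    # spread over many machines rather than processes on one box.
    pages = ["page_%d.html" % i for i in range(1000)]
    with Pool(processes=8) as pool:
        # Each worker searches its own pages independently; the only
        # "communication" is summing the per-page counts at the end.
        total = sum(pool.map(count_matches, pages))
    print("total matches:", total)
```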
Other algorithms, however, require much more intensive communication between the subtasks. Your example requiring cross-analysis is exactly the kind of case where communication can drown out the performance gains of adding more boxes. In those cases you want to keep the communication inside a (bigger) box, going over high-speed interconnects, rather than over something as "common" as (10-)Gig-E.
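(A toy back-of-envelope model of that effect, with made-up numbers: compute time shrinks as you add boxes, but the data exchange over the network grows, so the speedup peaks and then falls.)

```python
def speedup(n_boxes, compute_s=3600.0, exchange_per_box_s=10.0):
    """Toy model: the compute work splits evenly across boxes, but each
    extra box adds a fixed amount of exchange time over the network."""
    parallel_time = compute_s / n_boxes + exchange_per_box_s * (n_boxes - 1)
    return compute_s / parallel_time

for n in (1, 4, 16, 64, 256):
    print(f"{n:4d} boxes -> speedup {speedup(n):6.1f}x")
```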
Of course, this is a rather theoretical standpoint. In practice, other factors such as I/O, reliability, and ease of programming (one big shared-memory machine usually causes far fewer headaches than a cluster) can also have a big influence.
Finally, because of the (often extreme) cost advantages of scaling out on cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. This has produced new ways of parallelizing that minimize communication and therefore do much better on a cluster, whereas conventional wisdom used to dictate that these kinds of algorithms could only run effectively on big iron machines...
Wim