Should Hadoop clusters run on the same hardware?

I remember reading somewhere that Hadoop's performance degrades significantly if the machines it runs on are very different from each other, but I can no longer find that comment. I am considering running a Hadoop cluster on an array of virtual machines that is not directly managed by my group, and I need to know whether I should make this a requirement in my request.

So, should I insist that all of my machines have the same hardware, or will Hadoop work fine on machines with different hardware configurations?

Thanks.

+8
hadoop
2 answers

The following articles describe how a heterogeneous cluster affects the performance of Hadoop MapReduce:

In a heterogeneous cluster, the processing power of nodes can vary significantly. A high-speed node can process the data stored on its local disk faster than low-speed nodes can. After a fast node finishes processing its local input, the node must support load balancing by processing unprocessed data located on one or more remote slow nodes. When the amount of data transferred due to load balancing is very large, the overhead of moving unprocessed data from slow nodes to fast nodes becomes a critical issue affecting Hadoop's performance.
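
To make the effect described above concrete, here is a rough back-of-the-envelope sketch. It is not Hadoop code: the function name `remote_data_moved`, the node speeds, and the data sizes are all made up for illustration. It assumes an idealized scheduler in which every node finishes at the same time and network transfer itself costs nothing, and it only estimates how much input would have to be processed away from the node that stores it.

```python
def remote_data_moved(local_mb, speeds):
    """Estimate how much data (MB) must be processed off its home node.

    local_mb: MB of input stored on each node (hypothetical values).
    speeds:   processing rate of each node in MB/s (hypothetical values).

    Idealized model: work is balanced perfectly, so all nodes finish at
    the same time, and transfer cost is ignored.
    """
    total = sum(local_mb)
    # With perfect balancing, the job finishes when the combined
    # throughput of all nodes has consumed the whole input.
    finish_time = total / sum(speeds)

    moved = 0.0
    for local, speed in zip(local_mb, speeds):
        processed = speed * finish_time
        # Anything a node processes beyond its local share has to be
        # pulled from a slower node.
        moved += max(0.0, processed - local)
    return moved


# Four nodes holding 100 MB each; one node is much faster than the rest.
print(remote_data_moved([100, 100, 100, 100], [4, 2, 1, 1]))  # 100.0
# Same data on identical machines: nothing needs to move.
print(remote_data_moved([100, 100, 100, 100], [2, 2, 2, 2]))  # 0.0
```

In this toy model, a quarter of the input ends up being processed remotely in the heterogeneous case. If the data were instead stored in proportion to node speed (for example `[200, 100, 50, 50]` with speeds `[4, 2, 1, 1]`), the estimate drops back to zero, which is the intuition behind placing data according to node capacity.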

The following links have more detailed information:

They also describe ways to improve the performance of a heterogeneous cluster or to avoid this performance degradation.

It is recommended to have homogeneous machines in your cluster, but if the machines do not differ drastically in specifications and performance, you should go ahead and build the cluster.

For production systems, you should push for homogeneous machines. For development, performance is not critical.

Either way, you should benchmark your Hadoop cluster after setting it up.

+12

A homogeneous cluster is certainly ideal, but it is not strictly necessary. For example, Yahoo! Inc. runs heterogeneous clusters in its production environment. From conversations with researchers there, they believe there is a performance hit due to scheduling problems (a big enough hit that they are working on adding performance-aware scheduling to their tools), but the penalty is not crippling.

+2
