The following articles describe how a heterogeneous cluster affects the performance of map-reduce hasoop:
In a heterogeneous cluster, the processing power of nodes can vary significantly. A high-speed node can process processing data stored in the local node disk faster than low-speed ones. After a fast node completes processing of its local input, node must support load balancing by processing raw data located in one or more remote slow nodes. When the amount of data transferred due to load balancing is very large, the overhead of moving raw data from slow nodes to fast nodes becomes a critical issue affecting Hadoops performance.
The following links have more detailed information:
It also provides ways in which you could increase the performance of a heterogeneous cluster or avoid this decrease in performance.
It is reasonable to assume that you have homogeneous machines on your cluster, but if these machines do not have completely different specifications and performance differences, you should continue to build the cluster.
For production systems you should offer for homogeneous machines. For development, performance is not critical.
However, you should be able to compare your Hadoop cluster after creating it.
pyfunc
source share