Hadoop virtual cluster versus single machine

I have a question regarding speed and performance using multiple virtualized nodes in a single VS single node machine on a single machine itself.

which will work better?

The reason I ask this question is because I am currently studying hadoop on one machine, and I see several tutorials on the Internet that show the use of multiple virtualized nodes in one machine.

Thank you in advance

+4
source share
2 answers

There is always overhead associated with virtualization, so if it is really necessary, I would not recommend running Hadoop in a virtualized environment.

As I said, I know that VMWare has done a great job of creating Hadoop in a virtualized environment, and they published some guidelines in which, under certain conditions, they claim to have better performance with virtual machines, which are a native application. I haven’t played much with vSphere, but it may be something you can pay attention to if you want to learn more about virtualization. But do not take the numbers for granted, it really depends on the type of equipment you use, so in some conditions I think that you can get some performance with virtual machines, but I assume that in most cases you won nothing.

If you are just starting out and testing Hadoop, I think virtualization is redundant. You can very easily launch Hadoop in pseudo-distributed mode, which means that you can run several Hadoop daemons in one window, each of which is a separate process. This is what I started with Hadoop, and this is a good start. You can find more details here (or you may need a different page depending on which version of Hadoop you are using).

If you get to the point that you want to test using a real cluster, but you don’t have the resources, I would recommend looking at Amazon Elastic Map / Reduce: it gives you a cluster on demand, and it's pretty cheap. This way you can perform more complex tests. More details here .

on the bottom line, I think if the goal is just testing, you really don't need a virtual cluster.

+4
source

An analysis of the effectiveness of the analysis conducted on this topic showed that the Hadoop virtual cluster is 4% less efficient than its native counterpart: Example of a virtualized analysis of the effectiveness of working with adoop applications

+1
source

All Articles