There is always overhead associated with virtualization, so I would not recommend running Hadoop in a virtualized environment unless it is really necessary.
That said, I know VMware has done a lot of work on running Hadoop in virtualized environments, and they have published guidelines claiming that, under certain conditions, virtual machines can even outperform a native deployment. I haven't played much with vSphere, but it may be worth a look if you want to learn more about virtualization. Don't take those numbers for granted, though: the results depend heavily on the hardware you use, so under some conditions you might gain some performance with virtual machines, but I suspect that in most cases you gain nothing.
If you are just starting out and testing Hadoop, I think virtualization is redundant. You can very easily launch Hadoop in pseudo-distributed mode, which means running all the Hadoop daemons on a single machine, each as a separate process. This is how I started with Hadoop, and it is a good way to begin. You can find more details here (you may need a different page depending on which version of Hadoop you are using).
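As a rough sketch of what pseudo-distributed mode involves (the property names below follow the standard Hadoop 2.x/3.x configuration; check the documentation for your version): you point the default filesystem at a local HDFS instance and drop the replication factor to 1, since there is only one node:

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

After that, you would typically format the namenode once (`hdfs namenode -format`) and start the daemons with `start-dfs.sh`; each daemon runs as its own Java process on the same machine.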
If you get to the point where you want to test on a real cluster but you don't have the resources, I would recommend looking at Amazon Elastic MapReduce (EMR): it gives you a cluster on demand, and it's pretty cheap. That way you can run more complex tests. More details here.
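To illustrate how on-demand this is, a minimal AWS CLI invocation might look like the sketch below. The cluster name, release label, instance type, and instance count are all placeholder assumptions, and running this against a real account incurs charges:

```shell
# Sketch only: starts a small, billable EMR cluster with Hadoop installed.
aws emr create-cluster \
  --name "hadoop-test" \
  --release-label emr-6.15.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles

# When finished, terminate it to stop billing:
# aws emr terminate-clusters --cluster-ids <cluster-id>
```

The point is that you pay only while the cluster is up, which is why it works well for occasional larger-scale tests.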
Bottom line: if the goal is just testing, I think you really don't need a virtualized cluster.