Multi-node Hadoop Cluster with Docker

Question


I am in the planning phase of a multi-node Hadoop cluster in a Docker-based environment, so it should be built on a simple, easy-to-use virtualized system. The current architecture (according to the documentation) consists of 1 main node and 3 subordinate nodes. The host machines use HDFS as the file system and KVM for virtualization. The entire cloud is managed by Cloudera Manager. Several Hadoop modules are installed on this cluster, and there is also a NodeJS data download service. This time I have to create a Docker-based architecture. I have read several guides and formed some opinions, but I still have open questions.

A. Do you think https://github.com/Lewuathe/docker-hadoop-cluster is a good base for my project? I also found the official image, but it is single-node.

B. How will the system requirements change if I want to do this in one container? It would be great because this architecture has to work in different places, so changes can be easily transferred between these locations. Synchronization between these so-called clones would be important.

C. Do you have other ideas, or maybe best practices?
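For illustration, the 1 main + 3 subordinate layout described in the question could be sketched as a Compose-style service map, one container per node. This is only a sketch under stated assumptions: the image name `example/hadoop-node`, the service names, and the `ROLE` and `MAIN_HOST` environment variables are hypothetical placeholders and are not taken from any of the projects linked on this page.

```python
import json


def build_services(workers=3, image="example/hadoop-node"):
    """Return a Compose-style service map: one main node plus N workers.

    The image name and environment variables are hypothetical placeholders.
    """
    services = {
        "main": {
            "image": image,
            "hostname": "main",
            "environment": {"ROLE": "main"},
        }
    }
    for i in range(1, workers + 1):
        name = f"worker{i}"
        services[name] = {
            "image": image,
            "hostname": name,
            # Each worker is told where to find the main node.
            "environment": {"ROLE": "worker", "MAIN_HOST": "main"},
            # Start the main node's container before the workers.
            "depends_on": ["main"],
        }
    return services


if __name__ == "__main__":
    # Print the layout as JSON so it can be inspected or adapted.
    print(json.dumps({"services": build_services()}, indent=2))
```

Generating the layout from a function keeps the node count a single parameter, which may help when the same architecture has to be reproduced at several locations.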


docker cluster-computing hadoop hdfs cloudera

user4725754 Jan 25 '16 at 16:47


3 answers

Bluedata · Answer 1 · 2016-01-26T22:07:15+0000

To address your question C, you can check out the BlueData software platform: http://www.bluedata.com/blog/2015/06/docker-containers-big-data-clusters

It is designed to run Hadoop multi-node clusters in a Docker environment, and there is a free version available for download (you can also run it on an AWS EC2 instance).

Paul verest · Answer 2 · 2016-09-29T14:24:44+0000

As of September 2016, there is no quick answer.

https://github.com/Lewuathe/docker-hadoop-cluster does not seem to be a good starting point, as it would need to be more general to cover your option B.

Take a look at https://github.com/sequenceiq/hadoop-docker and https://github.com/kiwenlau/hadoop-cluster-docker

Justin kestelyn · Answer 3 · 2016-09-29T15:49:32+0000

This work has already been done for you:

https://hub.docker.com/r/cloudera/clusterdock/

It includes a pre-packaged multi-node cluster, with Cloudera Manager as an additional component for managing the cluster, etc.
