I am planning a multi-node Hadoop cluster in a Docker environment, so it should be based on a simple, easy-to-use virtualized system. The current architecture (according to the documentation) has 1 main node and 3 subordinate nodes. The host machine uses HDFS as its file system and KVM for virtualization. The entire cloud is managed by Cloudera Manager. Several Hadoop modules are installed on this cluster, and there is also a Node.js data download service. Now I have to create a Docker-based architecture. I have read several guides and gathered some opinions, but I still have open questions.
A. Do you think https://github.com/Lewuathe/docker-hadoop-cluster is a good base for my project? I also found the official image, but it is single-node.
B. How will the system requirements change if I want to do this in one container? That would be ideal, because this architecture has to run in several locations and changes should be easy to transfer between them. Synchronization between these so-called clones would be important.
C. Do you have other ideas, or maybe best practices?
user4725754