Ubuntu Cluster Management

I am trying to find a solution for managing a set of Linux machines (Ubuntu, ~40 nodes, same hardware). These machines must be images of each other: software installed on one must be installed on all the others. My software requirements are Hadoop, R, and ServiceMix. The R installations also need to stay synchronized, so a package installed on one node must be available on all the others.

The solution I'm using now is NFS plus pssh. I'm hoping there is a better, easier approach out there. Any suggestion is appreciated.
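For context, my current approach amounts to roughly the following (the hosts file, user name, and package are placeholders, and this assumes passwordless sudo on the nodes):

    # Install an R package on every node in parallel with pssh.
    # hosts.txt lists one node per line; "admin" is a placeholder user.
    parallel-ssh -h hosts.txt -l admin -t 0 -i \
        "sudo Rscript -e 'install.packages(\"data.table\", repos=\"https://cran.r-project.org\")'"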

Tags: r, ubuntu, cluster-computing, hadoop
2 answers

Two popular options are Puppet (from Puppet Labs) and Chef (from Opscode).

Another potential mechanism is to create a metapackage that Depends: on the packages you want installed on all machines (Depends: is the Debian equivalent of RPM's Requires:). When you update the metapackage, apt-get update && apt-get -u dist-upgrade on each node will pull in the new packages everywhere at once.
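A minimal sketch of building such a metapackage with the equivs tool; the package name and dependency list are only examples:

    # Build a trivial metapackage (needs: sudo apt-get install equivs).
    cat > cluster-base.ctl <<'EOF'
    Package: cluster-base
    Version: 1.0
    Depends: default-jdk, r-base, r-base-dev
    Description: pins the common software set for all cluster nodes
    EOF
    equivs-build cluster-base.ctl   # produces cluster-base_1.0_all.deb

    # Publish the .deb in an apt repository reachable by the nodes,
    # then on every node: apt-get update && apt-get -u dist-upgrade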

The metapackage approach may require less work to set up and use initially, but Puppet or Chef can provide a better return on investment in the long run, since they can manage much more than just package installation.
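For a flavor of that, here is a hedged sketch of a masterless Puppet run; the resources in the manifest are illustrative, not a tested configuration:

    # Declare desired state in a manifest, then apply it locally
    # (e.g. triggered from cron or pushed out with pssh).
    cat > node.pp <<'EOF'
    # Example resources only; adjust to your stack.
    package { ['r-base', 'r-base-dev']:
      ensure => installed,
    }
    service { 'ssh':
      ensure => running,
      enable => true,
    }
    EOF
    sudo puppet apply node.pp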


I have used a low-tech approach for this in the past: simply sharing (at least parts of) /usr/local/ over NFS, so that the shared R library lives in /usr/local/lib/R/site-library/ and every node sees the same installed packages. I think this could work for your Hadoop installation as well.
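A sketch of that sharing, assuming an NFS server named nfs-server (the host name and export options are placeholders):

    # On the NFS server: export the R site-library.
    echo '/usr/local/lib/R/site-library  *(rw,sync,no_subtree_check)' \
        | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On each node: mount it, then check that R picks it up.
    sudo mount -t nfs nfs-server:/usr/local/lib/R/site-library \
        /usr/local/lib/R/site-library
    Rscript -e '.libPaths()'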

I tried to keep everything else in Debian/Ubuntu packages, which kept all the nodes identical. Local R and Ubuntu package repositories (for locally built packages) may also help, but they take a little more work.
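A minimal sketch of such a local apt repository, using dpkg-scanpackages; the host name and paths are illustrative:

    # On the repository host: index the locally built .debs.
    cd /srv/local-apt
    dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz

    # Serve /srv/local-apt over HTTP, then on every node add
    #   deb [trusted=yes] http://repo-host/local-apt ./
    # to /etc/apt/sources.list and run apt-get update.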

