Hadoop vs. Disco vs. Condor?

I am trying to find a tool that will manage many tasks on 100 machines in a cluster (send tasks to the machines, make sure that the tasks are performed, etc.).

Which of these tools is easiest to install and manage:

(1) Hadoop?

(2) Disco?

(3) Condor?

Ideally, I am looking for a solution that is as simple as possible, but reliable. Python integration is also a plus.

+4
2 answers

I am not familiar with Disco and Condor, but I can answer regarding Hadoop:

Hadoop pros:

  • Reliable and proven, perhaps more than any of the alternatives. Used by many organizations (including the one I work for) to run clusters of 100 nodes or more.
  • Large ecosystem = support + many subprojects to make life easier (e.g. Pig, Hive).
  • Python support should be possible through Hadoop Streaming (the MapReduce streaming interface), or possibly Jython.
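
To give a rough idea of the streaming route: a streaming mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the reducer sees its input sorted by key. Below is a minimal word-count sketch under those assumptions. The function names and the "map"/"reduce" command-line argument are my own illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: run as "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The same script can be checked locally with a shell pipeline (map, then sort, then reduce) before it goes anywhere near a cluster. On the cluster it would be passed to the streaming jar via its -mapper and -reducer options; where that jar lives depends on the installation.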

Hadoop cons:

  • Neither simple nor elegant (imho). You will have to spend time learning it.
+3

Have you considered Sun Grid Engine? http://wikis.sun.com/display/GridEngine/Home

0