Background
My employer is gradually migrating our resource-intensive ETL processing logic and backend from MySQL to Hadoop (HDFS and Hive). At the moment the cluster is still fairly small and manageable (20 TB across 10 nodes), but we intend to grow it gradually.
Now that Hadoop is moving into production, batch scheduling and sharing the cluster between user Hive queries, M/R cron jobs, and (I expect, eventually) HBase is becoming a bigger problem. The fear is that a user will submit a naive query that runs for an unreasonable amount of time (say, 4 hours), clogging the job queue and destabilizing the load on the infrastructure.
Question
Another section of my company has already been burned by the immaturity of Flume, so my question is: how stable are the two well-known schedulers (Capacity and Fair), and are they used anywhere besides their sponsoring companies (Yahoo and Facebook)?
Edit: Background Information
http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html
http://hadoop.apache.org/mapreduce/docs/r0.21.0/capacity_scheduler.html
David