Automatic scalability and on-the-fly migration

My company is considering using Flume to process a fairly large volume of logs. We believe log processing should be distributed for both volume (scalability) and failure (reliability) reasons, and Flume seems like an obvious choice.

However, we think we are missing something obvious, because we do not see how Flume provides automatic scalability and fault tolerance.

I want to define a stream that says: for each line of the log, do thing A, then pass it on and do thing B, then pass it on and do thing C, and so on, which seems to fit Flume well. However, I want to be able to define this stream in purely logical terms and then basically say: "Hey Flume, here are the servers, here is the definition of the stream, go to work!" Servers will die (and ops will restart them), we will add servers to the cluster and remove others, and Flume should simply direct the work to whatever nodes have available capacity.

This is how Hadoop MapReduce implements scalability and fault tolerance, and I assumed Flume would work the same way. However, the documentation suggests that I need to manually configure which physical server each logical node runs on, and to write a specific failover configuration for switching each node to a different resource.
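For example, as far as I can tell from the user guide, the manual setup looks roughly like the sketch below (this is against Flume 0.9.x / CDH3 flume shell commands; the host names, node names, and paths are made up):

    # map each logical node onto a specific physical machine by hand
    exec map host1.example.com agent1
    exec map host2.example.com collector1

    # wire each logical node's source and sink explicitly, naming the peer host
    exec config agent1 'tail("/var/log/app/app.log")' 'agentE2ESink("host2.example.com", 35853)'
    exec config collector1 'collectorSource(35853)' 'collectorSink("hdfs://namenode/flume/%Y%m%d/", "log-")'

If host2 dies, nothing reroutes agent1 on its own; someone (or some script) has to push a new configuration.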

Am I right that Flume does not serve our purpose, or am I missing something?

Thank you for your help.

1 answer

Depending on whether you are using multiple masters, you can set up your configuration to follow a failover pattern.

This is described in sufficient detail in the manual: http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_automatic_failover_chains
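Roughly, the idea is that collectors declare autoCollectorSource and agents declare one of the auto* chain sinks, and the master computes the failover chains across the available collectors for you. A sketch along those lines, using the node-spec syntax from that guide (host names and paths are made up):

    # collectors announce themselves; the master builds failover chains over them
    collector1 : autoCollectorSource | collectorSink("hdfs://namenode/flume/%Y%m%d/", "log-") ;
    collector2 : autoCollectorSource | collectorSink("hdfs://namenode/flume/%Y%m%d/", "log-") ;

    # agents use an automatic chain sink instead of naming a specific collector
    agent1 : tail("/var/log/app/app.log") | autoE2EChain ;
    agent2 : tail("/var/log/app/app.log") | autoE2EChain ;

With this, adding or removing a collector changes the chains the master hands out rather than requiring you to rewire each agent by hand; autoBEChain and autoDFOChain are the same idea with weaker delivery guarantees.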


