Not sure if this is what you are asking for, but you can run Ruby map and reduce scripts from the hadoop command line using Hadoop Streaming. It looks something like this:
% hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
    -input input/ncdc/sample.txt \
    -output output \
    -mapper ch02/src/main/ruby/max_temperature_map.rb \
    -reducer ch02/src/main/ruby/max_temperature_reduce.rb
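In case it helps to see what such scripts do, here is a rough sketch of a streaming mapper and reducer. It is not the book's code. Streaming hands records to the script on STDIN and reads tab-separated key/value lines back from STDOUT. The "year,temperature" input format, the names map_line and reduce_lines, and the map/reduce mode argument are all made up for illustration.

```ruby
#!/usr/bin/env ruby
# Sketch of a Hadoop Streaming mapper and reducer in one file.
# Assumption: each input line is "year,temperature" (a made-up format,
# not the real NCDC record layout).

# Map step: turn one input line into a "key<TAB>value" line,
# or return nil when the line cannot be parsed.
def map_line(line)
  year, temp = line.strip.split(",")
  return nil if year.nil? || temp.nil? || temp !~ /\A[+-]?\d+\z/
  "#{year}\t#{temp.to_i}"
end

# Reduce step: keep the maximum value seen for each key.
# A hash is used here, so this sketch does not depend on the sort order
# of the incoming lines.
def reduce_lines(lines)
  max_by_year = {}
  lines.each do |line|
    year, temp = line.strip.split("\t")
    next if year.nil? || temp.nil?
    value = temp.to_i
    max_by_year[year] = value if !max_by_year.key?(year) || value > max_by_year[year]
  end
  max_by_year.map { |year, max| "#{year}\t#{max}" }
end

# Hypothetical mode argument: run as `script.rb map` or `script.rb reduce`.
case ARGV[0]
when "map"
  STDIN.each_line do |line|
    out = map_line(line)
    puts out if out
  end
when "reduce"
  reduce_lines(STDIN.each_line).each { |line| puts line }
end
```

You can try it locally without Hadoop by piping a file through the map step, then sort, then the reduce step.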
You can (and should) use a combiner with large datasets. Add it with the -combiner option. The combiner runs on the output of your mapper, and its output is fed to your reducer (Hadoop does not guarantee how many times the combiner will be called, if at all). Without a combiner, your input is split (in accordance with the standard Hadoop protocol), fed to your mapper, and the map output goes to the reducer as is.

The example is from O'Reilly's Hadoop: The Definitive Guide, 3rd Edition. It has very good streaming information and a section on streaming using Ruby.
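On the combiner: since it may run zero, one, or many times, it has to be a function whose result does not change when it is applied to partial groups first. Taking a maximum has that property, which is why, as far as I can tell, the same reduce script can usually be passed to -combiner for a job like this. A small sketch to check the idea locally (max_per_key and the sample data are made up, not taken from the book):

```ruby
# Takes "key<TAB>value" lines and returns one line per key holding the
# maximum value, sorted by key.
def max_per_key(lines)
  best = {}
  lines.each do |line|
    key, value = line.chomp.split("\t")
    v = value.to_i
    best[key] = v if !best.key?(key) || v > best[key]
  end
  best.sort.map { |key, v| "#{key}\t#{v}" }
end

# Hypothetical map output from two separate map tasks.
map_task_a = ["1950\t0", "1950\t22", "1949\t111"]
map_task_b = ["1950\t-11", "1949\t78"]

# No combiner: the reducer sees every map output line.
without_combiner = max_per_key(map_task_a + map_task_b)

# With a combiner: each map task's output is pre-aggregated first,
# then the reducer runs over the smaller combined output.
with_combiner = max_per_key(max_per_key(map_task_a) + max_per_key(map_task_b))

puts without_combiner == with_combiner
```

Both paths give the same result here. Something like a mean would not work this way without also carrying counts along.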