I read Hadoop in Action and found that in Java, using the classes MultipleOutputFormat and MultipleOutputs, we can reduce the data to multiple files, but I'm not sure how to achieve the same using Python streaming.
For example:

                    / out1/part-0000
mapper -> reducer
                    \ out2/part-0000
If anyone knows of, has heard of, or has done something similar, please let me know.
Dumbo Feathers, a set of Java classes for use with Dumbo (a Python library for writing M/R jobs in Python for Hadoop), can do this.
In a Python M/R job written with Dumbo, the Feathers output classes use MultipleOutputFormat to write the output.
Dumbo uses typedbytes as its output format.
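If you stay with plain text streaming instead of Dumbo, one possible approach is to have the Python reducer tag every record with the directory it should land in, and let a Java output format built on MultipleOutputFormat (for example one overriding generateFileNameForKeyValue, passed to streaming with -outputformat) do the splitting. The sketch below only covers the Python side. The names choose_dir, tag_records and run, the routing rule, and the "dir/key" prefix convention are all made up for illustration, and the Java class that strips the prefix is assumed, not shown.

```python
import io


def choose_dir(key, value):
    # Hypothetical routing rule: numeric values go to out1, everything else to out2.
    return "out1" if value.isdigit() else "out2"


def tag_records(lines, route=choose_dir):
    """Turn 'key<TAB>value' lines into 'dir/key<TAB>value' lines.

    The 'dir/' prefix is an assumed convention that a Java-side output
    format would have to parse and strip again.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield "%s/%s\t%s" % (route(key, value), key, value)


def run(stream_in, stream_out):
    # Write one tagged record per line, as a streaming reducer would.
    for record in tag_records(stream_in):
        stream_out.write(record + "\n")


# Demo with in-memory streams; a real reducer script would call
# run(sys.stdin, sys.stdout) instead.
demo_out = io.StringIO()
run(io.StringIO("a\t1\nb\tx\n"), demo_out)
print(demo_out.getvalue(), end="")
```

With this input, the first record is tagged for out1 and the second for out2. Which directory a key goes to is entirely up to the routing function, so the Java side never needs to know the rule, only how to split the prefix off.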