Python streaming: how to reduce to multiple outputs? (Maybe using Java)

I read Hadoop in Action and found that in Java, using the classes MultipleOutputFormat and MultipleOutputs, we can reduce the data to multiple files, but I'm not sure how to achieve the same using Python streaming.

For example:

                  / out1/part-0000
mapper -> reducer   
                  \ out2/part-0000

If anyone knows how, has heard of a way, or has done something similar, please let me know.

1 answer

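Dumbo Feathers, a set of Java classes meant to be used together with Dumbo (a Python library that makes it easy to write M/R programs in Python for Hadoop), does this in its output classes.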

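Basically, in your Python Dumbo M/R job, you output a key that is a tuple of two elements: the first is the name of the directory to write to, and the second is the actual key. The Feathers output class selected for the job then inspects the tuple, works out which output directory to use, and relies on MultipleOutputFormat to write to the different subdirectories.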
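A minimal sketch of the tuple-key idea, written as plain Python generators so it runs without Hadoop. The `mapper`/`reducer` signatures follow Dumbo's usual generator style; the input format (`LEVEL<TAB>message` lines), the directory names `out1` and `out2`, and the helpers `route` and `run_locally` are hypothetical and only there for illustration.

```python
from collections import defaultdict


def route(level):
    """Hypothetical rule: pick an output directory name for a record."""
    return "out1" if level == "ERROR" else "out2"


def mapper(key, value):
    # value is assumed to be a line of the form "LEVEL<TAB>message".
    level = value.split("\t", 1)[0]
    yield level, 1


def reducer(key, values):
    # Emit a two-element tuple as the key: (output directory, real key).
    # An output class on the Java side can use the first element to choose
    # the subdirectory and keep only the second element as the key.
    yield (route(key), key), sum(values)


def run_locally(lines):
    """Tiny stand-in for the shuffle phase, for local experimentation only."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            groups[k].append(v)
    output = []
    for k in sorted(groups):
        output.extend(reducer(k, groups[k]))
    return output


if __name__ == "__main__":
    sample = ["ERROR\tdisk full", "INFO\tstarted", "ERROR\ttimeout"]
    for (directory, key), count in run_locally(sample):
        print(directory, key, count)
```

In a real job the two functions would be handed to Dumbo (typically via `dumbo.run(mapper, reducer)`) together with a Feathers output format chosen at launch time; the exact class name and command-line options are not given here, so check the Feathers documentation for them.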

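With Dumbo this is easy because it uses typedbytes as the output format, but it should be doable even if your output format is plain text; it would just need some parsing.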
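If the reducer output is plain text rather than typedbytes, one possible convention (an assumption, not something stated in the answer) is to prefix each output key with the directory name and a separator, and have the output side split it apart again. The separator `|` and both helper names below are hypothetical.

```python
SEP = "|"  # hypothetical separator; must never occur in a directory name


def format_line(directory, key, value):
    """Build one streaming output line: 'directory|key<TAB>value'."""
    if SEP in directory or "\t" in directory:
        raise ValueError("directory name contains a reserved character")
    return "%s%s%s\t%s" % (directory, SEP, key, value)


def parse_line(line):
    """Inverse of format_line; mirrors the parsing the output side needs."""
    tagged_key, _, value = line.rstrip("\n").partition("\t")
    # Split on the first separator only, so the real key may contain SEP.
    directory, _, key = tagged_key.partition(SEP)
    return directory, key, value
```

Note that everything comes back as a string after parsing, which is the main thing typedbytes avoids.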
