The problem with this idea is that Hadoop has no concept of "distributed memory". If you want the result "in memory", the next question is: in which machine's memory? If you really want that, you will have to write your own OutputFormat, and then either use an existing framework to share memory across machines, or again write your own.
My suggestion is to simply write to HDFS as usual, and then have the non-MapReduce business logic start by reading the data back from HDFS via the FileSystem API, i.e.:
FileSystem fs = new JobClient(conf).getFs();
Path outputPath = new Path("/foo/bar");
FSDataInputStream in = fs.open(outputPath);
// read data and store in memory
fs.delete(outputPath, true);
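For completeness, here is a minimal sketch of that approach, assuming the job wrote text output (TextOutputFormat) to an output directory containing the usual part-* files; the class name, method name, and path are illustrative, not part of any Hadoop API:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ReadJobOutput {
    // Reads every line of the job's text output into memory, then removes the output dir.
    public static List<String> readLines(Configuration conf, Path outputDir) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        List<String> lines = new ArrayList<String>();
        // MapReduce writes one part-* file per reducer; skip _SUCCESS and _logs.
        FileStatus[] parts = fs.listStatus(outputDir, new PathFilter() {
            public boolean accept(Path p) {
                return p.getName().startsWith("part-");
            }
        });
        for (FileStatus part : parts) {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(fs.open(part.getPath())));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } finally {
                reader.close();
            }
        }
        // Clean up once everything is in memory.
        fs.delete(outputDir, true);
        return lines;
    }
}

Note that a job's output path is a directory, so iterating over the part-* files (rather than opening the directory path directly) is what you want in practice.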
Of course, this involves some unnecessary reads from and writes to disk, but if your data is small enough to fit in memory, why worry about that anyway? I would be surprised if it were a serious bottleneck.
— Joe k