I wrote a program in Java using the Hadoop API. The build output of this Java code is a jar, say foo.jar.
To run this jar on Hadoop, I do:
hadoop jar foo.jar org.foo.bar.MainClass input output
This launches a long-running Hadoop job (say, a few minutes). While the work is being done, Hadoop gives me progress, sort of:
Map 0%, Reduce 0%
Map 20%, Reduce 0%
....
etc. After the job finishes, Hadoop prints out a bunch of statistics (for example, the size of the input, splitting, writing, and so on). All of this happens on the command line.
Now, what I'm trying to do is call this program from Python (using a simple system call). While it runs, I also want to show some of these statistics, but not all of them.
That is, I launch the jar from Python, and Python captures Hadoop's progress output as it appears:

Map 0%, Reduce 0%
Map 20%, Reduce 0%
...

and passes the map and reduce percentages into a function, for example:
def progress_function(map_pct, reduce_pct):
    return (map_pct + reduce_pct) / 2.0
so that Python prints something like:
progress so far:0
progress so far:10
and so on..
Note that I can't modify the jar or the Java code; I only have the jar. Everything has to happen on the Python side, so I'm looking for a pure-Python way to do this.
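One way to sketch this, assuming the progress lines go to stderr (which is where Hadoop usually writes them) and look roughly like "Map 20%, Reduce 0%": launch the jar with subprocess.Popen, read stderr line by line, and pull the percentages out with a regex. The function name run_hadoop_job and the exact regex are my assumptions; the wording of the progress lines varies between Hadoop versions, so adjust the pattern to match your output.

```python
import re
import subprocess

def progress_function(map_pct, reduce_pct):
    # Combine map and reduce percentages into one progress value.
    return (map_pct + reduce_pct) / 2.0

# Assumed progress-line format; tweak for your Hadoop version
# (e.g. "map 20% reduce 0%" in older releases).
PROGRESS_RE = re.compile(r"[Mm]ap\s+(\d+)%,?\s+[Rr]educe\s+(\d+)%")

def run_hadoop_job(cmd):
    # Hadoop writes progress to stderr, so pipe only that stream.
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE,
                            universal_newlines=True)
    for line in proc.stderr:
        m = PROGRESS_RE.search(line)
        if m:
            map_pct = int(m.group(1))
            reduce_pct = int(m.group(2))
            print("progress so far:%g" % progress_function(map_pct, reduce_pct))
        # The statistics printed after the job ends also arrive here;
        # match and print only the lines you care about.
    return proc.wait()

# Example call (paths/class are placeholders from the question):
# run_hadoop_job(["hadoop", "jar", "foo.jar",
#                 "org.foo.bar.MainClass", "input", "output"])
```

Reading the pipe in a loop (rather than waiting for the process to finish) is what lets the progress appear incrementally instead of all at once at the end.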