I have a MapReduce job written in Python. The program was tested successfully in a Linux environment, but it fails when I run it under Hadoop.
Here is the job command:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.1+169.127-streaming.jar \
    -input /data/omni/20110115/exp6-10122 \
    -output /home/yan/visitorpy.out \
    -mapper SessionMap.py \
    -reducer SessionRed.py \
    -file SessionMap.py \
    -file SessionRed.py
The *.py files have permission mode 755, and #!/usr/bin/env python is the top line in each *.py file. Here is the mapper, SessionMap.py:
#!/usr/bin/env python
import sys

for line in sys.stdin:
    val = line.split("\t")
    (visidH, visidL, sessionID) = (val[4], val[5], val[108])
    print "%s%s\t%s" % (visidH, visidL, sessionID)
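For reference, here is a minimal sketch of a more defensive variant of the mapper (assuming the same tab-delimited input and field indices as above). A record with fewer than 109 fields would raise an IndexError in the original mapper; an uncaught exception ends the Python process, which can surface on the Java side as a broken pipe. This version skips such records and reports them on stderr instead:

#!/usr/bin/env python
# Sketch only: same logic as SessionMap.py above, but skips records
# that do not have enough tab-separated fields instead of raising
# an IndexError (an uncaught exception kills the mapper process).
import sys

for line in sys.stdin:
    val = line.rstrip("\n").split("\t")
    if len(val) <= 108:  # need indices 4, 5, and 108
        sys.stderr.write("skipping short record: %d fields\n" % len(val))
        continue
    (visidH, visidL, sessionID) = (val[4], val[5], val[108])
    print "%s%s\t%s" % (visidH, visidL, sessionID)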
Error from the log:
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:126)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
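A broken pipe in PipeMapper.map typically means the streaming child process exited while Hadoop was still writing input to it. A small checker like the following could confirm whether any record in the data is too short for val[108] to exist; this is only a sketch, and the script name and the idea of feeding it a local sample of the input are assumptions:

#!/usr/bin/env python
# Sketch: count tab-separated fields per input line and flag records
# with fewer than 109 columns (indexing val[108] requires at least 109).
# Hypothetical usage:  python check_fields.py < sample.txt
import sys

bad = 0
for n, line in enumerate(sys.stdin, 1):
    fields = len(line.rstrip("\n").split("\t"))
    if fields < 109:
        bad += 1
        sys.stderr.write("line %d: only %d fields\n" % (n, fields))
sys.stderr.write("%d short line(s) found\n" % bad)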