I want to read a list of IDs from a file in my Hadoop Streaming job. Here is my simple mapper.py:
#!/usr/bin/env python
import sys
import json


def read_file():
    id_list = []
    # read ids from a file
    with open('../user_ids', 'r') as f:
        for line in f:
            line = line.strip()
            id_list.append(line)
    return id_list


if __name__ == '__main__':
    id_list = set(read_file())
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        line = json.loads(line)
        user_id = line['user']['id']
        if str(user_id) in id_list:
            print('%s\t%s' % (user_id, line))
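A minimal sketch of the mapper's filtering step pulled out into a pure function, so it can be exercised without Hadoop. The name `filter_records` is hypothetical, and re-serializing each record with `json.dumps` (rather than printing the Python dict, as the mapper above does) is an assumption made so the reducer receives valid JSON text.

```python
import json


def filter_records(lines, allowed_ids):
    """Yield 'user_id<TAB>json' lines for records whose user id is allowed.

    Hypothetical helper: allowed_ids is assumed to be a set of strings,
    matching how the mapper above compares str(user_id).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing in json.loads
        record = json.loads(line)
        user_id = str(record['user']['id'])
        if user_id in allowed_ids:
            # re-serialize as JSON so the reducer gets parseable text
            yield '%s\t%s' % (user_id, json.dumps(record))


if __name__ == '__main__':
    sample = [
        '{"user": {"id": 1}, "text": "a"}',
        '{"user": {"id": 2}, "text": "b"}',
    ]
    for out in filter_records(sample, {'1'}):
        print(out)
```

Keeping the ID lookup as a set, as the original does, keeps each membership test O(1) on average.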
And here is my reducer.py:
#!/usr/bin/env python
import sys

current_id = None
current_list = []
id = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    id, line = line.split('\t', 1)
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: user id) before it is passed to the reducer
    if current_id == id:
        current_list.append(line)
    else:
        if current_id:
            # write result to STDOUT
            print('%s\t%s' % (current_id, current_list))
        current_id = id
        current_list = [line]

# do not forget to output the last user id if needed
# (the check avoids printing "None" when the input is empty)
if current_id is not None:
    print('%s\t%s' % (current_id, current_list))
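The same "collect consecutive lines with the same key" logic can be sketched with `itertools.groupby`, which depends on the same property the reducer's comment mentions: input arriving already sorted by key. The helper names `parse` and `reduce_sorted` are hypothetical, and the sample list stands in for `sys.stdin`.

```python
from itertools import groupby


def parse(lines):
    """Split each 'key<TAB>value' line into a (key, value) pair."""
    for line in lines:
        line = line.rstrip('\n')
        if line:
            key, value = line.split('\t', 1)
            yield key, value


def reduce_sorted(lines):
    """Group consecutive pairs that share a key.

    Only correct if the input is sorted by key, as Hadoop's shuffle
    provides to each reducer.
    """
    for key, pairs in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, [value for _, value in pairs]


if __name__ == '__main__':
    sample = ['1\ta\n', '1\tb\n', '2\tc\n']  # stands in for sys.stdin
    for key, values in reduce_sorted(sample):
        print('%s\t%s' % (key, values))
```

In a real reducer the sample list would be replaced by `sys.stdin`; empty input then produces no output at all.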
Now, to run it, I use:
hadoop jar contrib/streaming/hadoop-streaming-1.1.1.jar -file ./mapper.py \
    -mapper ./mapper.py -file ./reducer.py -reducer ./reducer.py \
    -input test/input.txt -output test/output -file '../user_ids'
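Before submitting to the cluster, the scripts can be checked with a local dry run, commonly written in the shell as `cat test/input.txt | ./mapper.py | sort | ./reducer.py`. Below is a rough in-memory sketch of that map, sort, reduce sequence; `run_local`, `demo_mapper`, and `demo_reducer` are hypothetical names, and the comma-separated demo input is made up for illustration.

```python
from itertools import groupby


def run_local(lines, mapper, reducer):
    """Emulate Hadoop Streaming's map -> sort -> reduce on in-memory lines."""
    mapped = list(mapper(lines))
    # Hadoop sorts map output by key before the reduce phase; sorting on the
    # text before the first tab approximates that (Python's sort is stable,
    # so equal keys keep map order, which Hadoop does not promise)
    mapped.sort(key=lambda kv_line: kv_line.split('\t', 1)[0])
    return list(reducer(mapped))


def demo_mapper(lines):
    """Hypothetical mapper: 'id,text' input becomes 'id<TAB>text' output."""
    for line in lines:
        line = line.strip()
        if line:
            user_id, text = line.split(',', 1)
            yield '%s\t%s' % (user_id, text)


def demo_reducer(lines):
    """Hypothetical reducer: collects all values that share a key."""
    pairs = (line.split('\t', 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield '%s\t%s' % (key, [value for _, value in group])


if __name__ == '__main__':
    print(run_local(['2,b', '1,a', '2,c'], demo_mapper, demo_reducer))
```

A dry run like this only exercises the script logic; it does not reproduce where the cluster places shipped files.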
Job launch output:
13/11/07 05:04:52 INFO streaming.StreamJob: map 0% reduce 0%
13/11/07 05:05:21 INFO streaming.StreamJob: map 100% reduce 100%
13/11/07 05:05:21 INFO streaming.StreamJob: To kill this job, run:
I get an error message:
job not successful. Error:
When I do not read the IDs from the file ../user_ids, the job runs without errors. I think the problem is that it cannot find the file ../user_ids. I also tried putting the file at a location in HDFS, and it still didn't work. Thank you for your help.
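One way to test that suspicion is to make the mapper look for the file where Hadoop Streaming is expected to put it and log what it actually sees. My understanding is that a file passed with `-file` is made available in each task's current working directory under its base name, so `../user_ids` would appear there as `user_ids`. The helpers `find_side_file` and `read_ids` below are hypothetical: they try the base name first, fall back to the path as given, and otherwise write the working directory contents to stderr, which ends up in the task logs rather than the map output.

```python
import os
import sys


def find_side_file(path):
    """Return a readable path for a side file shipped with the job.

    Hypothetical helper: tries the base name in the task's working
    directory first (where -file is assumed to place it), then the
    path as given.
    """
    for candidate in (os.path.basename(path), path):
        if os.path.isfile(candidate):
            return candidate
    # stderr keeps this out of the map output and in the task logs
    sys.stderr.write('cannot find %s; cwd=%s, contents=%s\n'
                     % (path, os.getcwd(), sorted(os.listdir('.'))))
    raise IOError('side file not found: %s' % path)


def read_ids(path):
    """Read one ID per line into a set, skipping blank lines."""
    with open(find_side_file(path)) as f:
        return set(line.strip() for line in f if line.strip())
```

In the mapper, `id_list = read_ids('../user_ids')` would then replace `set(read_file())`.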
python hadoop hadoop-streaming
Elham