How to specify input file for runner from Python?

Question

I am writing an external script to run a MapReduce job through the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).

I read in the mrjob documentation that I should use MRJob.make_runner() to run the MapReduce job from a separate Python script, as follows.

 mr_job = MRYourJob(args=['-r', 'emr'])
 with mr_job.make_runner() as runner:
     ...

However, how can I specify which input file to use? I want to use the file "datalines.txt", which sits in the same directory as my MapReduce script and the separate Python script that runs the job. Also, how can I specify where the output goes?

I could not find a function in the mrjob documentation that allows me to specify these parameters.

python mapreduce mrjob

dangerChihuahua007 24 sept '12 at 16:38

1 answer

jfs · Accepted Answer · 2012-09-24T16:52:42+0000

The Getting Started Guide assumes that input is read from stdin or from files given on the command line, so the input path can be passed as a positional argument in args:

 mr_job = MRYourJob(args=["datalines.txt"])
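To cover the output half of the question as well, here is a minimal sketch of a driver script. The helper names build_job_args and run_job are made up for illustration, and the sketch assumes mrjob 0.6 or later, where runners expose cat_output() and jobs expose parse_output(); older releases used runner.stream_output() with mr_job.parse_output_line() instead. The "inline" runner keeps everything on the local machine, and --output-dir asks the runner to write its output files into the given directory.

```python
def build_job_args(input_path, output_dir=None, runner="inline"):
    """Build the args list handed to an MRJob subclass constructor.

    mrjob treats positional entries as input paths. --output-dir is
    optional; without it the runner uses a temporary directory.
    """
    args = ["-r", runner]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    args.append(input_path)
    return args


def run_job(job_class, args):
    """Run the job and return its decoded (key, value) pairs as a list.

    Assumes mrjob >= 0.6 (cat_output / parse_output).
    """
    job = job_class(args=args)
    with job.make_runner() as runner:
        runner.run()
        # Read the results before the runner cleans up on exit.
        return list(job.parse_output(runner.cat_output()))


# Example usage (mr_your_job is a hypothetical module holding the job class):
#
#     from mr_your_job import MRYourJob
#     args = build_job_args("datalines.txt", output_dir="results")
#     for key, value in run_job(MRYourJob, args):
#         print(key, value)
```

The job class is imported from its own module here because mrjob expects the job definition to live in a separate file from the script that drives it. Relative paths such as "datalines.txt" are resolved against the current working directory, so run the driver from the directory that contains the file or pass an absolute path.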
