Hadoop pseudo-distributed mode: input path does not exist

I am new to Hadoop. I was running my application in standalone (local) mode and everything went perfectly. Now I have decided to move it to pseudo-distributed mode and made the configuration changes as described. Fragments of my XML files are shown below.

my core-site.xml is as follows:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost/</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hadoop-onur</value>
      <description>A base for other temporary directories.</description>
    </property>

my hdfs-site.xml

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

and my mapred-site.xml file:

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:8021</value>
    </property>

I ran the start-dfs.sh and start-mapred.sh scripts, and they started fine:

    root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-dfs.sh
    starting namenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vissu-desktop.out
    localhost: starting datanode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vissu-desktop.out
    localhost: starting secondarynamenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vissu-desktop.out
    root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-mapred.sh
    starting jobtracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vissu-desktop.out
    localhost: starting tasktracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vissu-desktop.out
    root@vissu-desktop:/home/vissu/Raveesh/Hadoop#

Now I tried to run my application, but I received the following error:

    root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2# hadoop jar ResultAgg_plainjar.jar ProcessInputFile /home/vissu/Raveesh/VotingConfiguration/sample.txt
    ARG 0 obtained = ProcessInputFile
    12/07/17 17:43:33 INFO preprocessing.ProcessInputFile: Modified File Name is /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
    Going to process map reduce jobs
    12/07/17 17:43:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/07/17 17:43:34 ERROR preprocessing.ProcessInputFile: Input path does not exist: hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
    root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2#

The application first takes the input file from the given path, modifies it, and creates sample.txt_modf; it is this modified file that should be consumed by the MapReduce framework. In standalone mode I gave the absolute path and everything worked. But I cannot figure out what the path should point to when using Hadoop's Path API in pseudo-distributed mode: if I just give the file path, it prepends hdfs://localhost/. So should I simply make sure the modified file ends up at that location in HDFS?

My question is how to specify this path correctly.

Here is the fragment containing the path:

    KeyValueTextInputFormat.addInputPath(conf,
        new Path(System.getProperty("user.dir") + File.separator + inputFileofhits.getName()));
    FileOutputFormat.setOutputPath(conf,
        new Path(ProcessInputFile.resultAggProps.getProperty("OUTPUT_DIRECTORY")));
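
To show what I mean about the hdfs://localhost/ prefix, here is a small sketch of my understanding, using the FileSystem API (the class name is just for illustration): when fs.default.name points at HDFS, a path without a scheme is resolved against HDFS rather than the local disk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathCheck {
        public static void main(String[] args) throws Exception {
            // With fs.default.name = hdfs://localhost/, a scheme-less path
            // is resolved against HDFS, not the local filesystem.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path modf = new Path("/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf");

            System.out.println(fs.makeQualified(modf)); // hdfs://localhost/home/vissu/.../sample.txt_modf
            System.out.println(fs.exists(modf));        // false: the file is only on the local disk
        }
    }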

thanks

1 answer

Does this file actually exist in HDFS? It looks like you have provided a local file path (user directories in HDFS are usually rooted under /user, not /home).

You can check the file in HDFS by typing:

 #> hadoop fs -ls hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf 

If this returns nothing, i.e. the file is not in HDFS, you can copy it into HDFS with the hadoop fs -put command:

 #> hadoop fs -put /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf hdfs://localhost/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf 

Note that the path in HDFS is rooted under /user, not /home.
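
Alternatively, the driver code itself can copy the file into HDFS before the job is submitted. A rough sketch using the old mapred API (the class name and the /user/root target path are just example assumptions, adjust to your setup):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class CopyModfToHdfs {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();          // picks up core-site.xml, so the default FS is hdfs://localhost/
            FileSystem fs = FileSystem.get(conf);

            // Local file produced by the preprocessing step
            Path localModf = new Path("/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf");
            // Target in HDFS -- note it lives under /user, not /home (example location only)
            Path hdfsModf = new Path("/user/root/sample.txt_modf");

            fs.copyFromLocalFile(localModf, hdfsModf);            // upload the preprocessed file
            KeyValueTextInputFormat.addInputPath(conf, hdfsModf); // the job then reads the HDFS copy
        }
    }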
