What is the HDFS location on Hadoop?

I am trying to run the WordCount example in Hadoop, following some online tutorials. However, it is not clear to me where the file ends up when it is copied from our local file system to HDFS with the following command:

hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/ 

When I execute the following command, I do not see my python-tutorial.pdf listed on HDFS:

 hadoop fs -ls 

This confuses me, because I already specified the directory "myhadoop-tmp" in core-site.xml. I thought this directory would become the HDFS directory for storing all input files.

core-site.xml
=============

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/usr/local/myhadoop-tmp</value>
   <description>A base for other temporary directories.</description>
 </property>

If that is not the case, where does HDFS live on my machine? Which configuration property defines the HDFS directory, and where does the input file go when we copy it from the local file system to HDFS?

java hadoop
1 answer

The location where a datanode stores its data on the local disk is set by the dfs.datanode.data.dir property, which defaults to file://${hadoop.tmp.dir}/dfs/data (see the hdfs-default.xml reference for details).
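For illustration, this is roughly what an explicit setting would look like in hdfs-site.xml (the path shown is an example, not taken from your setup); if you leave the property out, the file://${hadoop.tmp.dir}/dfs/data default applies:

 <!-- hdfs-site.xml: example only; omit to keep the default -->
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///usr/local/myhadoop-tmp/dfs/data</value>
 </property>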

However, in your case, the problem is that you are not using the full path in HDFS. Instead, run:

 hadoop fs -ls /usr/local/myhadoop-tmp/ 

Note that you also seem to be confusing the path in HDFS with the path in the local file system. In HDFS, your file is located at /usr/local/myhadoop-tmp/. On your local file system (given your configuration settings), the data lives under /usr/local/myhadoop-tmp/dfs/data/, but inside a directory structure and naming convention defined by HDFS that is independent of whatever HDFS paths you choose to use. Moreover, the file will not keep its name there, because it is split into blocks and each block is assigned a unique identifier; a block name looks like blk_1073741826.

In conclusion: the local path used by the datanode has nothing to do with the paths you use in HDFS. You could poke around that local directory looking for your files, but you shouldn't, because you risk corrupting the HDFS metadata management. Just use the hadoop command-line tools to copy, move, and read files in HDFS, using whatever logical (HDFS) paths you like. Those HDFS paths do not need to match the paths used for local datanode storage (and there is no reason or benefit for them to).
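As a sketch of what that workflow looks like (these commands assume a running HDFS instance; the /user/hduser/input path is just an example, pick any HDFS path you like):

 # create an HDFS directory, copy the file in, and list it with its full HDFS path
 hadoop fs -mkdir -p /user/hduser/input
 hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /user/hduser/input/
 hadoop fs -ls /user/hduser/input

A bare "hadoop fs -ls" with no argument lists your HDFS home directory (typically /user/<username>), which is why your file did not show up there after you copied it to /usr/local/myhadoop-tmp/.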

