Streaming data directly into HDFS without copying

I am looking for ways to write data directly to HDFS from Python, without first saving it on the local node and then running copyFromLocal.

I would like to treat an HDFS file like a local file and call its write method with a string argument, something like the following:

 hdfs_file = hdfs.create("file_tmp")
 hdfs_file.write("Hello world\n")

Is there something similar to the use case described above?

1 answer

I'm not sure about the Python HDFS libraries, but you can always pipe the data into the hadoop fs -put command and tell it to read from stdin by using '-' as the source file name:

 hadoop fs -put - /path/to/file/in/hdfs.txt 
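
From Python, one way to drive this is to start the command with the standard subprocess module and write into its stdin. The following is only a minimal sketch: stream_to_hdfs and its command parameter are names made up for this example, and it assumes the hadoop CLI is on the PATH and that the data is already encoded as bytes.

```python
import subprocess


def stream_to_hdfs(chunks, hdfs_path, command=("hadoop", "fs", "-put", "-")):
    """Pipe byte chunks into `<command> <hdfs_path>` through the child's stdin.

    `stream_to_hdfs` and `command` are hypothetical names for this sketch;
    the default command assumes the `hadoop` CLI is available on the PATH.
    """
    proc = subprocess.Popen(list(command) + [hdfs_path], stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)  # goes into the pipe, not a local file
    finally:
        proc.stdin.close()  # EOF tells `-put -` that the stream is complete
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("command exited with status %d" % returncode)


if __name__ == "__main__":
    # Same destination path as in the command above.
    stream_to_hdfs([b"Hello world\n"], "/path/to/file/in/hdfs.txt")
```

Because chunks can be any iterable of bytes, a generator can feed the pipe piece by piece, so the whole payload never has to sit in memory or on the local disk.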