For text files, with HDFS as both the source and the destination, use the following command:
hadoop fs -cat /input_hdfs_dir/* | hadoop fs -put - /output_hdfs_file
This concatenates all the files in input_hdfs_dir and writes the output back to HDFS as output_hdfs_file. Keep in mind that all the data is streamed down to the local system and then uploaded to HDFS again; no temporary files are created, since everything happens on the fly through a UNIX pipe.
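As a concrete illustration (the paths here are hypothetical, not from the question), the whole merge is a single pipeline run on the client:

# Hypothetical example: merge all text parts under /data/logs into one HDFS file.
# -cat streams each part to stdout; -put reads stdin ("-") and writes it back into HDFS.
hadoop fs -cat /data/logs/part-* | hadoop fs -put - /data/logs_merged.txt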
In addition, this will not work with non-text files such as Avro, ORC, etc.
For binary files, you can do something like this (if you have Hive tables mapped onto the directories):
insert overwrite table tbl select * from tbl
Depending on your configuration, this may also create more than one file. To create a single file, either set the number of reducers to 1 explicitly with mapreduce.job.reduces=1, or set the Hive property hive.merge.mapredfiles=true.
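As a sketch, assuming a hypothetical Hive table named tbl, the single-reducer variant can be run from the shell in one go:

# Hypothetical example: force a single reducer, then rewrite the table onto itself
# so its backing directory ends up containing one file.
hive -e "SET mapreduce.job.reduces=1; INSERT OVERWRITE TABLE tbl SELECT * FROM tbl;"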