How to unzip a file in hadoop?

I tried to unzip a zip file stored in the Hadoop file system and save it back to the Hadoop file system. I tried the following commands, but none of them worked.

 hadoop fs -cat /tmp/test.zip | gzip -d | hadoop fs -put - /tmp/
 hadoop fs -cat /tmp/test.zip | gzip -d | hadoop fs -put - /tmp
 hadoop fs -cat /tmp/test.zip | gzip -d | hadoop put - /tmp/
 hadoop fs -cat /tmp/test.zip | gzip -d | hadoop put - /tmp

When I run these commands I get errors like gzip: stdin has more than one entry--rest ignored, cat: Unable to write to output stream., and Error: Could not find or load main class put on the terminal. Any help?

Edit 1: I do not have access to a user interface, so only the command line can be used. The unzip and gzip utilities are installed on my Hadoop machine. I am using Hadoop 2.4.0.

4 answers

Most of the time I use an HDFS fuse mount for this.

So you could just do

 $ cd /hdfs_mount/somewhere/
 $ unzip file_in_hdfs.zip

http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_28.html
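As a rough sketch, mounting could look like this with the hadoop-fuse-dfs tool from the packages described in the link above (the namenode hostname, port, and mount point are placeholders for your own values):

 $ mkdir -p /hdfs_mount
 $ sudo hadoop-fuse-dfs dfs://namenode.example.com:8020 /hdfs_mount   # mount HDFS at /hdfs_mount

Once mounted, the tree behaves like an ordinary local directory, so standard tools such as unzip work on it directly.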

Edit 1/30/16: if you use access control lists in HDFS: in some cases the fuse mount does not honor HDFS ACLs, so you will be able to perform file operations that are allowed by the basic unix permission bits. See https://issues.apache.org/jira/browse/HDFS-6255, the comments at the bottom, which I recently asked to reopen.


To unzip a gzipped (or bzipped) file, I use the following

 hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/ 
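For a bzipped file, the same pipeline should work with bzip2 in place of gzip; the file names here are just placeholders:

 hdfs dfs -cat /data/file.bz2 | bzip2 -d | hdfs dfs -put - /data/file.txt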

If the file is located on your local drive,

 zcat <infile> | hdfs dfs -put - /data/ 
  • gzip reads data from stdin when used in a pipe, but it can only decompress single-entry archives, which is why a multi-file zip gives gzip: stdin has more than one entry--rest ignored
  • hadoop fs -put - (note the trailing dash) does support reading data from stdin; hadoop put is not a valid command, which is why it fails with Could not find or load main class put (see the quick check after this list)
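A quick way to check the stdin behavior yourself (the path is arbitrary):

 echo "hello" | hdfs dfs -put - /tmp/hello.txt
 hdfs dfs -cat /tmp/hello.txt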

I tried a lot of things and nothing helped. I cannot find zip file support in Hadoop, so it left me no choice but to download the Hadoop file to the local fs, unzip it, and upload it back to HDFS again.
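That roundtrip could look something like this sketch, with placeholder paths:

 hdfs dfs -get /tmp/test.zip /tmp/
 unzip /tmp/test.zip -d /tmp/test_unzipped
 hdfs dfs -put /tmp/test_unzipped /tmp/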


To transfer data through a pipe into Hadoop, you need to use the hdfs command.

 cat mydatafile | hdfs dfs -put - /MY/HADOOP/FILE/PATH/FILENAME.EXTENSION 
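If the local file is a zip archive, unzip -p (which extracts an entry to stdout) can feed the same pipe, so a single entry can be moved into HDFS without an intermediate file; the archive and entry names here are placeholders:

 unzip -p myarchive.zip somefile.txt | hdfs dfs -put - /MY/HADOOP/FILE/PATH/somefile.txt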
