Get the file size of a very large .gz on a 64-bit platform

According to the gz specification, the uncompressed file size is stored in the last 4 bytes of the .gz file.

I created 2 files with

 dd if=/dev/urandom of=500M bs=1024 count=500000
 dd if=/dev/urandom of=5G bs=1024 count=5000000

I gzipped them

 gzip 500M 5G 

I checked the last 4 bytes by doing

 tail -c4 500M.gz | od -I   (returns 512000000, as expected)
 tail -c4 5G.gz | od -I     (returns 825032704, not as expected)

Hitting the invisible 32-bit barrier apparently makes the value written to ISIZE meaningless, which is more annoying than if they had used some error bit instead.

Does anyone know how to get the uncompressed file size from a .gz file without extracting it?

thanks

gz specification: http://www.gzip.org/zlib/rfc-gzip.html

edit: if anyone tries this out, you can use /dev/zero instead of /dev/urandom

+6
64bit 32-bit filesize gz gunzip
3 answers

No.

The only way to get the exact uncompressed size of a compressed stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).

It is worth noting that ISIZE is defined as

ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input
data modulo 2^32.

in the gzip RFC, so it is not actually breaking at a 32-bit barrier; what you are seeing is the expected behavior.
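The arithmetic behind the two values in the question can be checked directly in the shell. This is only a sketch: `expected_isize` is a made-up helper name, and it assumes the shell's integer arithmetic is at least 64 bits wide (as in bash on a 64-bit platform).

```shell
# Hypothetical helper: the value ISIZE would hold for a given uncompressed size.
# Assumes shell arithmetic is at least 64 bits wide.
expected_isize() {
    echo $(( $1 % 4294967296 ))   # 4294967296 = 2^32
}

expected_isize 512000000    # prints 512000000 (below 2^32, stored unchanged)
expected_isize 5120000000   # prints 825032704 (5120000000 - 4294967296)
```

So the 825032704 read from the end of the 5G file is the real size with one multiple of 2^32 dropped, not a corrupted value.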

+8

I have not tried this with a file of the size you specified, but I often find the uncompressed size of a .gz file with

 zcat file.gz | wc -c 

when I don't want to leave the uncompressed file lying around or bother to compress it again.

Obviously, the data is decompressed, but it is then piped straight to wc.

In any case, it's worth a try.
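As a rough sketch, the pipeline above can be wrapped in a function and compared against the 4-byte trailer discussed in the question. The function names are hypothetical, and `gz_isize` assumes a little-endian host, because od interprets the 4 bytes in native byte order.

```shell
# Hypothetical helpers; the file name passed in is a placeholder.

# Exact uncompressed size: decompress to stdout and count the bytes.
# Nothing is written to disk, but the whole stream is still decompressed.
gz_true_size() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Value stored in the last 4 bytes of the file (ISIZE): the size modulo 2^32.
# Assumes a little-endian host (od uses native byte order).
gz_isize() {
    tail -c 4 "$1" | od -An -tu4 | tr -d ' '
}
```

For files smaller than 4 GiB the two functions should agree; for larger files only `gz_true_size` gives the real size, at the cost of reading the whole archive.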

EDIT: When I tried creating a 5G file with data from /dev/random, it produced a file named 5G of size 5120000000 bytes, although my file manager reported it as 4.8G.

Then I compressed it with gzip 5G; the resulting 5G.gz was about the same size (random data does not compress much).

Then zcat 5G.gz | wc -c reported the same size as the source file: 5120000000 bytes. In any case, my suggestion seemed to work for this test.

thanks for waiting

+2

gzip has the -l option:

  -l --list
         For each compressed file, list the following fields:

             compressed size: size of the compressed file
             uncompressed size: size of the uncompressed file
             ratio: compression ratio (0.0% if unknown)
             uncompressed_name: name of the uncompressed file

         The uncompressed size is given as -1 for files not in gzip format, such as compressed .Z files. To get the uncompressed size for such a file, you can use:

             zcat file.Z | wc -c

         In combination with the --verbose option, the following fields are also displayed:

             method: compression method
             crc: the 32-bit CRC of the uncompressed data
             date & time: time stamp for the uncompressed file

         The compression methods currently supported are deflate, compress, lzh (SCO compress -H) and pack. The crc is given as ffffffff for a file not in gzip format.

         With --name, the uncompressed name, date and time are those stored within the compress file if present.

         With --verbose, the size totals and compression ratio for all files is also displayed, unless some sizes are unknown. With --quiet, the title and totals lines are not displayed.
0