Why is seeking from the end of a file allowed for BZip2 files and not Gzip files?

Question

I am parsing large compressed files in Python 2.7.6 and would like to know the size of the uncompressed file before I start. I am trying to use the second technique presented in this SO answer. It works with bzip2 files, but not with gzip files. What is different between the two compression algorithms that causes this?

Code example

This code snippet demonstrates the behavior, assuming you have "test.bz2" and "test.gz" in the current working directory:

import os
import bz2
import gzip

bz = bz2.BZ2File('test.bz2', mode='r')
bz.seek(0, os.SEEK_END)
bz.close()

gz = gzip.GzipFile('test.gz', mode='r')
gz.seek(0, os.SEEK_END)
gz.close()

The following trace is displayed:

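Traceback (most recent call last):
  File "zip_test.py", line 10, in <module>
    gz.seek(0, os.SEEK_END)
  File "/usr/lib64/python2.6/gzip.py", line 420, in seek
    raise ValueError('Seek from end not supported')
ValueError: Seek from end not supported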

Why does this work for *.bz2 files but not for *.gz files?

python gzip bzip2

skrrgwasme 08 . '14 22:51

Bartosz · Accepted Answer · 2014-09-08T23:33:16+0000

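As far as I can tell, gzip is a stream-oriented format: it keeps no index of the compressed data, so reaching a given position means decompressing everything before it. Seeking from the end would therefore mean decompressing the whole file. Looking at gzip.py, seek() moves forward by reading and discarding data, moves backward by rewinding and reading again from the start, and raises ValueError when asked to seek relative to the end.

BZ2File, on the other hand, emulates seeking for bzip2 files, including seeks relative to the end, but the operation may be extremely slow.

If you only need the size of a gzipped file, note that its last four bytes store the uncompressed length, modulo 2**32, as a little-endian integer. For files under 4 GiB, you can read that value directly instead of decompressing.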
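Seeking to the end of a compressed stream amounts to decompressing all of it, so the uncompressed size of either format can also be found by reading through the file and counting bytes. The following is a minimal sketch, not code from the original answer. The helper name `uncompressed_size` and the default chunk size are assumptions made for illustration and are not part of the bz2 or gzip modules.

```python
import bz2
import gzip


def uncompressed_size(fileobj, chunk_size=1024 * 1024):
    """Count decompressed bytes by reading and discarding fixed-size chunks.

    `fileobj` is any readable file-like object, for example a
    gzip.GzipFile or bz2.BZ2File opened in read mode.
    Hypothetical helper: the name and chunk size are illustrative.
    """
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty read signals the end of the decompressed stream.
            break
        total += len(chunk)
    return total
```

Passing `gzip.GzipFile('test.gz', mode='r')` or `bz2.BZ2File('test.bz2', mode='r')` should give the uncompressed size in bytes. The cost is a full decompression pass, which is likely comparable to what an emulated seek to the end has to do anyway.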
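For the original goal of learning the uncompressed size before parsing, the gzip format itself offers a shortcut: RFC 1952 defines a 4-byte trailer field, ISIZE, holding the size of the original input modulo 2**32. The sketch below reads that field from the raw file without decompressing anything. The function name `gzip_isize` is an assumption made for illustration. The result is only reliable for single-member gzip files whose uncompressed size is under 4 GiB; for concatenated members it reflects only the last one.

```python
import os
import struct


def gzip_isize(path):
    """Return the ISIZE trailer field of a gzip file.

    ISIZE is the uncompressed size modulo 2**32, stored as a
    little-endian unsigned 32-bit integer in the last 4 bytes.
    Hypothetical helper: not part of the gzip module.
    """
    with open(path, 'rb') as raw:
        # Open the file as plain bytes and jump to the 4-byte trailer.
        raw.seek(-4, os.SEEK_END)
        (size,) = struct.unpack('<I', raw.read(4))
    return size
```

Because this opens the file as ordinary bytes rather than through `gzip.GzipFile`, the seek from the end is a normal file operation and does not hit the `ValueError` shown in the question.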
