Decompress a bz2 URL without a temporary file in Python

I want to decompress the data from a bz2 URL directly into the target file, without saving the compressed file first. Here is the code:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(bz2.decompress(chunk))
fp.close()

It fails in bz2.decompress(chunk) with: ValueError: could not find end of stream

2 answers

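Use bz2.BZ2Decompressor for sequential decompression. Unlike bz2.decompress, which expects a complete stream in a single call, it keeps its state between calls, so it can be fed one chunk at a time: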

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

# One decompressor object carries the stream state across all chunks.
decompressor = bz2.BZ2Decompressor()
with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break
        fp.write(decompressor.decompress(chunk))
req.close()

By the way, you do not need to call fp.close() when you use the with statement; the file is closed automatically when the block exits.
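For readers on Python 3, where urllib2 has been replaced by urllib.request, the same loop can be wrapped in a small helper. This is only a minimal sketch: the name decompress_stream is hypothetical, and the in-memory BytesIO objects stand in for the HTTP response and the target file so that the example runs without network access.

```python
import bz2
import io

CHUNK = 16 * 1024


def decompress_stream(src, dst, chunk_size=CHUNK):
    """Copy bz2-compressed bytes from src to dst, decompressing chunk by chunk.

    src and dst are any binary file-like objects (hypothetical helper).
    """
    decompressor = bz2.BZ2Decompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))


# Stand-ins for the HTTP response and the output file.
payload = b"hello bz2 " * 10000
src = io.BytesIO(bz2.compress(payload))
dst = io.BytesIO()

# A deliberately small chunk size forces several decompress() calls.
decompress_stream(src, dst, chunk_size=64)
```

With a real download, src would presumably be the object returned by urllib.request.urlopen(url) and dst a file opened with open(filename, 'wb').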


You must use BZ2Decompressor, which supports incremental decompression. See https://docs.python.org/2/library/bz2.html#bz2.BZ2Decompressor

I have not tested this, but it should work as follows:

import bz2
import urllib2

filename = 'temp.file'
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024

decompressor = bz2.BZ2Decompressor()

with open(filename, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:
            break

        # decompress() may return an empty string until enough input has arrived.
        decomp = decompressor.decompress(chunk)
        if decomp:
            fp.write(decomp)
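To illustrate why the one-shot call fails on a chunk while the incremental decompressor does not, here is a small Python 3 sketch (the question uses Python 2, where the exception message differs). The payload and the split point are arbitrary assumptions chosen for the demonstration.

```python
import bz2

payload = b"example payload " * 1000
data = bz2.compress(payload)

# Split the compressed stream in two, as a chunked network read would.
middle = len(data) // 2
first_half, second_half = data[:middle], data[middle:]

# One-shot decompression expects a complete stream, so a fragment is rejected.
try:
    bz2.decompress(first_half)
    one_shot_failed = False
except ValueError:
    one_shot_failed = True

# The incremental decompressor keeps its state between calls, so the
# fragments can be fed in order and the outputs concatenated.
decompressor = bz2.BZ2Decompressor()
result = decompressor.decompress(first_half)
result += decompressor.decompress(second_half)
```

An individual decompress() call may return empty output when it has not yet received enough input, which is why checking the result before writing, as in the loop above, is harmless.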
