How to inflate a partial zlib file

I have the first contiguous two-thirds of a file that was compressed using zlib's deflate() function. The last third was lost in transmission. The original uncompressed file was 600 KB.

The transmitter called deflate() multiple times, splitting the source file into 2 KB chunks and passing Z_NO_FLUSH until the end of the file, at which point it passed Z_FINISH. The resulting complete compressed file was transmitted, but part of it was lost, as described.

Is it possible to recover part of the original file? If so, any suggestions on how?

I can use either the regular C implementation of zlib or the zlib module in Python 2.7.

+6
3 answers

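Although I don't know Python, I managed to get this to work:

    #!/usr/bin/python
    import sys
    import zlib

    f = open(sys.argv[1], "rb")   # partial (truncated) zlib file
    g = open(sys.argv[2], "wb")   # recovered output
    z = zlib.decompressobj()
    while True:
        # Use any input zlib has not consumed yet before reading more.
        buf = z.unconsumed_tail
        if not buf:
            buf = f.read(8192)
            if not buf:
                break             # end of the partial file
        got = z.decompress(buf)
        if not got:
            break                 # stop once a chunk yields no output
        g.write(got)
    f.close()
    g.close()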

This should extract everything that is available from your partial zlib file.
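To check this approach without the real file, the scenario from the question can be simulated. The following is only a sketch under stated assumptions: it is written for Python 3 rather than 2.7, the helper names deflate_in_chunks and inflate_partial and the test data are made up for illustration, and compressobj/decompressobj stand in for the C deflate()/inflate() calls.

```python
import zlib

def deflate_in_chunks(data, chunk_size=2048):
    """Compress the way the question describes: feed fixed-size chunks
    without an explicit flush, then finish the stream."""
    comp = zlib.compressobj()
    out = []
    for i in range(0, len(data), chunk_size):
        out.append(comp.compress(data[i:i + chunk_size]))  # no flush requested
    out.append(comp.flush())  # flush() defaults to Z_FINISH
    return b"".join(out)

def inflate_partial(truncated, read_size=8192):
    """Return whatever can be decoded from a possibly truncated zlib stream."""
    decomp = zlib.decompressobj()
    out = []
    for i in range(0, len(truncated), read_size):
        out.append(decomp.decompress(truncated[i:i + read_size]))
    return b"".join(out)

# Hypothetical test data: roughly 600 KB of moderately compressible text.
original = b"".join(b"record %d: some repetitive payload\n" % n
                    for n in range(17000))
compressed = deflate_in_chunks(original)
partial = compressed[: 2 * len(compressed) // 3]  # keep the first two-thirds
recovered = inflate_partial(partial)

print(len(original), len(compressed), len(recovered))
print(original.startswith(recovered))  # is the recovered data a clean prefix?
```

Note that two-thirds of the compressed bytes does not have to correspond to exactly two-thirds of the original data; how much comes back depends on how compressible each part of the file was.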

+10

Update: as @Mark Adler pointed out, partial content can be decompressed using zlib.decompressobj:

    >>> decompressor = zlib.decompressobj()
    >>> decompressor.decompress(part)
    "let compress some t"

where part is defined below.

--- The following is an old comment:

By default, zlib does not handle partial content in Python.

This works:

    >>> compressed = "let compress some text".encode('zip')
    >>> compressed
    'x\x9c\xcbI-Q/VH\xce\xcf-(J-.V(\xce\xcfMU(I\xad(\x01\x00pX\t%'
    >>> compressed.decode('zip')
    "let compress some text"

This does not work if we truncate it:

    >>> part = compressed[:3*len(compressed)/4]
    >>> part.decode('zip')
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File ".../lib/python2.7/encodings/zlib_codec.py", line 43, in zlib_decode
        output = zlib.decompress(input)
    error: Error -5 while decompressing data: incomplete or truncated stream

The same happens if we use zlib explicitly:

    >>> import zlib
    >>> zlib.decompress(compressed)
    "let compress some text"
    >>> zlib.decompress(part)
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    error: Error -5 while decompressing data: incomplete or truncated stream
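For comparison, here is a sketch of the same experiment as a script, contrasting the one-shot call with a decompression object. It assumes Python 3.3 or later (bytes literals, integer division with //, and the eof attribute, none of which match the 2.7 session above), and the variable names are illustrative.

```python
import zlib

compressed = zlib.compress(b"let compress some text")
part = compressed[: 3 * len(compressed) // 4]  # drop the last quarter

# One-shot API: refuses a truncated stream.
try:
    zlib.decompress(part)
    one_shot_failed = False
except zlib.error:
    one_shot_failed = True

# Streaming API: returns whatever it could decode from the input it was given.
decompressor = zlib.decompressobj()
prefix = decompressor.decompress(part)

print(one_shot_failed)
print(prefix)
# eof only becomes True once the end of the zlib stream has been seen,
# so it can be used to tell a truncated stream from a complete one.
print(decompressor.eof)
```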
+2

The following seems feasible in theory, but requires working with the low-level zlib routines. At http://www.zlib.net/zlib_how.html you will find the example program zpipe.c, and in its line-by-line description:

CHUNK is simply the buffer size for feeding data to and pulling data from the zlib routines. Larger buffer sizes would be more efficient, especially for inflate(). If the memory is available, buffer sizes on the order of 128K or 256K bytes should be used.

    #define CHUNK 16384
    ...

Here is my suggestion: set the buffer size very small, perhaps even to a single byte if that is supported. That way you decompress as much as possible before hitting the inevitable Z_BUF_ERROR. At that point the data collected so far is usually discarded (watch for premature deflate_end calls that "clean up" behind your back), but in your case you can simply write it to a file and close the file once you find that you cannot continue.

The last few bytes of output may contain garbage if an incorrect "final" symbol gets decoded, or zlib may abort prematurely rather than output a partial symbol. But you know your data will be incomplete anyway, so this should not be a problem.
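The small-buffer idea can be tried out from Python before touching zpipe.c. This is a rough sketch, not the C change itself: it assumes Python 3, feeds a decompression object one input byte at a time, and keeps whatever was decoded when the input runs out or zlib reports an error. The function name inflate_bytewise and the test data are hypothetical.

```python
import zlib

def inflate_bytewise(truncated):
    """Feed the stream one byte at a time and keep everything decoded
    before the input runs out or zlib reports corrupt data."""
    decomp = zlib.decompressobj()
    out = bytearray()
    try:
        for i in range(len(truncated)):
            out += decomp.decompress(truncated[i:i + 1])
    except zlib.error:
        # Corrupt data: stop here, but keep what was already decoded.
        pass
    return bytes(out)

# Hypothetical test data standing in for the real file.
original = b"".join(b"line %d of the hypothetical source file\n" % n
                    for n in range(5000))
compressed = zlib.compress(original)
partial = compressed[: 2 * len(compressed) // 3]  # lose the last third

recovered = inflate_bytewise(partial)
print(len(recovered), "of", len(original), "bytes recovered")
print(original.startswith(recovered))  # checks for garbage at the tail
```

Comparing the recovered bytes against a known original, as the last line does, is one way to check whether the tail of the output can be trusted for a given kind of data.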

0
