Pdftk will not decompress data streams

Question


I am trying to use pdftk to inspect the compressed PDF streams created by Nitro Reader, but pdftk will not decompress them. It does not report any errors, but it seems to do nothing except reorder the PDF objects. Here is a minimal example of one of these PDF files.

pdftk test.pdf output test-d.pdf uncompress

When I run pdftk on other PDF files, it works fine. If I manually extract the data streams and inflate them using zlib in Python, they decompress correctly. Also, if I open the PDF in Adobe Reader and re-save it, pdftk works fine on the resulting PDF file.

I examined the Nitro PDF file by hand and it appears to be a valid PDF file. I am very confused as to what is going on here.

As background, I have hundreds of these PDF files and I want to search them for specific keywords, which I could do if I could automate the decompression.
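For reference, the manual zlib approach mentioned above can be automated roughly as follows. This is only a sketch under several assumptions: the helper names `inflate_streams` and `files_containing` are made up for illustration, streams are located with a simple regular expression rather than a real PDF parser, only plain Flate data is handled (no predictors or other filters), and the keyword is assumed to appear as literal bytes in a stream, which is not guaranteed for text drawn with custom font encodings.

```python
import re
import zlib
from pathlib import Path

# Raw bytes between the "stream" and "endstream" keywords of each object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated body of every stream that zlib accepts."""
    bodies = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes left before "endstream".
            bodies.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (for example an uncompressed stream)
    return bodies


def files_containing(folder, keyword):
    """List the PDF file names in `folder` with `keyword` (bytes) in any inflated stream."""
    hits = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if any(keyword in body for body in inflate_streams(path.read_bytes())):
            hits.append(path.name)
    return hits
```

Calling `files_containing("pdfs", b"keyword")` would then return the names of the matching files, where `pdfs` is a placeholder folder name.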

pdftk version 1.45
Windows 7 Home Premium SP1
Nitro Reader 2 Version 2.5.0.36

Thanks, James

+8

pdf pdftk

James duvall Feb 25 '13 at 0:03


2 answers

If you are not tied to pdftk, you can use qpdf. For example, you can use:

 $ qpdf --stream-data=uncompress input.pdf output.pdf

For what it's worth, some data may still show up as binary, although the rest of the streams will be uncompressed (with either pdftk or qpdf). qpdf lets you choose whether stream data is uncompressed.

From the qpdf manual:

When --stream-data=uncompress is specified, qpdf will attempt to remove any non-lossy filters that it supports. This includes /FlateDecode, /LZWDecode, /ASCII85Decode, and /ASCIIHexDecode. This can be very useful for inspecting the contents of various streams.

The same thing can happen with pdftk.
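Since the question involves hundreds of files, the qpdf command above could be wrapped in a small script. The following is a minimal sketch, assuming qpdf is installed and on the PATH; the function names `qpdf_uncompress_command` and `uncompress_folder` are hypothetical, and only the `--stream-data=uncompress` option shown above is used.

```python
import subprocess
from pathlib import Path


def qpdf_uncompress_command(src, dst):
    """Build the qpdf command line for one input/output pair."""
    return ["qpdf", "--stream-data=uncompress", str(src), str(dst)]


def uncompress_folder(src_dir, dst_dir):
    """Run qpdf on every PDF in src_dir, writing the results into dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if qpdf exits with a nonzero status.
        subprocess.run(qpdf_uncompress_command(pdf, out / pdf.name), check=True)
```

Writing to a separate output folder keeps the original files untouched, so the run can be repeated if something goes wrong.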

+7

gpoo Mar 22 '13 at 22:59


James duvall · Accepted Answer · 2013-08-26T06:40:34+0000

I got an answer to this question from the developer. It turned out to be a bug in the way pdftk handles the entry /DecodeParms [null].

If the decode parameters are null, a writer can simply omit the /DecodeParms entry, but a conforming reader should handle it anyway. I tried the new version of pdftk and the problem seems to be resolved.
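To check which files carry this entry, a rough scan of the raw bytes may be enough. This is only a sketch: the helper name `count_null_decode_parms` is made up, and it assumes the stream dictionaries sit in the file as plain text (not inside compressed object streams) and that the entry is written as an array containing a single null, as in the case described here.

```python
import re

# The /DecodeParms [null] entry, allowing optional whitespace around the tokens.
NULL_DECODE_PARMS = re.compile(rb"/DecodeParms\s*\[\s*null\s*\]")


def count_null_decode_parms(pdf_bytes):
    """Count how many times the /DecodeParms [null] entry appears in the raw bytes."""
    return len(NULL_DECODE_PARMS.findall(pdf_bytes))
```

A nonzero count for a file suggests it could trigger the bug in the older pdftk version.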
