PDF page optimizer library?

Has anyone written a library (or even just a program) that optimizes the contents of PDF page streams? I'm talking about things like "delete q ... Q blocks that have no net effect", "merge adjacent BT ... ET blocks", "track the graphics state and delete statements that set something to the value it already has", and perhaps even "reorder drawing operations to minimize graphics state changes when this can be done without changing the page's appearance". I'm not picky about the implementation language, but open source is strongly preferred, as I will probably need to hack it for my specific needs.

Here is a small excerpt from an example of what I would like to do. R's "grid" graphics combined with its PDF output generates ridiculous numbers of no-op operations, for example:

 1 J 1 j q Q q Q q Q q Q q Q q Q q Q q Q q Q q BT 0.000 0.000 0.000 rg /F2 1 Tf 12.00 0.00 -0.00 12.00 168.43 14.40 Tm [(T) 120 (ask)] TJ ET Q q BT 0.000 0.000 0.000 rg /F2 1 Tf 0.00 12.00 -12.00 0.00 19.42 205.26 Tm [(Quer) -15 (ies per min) 10 (ute)] TJ ET Q q Q q 23.02 489.60 26.53 0.00 re W n Q q Q q 23.02 489.60 26.53 0.00 re W n Q q Q q Q q [...]

This could be reduced to

 1 J 1 j BT /F2 1 Tf 12 0 0 12 168.43 14.40 Tm [(T) 120 (ask)] TJ 0 12 -12 0 19.42 205.26 Tm [(Quer) -15 (ies per min) 10 (ute)] TJ ET 

and perhaps reduced even further with more complex use of text operators, which I can't work out in my head.

2 answers

The Java-based "Multivalent Tools" include a Compress tool that will do this: http://multivalent.sourceforge.net/Tools/pdf/Compress.html

The Compress tool has been removed from the latest Multivalent jar, but you can download an older version from the following location: http://code.google.com/p/pdfsizeopt/downloads/detail?name=Multivalent20060102.jar&can=2&q=


That looks a great deal like the PDF output of iText's PdfGraphics2D interface in the worst case. The usual case isn't so hot either, but it's not that bad.

As far as I know, nothing like this exists yet, but you could write it yourself, since you are clearly not afraid of content streams:

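 // ByteBuffer is iText's own class here, not java.nio.ByteBuffer
 ByteBuffer internalBuf = myPdfContentByte.getInternalBuffer();
 // magic() stands for the optimizer you would write yourself
 String newContents = magic( internalBuf.toString() );
 // swap the buffered content stream for the optimized version
 internalBuf.reset();
 internalBuf.append( newContents );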

magic() is admittedly hand-wavy, but writing code to strip out empty "q Q" pairs should be trivial. Yanking clipping regions that have nothing inside them (a rectangle followed by "re W n") should not be much harder with a few regular expressions.
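As a rough illustration of the two easy cases just mentioned, here is a minimal sketch of what magic() might look like. The class name ContentStreamCleaner and the pattern are my own assumptions, not part of iText. It works on the raw text with a regular expression, so it assumes that the stream contains no inline images and that no string literal happens to contain a standalone "q ... Q" sequence. A real implementation would tokenize the stream properly first.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an iText class.
final class ContentStreamCleaner {
    // One PDF number such as 23.02, -15 or .5
    private static final String NUM = "[-+]?(?:\\d+\\.?\\d*|\\.\\d+)";

    // A standalone "q", optionally followed by "x y w h re W n" (or W*),
    // then a standalone "Q" plus any whitespace after it.
    private static final Pattern EMPTY_GROUP = Pattern.compile(
        "(?<!\\S)q\\s+(?:(?:" + NUM + "\\s+){4}re\\s+W\\*?\\s+n\\s+)?Q(?:\\s+|$)");

    // Removes save/restore groups that paint nothing. The loop repeats until
    // nothing changes, so nested empty groups such as "q q Q Q" also vanish.
    static String magic(String content) {
        String current = content;
        String previous;
        do {
            previous = current;
            current = EMPTY_GROUP.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current.trim();
    }
}
```

Groups that still contain drawing or text operators are left alone, so the "q BT ... ET Q" wrappers in the question's example survive this pass and would need a separate step.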

Getting rid of line cap / line join settings (J and j) when they are never used will be harder. The same goes for merging text blocks or discarding redundant changes to fill/stroke color, font and size, and so on.
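To give an idea of what state tracking involves, here is a sketch that handles only one piece of state, the RGB fill color set by rg. Everything in it (the class name, the tokenizer pattern, the choice to treat g, k, sc, scn and cs as "color now unknown") is an assumption made for illustration. It also assumes the page starts with a black fill, that string literals contain no nested unescaped parentheses, and that there are no inline images. Stroke color, font and line settings would each need the same treatment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an iText class.
final class RedundantFillColorRemover {
    // A token is a "(...)" string literal (escapes allowed, no nested parentheses)
    // or a run of characters that are neither whitespace nor parentheses.
    private static final Pattern TOKEN =
        Pattern.compile("\\((?:\\\\.|[^\\\\()])*\\)|[^\\s()]+");

    // Marker for "fill color not known"; never equal to a parsed 3-element color.
    private static final double[] UNKNOWN = new double[0];

    static String dropRedundantFillColors(String content) {
        StringBuilder out = new StringBuilder();
        Deque<double[]> saved = new ArrayDeque<>(); // fill colors saved by q
        double[] fill = {0, 0, 0};                  // assumption: initial fill is black
        List<String> tokens = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();   // start offset of each token
        int copiedUpTo = 0;
        Matcher m = TOKEN.matcher(content);
        while (m.find()) {
            tokens.add(m.group());
            starts.add(m.start());
            int n = tokens.size();
            switch (m.group()) {
                case "q": saved.push(fill); break;
                case "Q": if (!saved.isEmpty()) fill = saved.pop(); break;
                case "rg":
                    double[] rgb = n >= 4 ? parse(tokens.subList(n - 4, n - 1)) : UNKNOWN;
                    if (rgb != UNKNOWN && Arrays.equals(rgb, fill)) {
                        // Same color as the current one: copy everything before the
                        // three operands, then skip the operator and trailing whitespace.
                        out.append(content, copiedUpTo, starts.get(n - 4));
                        int end = m.end();
                        while (end < content.length()
                                && Character.isWhitespace(content.charAt(end))) end++;
                        copiedUpTo = end;
                    } else {
                        fill = rgb;
                    }
                    break;
                case "g": case "k": case "sc": case "scn": case "cs":
                    fill = UNKNOWN; // fill changed in a way this sketch does not track
                    break;
                default: break;
            }
        }
        out.append(content, copiedUpTo, content.length());
        return out.toString();
    }

    private static double[] parse(List<String> operands) {
        double[] rgb = new double[3];
        try {
            for (int i = 0; i < 3; i++) rgb[i] = Double.parseDouble(operands.get(i));
        } catch (NumberFormatException e) {
            return UNKNOWN;
        }
        return rgb;
    }
}
```

The stack mirrors what q and Q do to the graphics state, which is why a repeated rg inside a q ... Q block can be dropped while the first rg after a Q that restored a different color is kept. The original text is spliced rather than re-joined from tokens, so whitespace inside string literals is untouched.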

The "more complex use of text operators" will quickly start to look like black-magic compiler optimization.

And if you do this with iText, we would all appreciate it if you shared your code. I assure you, any help with PdfGraphics2D's output would be gladly accepted.
