Should %%EOF in a PDF file appear within the last 1024 bytes of the file?

The QPDF source code I was reading contains this comment about PDF files:

// PDF spec says %%EOF must be found within the last 1024 bytes of
// the file.  We add an extra 30 characters to leave room for the
// startxref stuff.
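To make the comment concrete, here is a minimal Python sketch (not QPDF's actual code) of that kind of tail scan: read only the last 1024 + 30 bytes and take the offset after the last `startxref` keyword found there. The function name `find_startxref` and the window size are assumptions taken from the comment, not from any specification.

```python
import re
from typing import Optional

# Window suggested by the QPDF comment: 1024 bytes for %%EOF plus
# 30 bytes of slack for the "startxref <offset>" lines before it.
TAIL_WINDOW = 1024 + 30

STARTXREF = re.compile(rb"startxref\s+(\d+)")


def find_startxref(data: bytes, window: int = TAIL_WINDOW) -> Optional[int]:
    """Return the xref offset after the last 'startxref' in the tail, or None."""
    tail = data[-window:]  # whole file if it is shorter than the window
    matches = list(STARTXREF.finditer(tail))
    if not matches:
        return None  # keyword not in the tail window
    return int(matches[-1].group(1))
```

If a writer appends a lot of junk after `%%EOF`, the keyword falls outside the window and this kind of reader has to fall back to other recovery strategies.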

However, I cannot find this requirement anywhere in the PDF 1.7 specification, although a couple of places on the Internet also mention it.

My question is: is this true, and if so, where does it say that %%EOF must be within the last 1024 bytes?

2 answers

The source code does say that in libqpdf/QPDF.cc, but ISO 32000-1:2008 (PDF 1.7) has this to say about the file trailer:

7.5.5. File trailer

The trailer of a PDF file enables a conforming reader to find the cross-reference table and certain special objects quickly. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF.
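A minimal Python sketch of the strict ISO reading, assuming that "last line" means the bytes after the final CR or LF once a single trailing end-of-line marker has been removed. The function name `last_line_is_eof` is hypothetical.

```python
def last_line_is_eof(data: bytes) -> bool:
    """Strict check: the last line of the file contains only %%EOF."""
    body = data
    # Allow exactly one trailing end-of-line marker after %%EOF.
    for eol in (b"\r\n", b"\n", b"\r"):
        if body.endswith(eol):
            body = body[: -len(eol)]
            break
    # The last line is whatever follows the final CR or LF (or the
    # whole remaining buffer if there is no line break at all).
    last_break = max(body.rfind(b"\n"), body.rfind(b"\r"))
    return body[last_break + 1 :] == b"%%EOF"
```

Under this reading, trailing spaces, extra blank lines, or padding bytes after the marker all make the file non-conforming.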

So, if you follow the standard strictly, the requirement is even tighter than the one you describe.


Going back to the Adobe PDF Reference 1.3, Appendix H (Implementation Notes) contains this short note about the behavior of Acrobat viewers (not about the file format):

3.4.4, "File Trailer"

Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.

In other words, the note says that the viewer (Adobe's implementation) is a little more relaxed about what it will accept. The specification itself, however, still requires %%EOF to stand on its own on the last line.
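For contrast with the strict rule, the relaxed Acrobat behavior can be sketched in Python as a simple membership test on the final 1024 bytes. This is an illustration of the documented tolerance only, not Adobe's code; the name `eof_within_last_1024` is hypothetical.

```python
def eof_within_last_1024(data: bytes) -> bool:
    """Relaxed check: %%EOF occurs anywhere in the final 1024 bytes."""
    # Slicing with a negative start returns the whole buffer for short files.
    # Junk after the marker is tolerated as long as the marker stays in range.
    return b"%%EOF" in data[-1024:]
```

A file padded with a few hundred null bytes after %%EOF passes this check even though it fails the ISO "last line" rule.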

This note remains in Adobe's versions of the file format reference up to PDF 1.7. It was removed from the ISO version, and rightly so: ISO is not concerned with the behavior of specific products, only with whether they comply with the standard as written.

The Adobe documents can be found here; Adobe also has the right to distribute a (slightly modified) version of the ISO 32000 standard.


You should also know about a (standard) feature that PDF documents can use, called incremental updates.

When a document is updated, the new version can be created by keeping the original bytes unchanged (including the final %%EOF line) and appending any modified or added objects, followed by a new xref section, a new trailer, and another final %%EOF.
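A minimal Python sketch of appending one incremental update, under several stated assumptions: the file uses a classic xref table (not an xref stream), is not encrypted, and the caller already knows the `/Root` reference and the new `/Size`. The helper `append_incremental_update` and its parameters are hypothetical; real writers handle many more cases.

```python
import re

STARTXREF = re.compile(rb"startxref\s+(\d+)")


def append_incremental_update(original: bytes, obj_num: int, obj_body: bytes,
                              root_ref: bytes, size: int) -> bytes:
    """Append one new/changed object plus xref, trailer and %%EOF."""
    # /Prev must point at the previous xref section (last startxref value).
    prev = int(STARTXREF.findall(original)[-1])
    out = bytearray(original)  # the original bytes are never modified
    if not out.endswith(b"\n"):
        out += b"\n"
    # The new (or replacement) object goes after the old %%EOF.
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + obj_body + b"\nendobj\n"
    # An xref subsection covering only this one object (20-byte entry).
    xref_offset = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root_ref, prev)
    # "%%%%" in a bytes format string produces a literal "%%".
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

The old revision stays intact as a prefix of the new file, which is exactly why the earlier %%EOF can end up far from the end of the file.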

A PDF may contain several incremental updates.

Thus, the first %%EOF can appear long before the "last 1024 bytes of the file."
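A rough Python sketch of how to see this on a given file: count the %%EOF markers (each revision ends with one) and measure how far the first marker sits from the end. This naive byte search assumes %%EOF never occurs inside stream data or other content, which real files do not guarantee; the name `eof_report` is hypothetical.

```python
from typing import Tuple


def eof_report(data: bytes) -> Tuple[int, int]:
    """Return (incremental updates, distance of first %%EOF from the end)."""
    count = data.count(b"%%EOF")
    if count == 0:
        raise ValueError("no %%EOF marker found")
    first = data.find(b"%%EOF")
    # One %%EOF for the original file, one more per incremental update.
    return count - 1, len(data) - first
```

For a file with one update appended, the first marker can easily be thousands of bytes from the end.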

The advantage (or disadvantage, depending on your point of view) of this "incremental update" feature: you can restore the previous version of the PDF file simply by deleting all the lines that follow the second-to-last %%EOF (you can repeat this until you reach the first version of the file).
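The truncation trick can be sketched in Python as follows. It treats every %%EOF (plus its optional end-of-line marker) as the end of a revision, so it shares the same naive assumption that the marker never appears inside stream data. The names `revisions` and `previous_version` are hypothetical.

```python
import re
from typing import List

# %%EOF followed by an optional CRLF, LF or CR.
EOF_MARKER = re.compile(rb"%%EOF(\r\n|\n|\r)?")


def revisions(data: bytes) -> List[bytes]:
    """Every version of the file: truncated right after each %%EOF line."""
    return [data[: m.end()] for m in EOF_MARKER.finditer(data)]


def previous_version(data: bytes) -> bytes:
    """Drop everything after the second-to-last %%EOF (undo one update)."""
    revs = revisions(data)
    if len(revs) < 2:
        raise ValueError("no incremental update found")
    return revs[-2]
```

`revisions(data)[0]` corresponds to the very first saved version of the document.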

There is also a command-line tool, pdfresurrect,

  • which can report the number of incremental updates that have been applied to the PDF,
  • which can extract previous versions, and
  • which can scrub the history and create a new PDF file that contains only the latest version.

Is this "incremental update" feature actually used much in real-world PDF files?

First: it is used whenever a PDF has a digital/electronic signature.

Second: this is how Adobe Acrobat saves a PDF file by default whenever you just click the Save button. (If you want to avoid incremental updates, use Save As... instead!) One of the few exceptions in recent Acrobat versions is deleting whole pages: after that, a simple Save no longer updates the file incrementally but writes a completely new PDF file. (It seems too many Adobe customers complained about the earlier behavior: because every incremental update increases the file size, deleting pages gave them larger PDF files in which the deleted pages were not actually gone.)

So be careful about unintentional information leaks that can happen simply because you are unaware of the Acrobat behavior described in the second point above.


Update

Recently, I hand-coded a PDF file for a PDF workshop (video) at the TROOPERS15 conference; it can be used to study the details of this feature:

  • 114_incrementally-updated.pdf (8.3 kB on GitHub)
    (I recommend making a backup copy of the file after downloading it. Then delete every line after the first %%EOF, save the file, and look at the content that is now visible...)