How do I create plain-text PDF examples that open in a document viewer?

I just came across the Adobe Forums thread "Simple Text String Example in specification broken", which got me interested in finding plain-text source code for PDF examples.

Through that post, I eventually found the PDF 1.7 specification, where "Annex H (informative) Example PDF files" starts on page 699; from there I wanted to try "H.3 Simple Text String Example" (the classic "Hello World").

So I saved the following as hello.pdf (note that when copying from PDF32000_2008.pdf you may get "%PDF-1. 4", that is, with a space inserted after "1.", which has to be deleted):

    %PDF-1.4
    1 0 obj
      << /Type /Catalog
         /Outlines 2 0 R
         /Pages 3 0 R
      >>
    endobj
    2 0 obj
      << /Type /Outlines
         /Count 0
      >>
    endobj
    3 0 obj
      << /Type /Pages
         /Kids [ 4 0 R ]
         /Count 1
      >>
    endobj
    4 0 obj
      << /Type /Page
         /Parent 3 0 R
         /MediaBox [ 0 0 612 792 ]
         /Contents 5 0 R
         /Resources << /ProcSet 6 0 R
                       /Font << /F1 7 0 R >>
                    >>
      >>
    endobj
    5 0 obj
      << /Length 73 >>
    stream
      BT
        /F1 24 Tf
        100 100 Td
        ( Hello World ) Tj
      ET
    endstream
    endobj

... and then tried to open it:

    evince hello.pdf

... however, evince cannot open it: "Unable to open the document / PDF document is damaged"; and it prints:

    Error: PDF file is damaged - attempting to reconstruct xref table...
    Error: Couldn't find trailer dictionary
    Error: Couldn't read xref table

I also checked with qpdf:

    $ qpdf --check hello.pdf
    WARNING: hello.pdf: file is damaged
    WARNING: hello.pdf: can't find startxref
    WARNING: hello.pdf: Attempting to reconstruct cross-reference table
    hello.pdf: unable to find trailer dictionary while recovering damaged file

Where am I going wrong?

Thanks a lot in advance for any answers,
Cheers!

2 answers

You should add a (syntactically correct) xref and trailer section to the end of the file. That means each object in your PDF needs one line in the xref table, even if the byte offset on that line is incorrect. Ghostscript, pdftk or qpdf can then reconstruct the correct xref table and render the file:

    [...]
    endobj
    xref
    0 8
    0000000000 65535 f
    0000000010 00000 n
    0000000020 00000 n
    0000000030 00000 n
    0000000040 00000 n
    0000000050 00000 n
    0000000060 00000 n
    0000000070 00000 n
    trailer
    <</Size 8/Root 1 0 R>>
    startxref
    555
    %%EOF
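Instead of typing placeholder offsets and relying on a repair tool, the offsets can also be computed while the file is written. The following is only a minimal sketch, assuming a single-revision file with a classic xref table; the names `build_pdf`, `stream_obj`, `HELLO_OBJECTS` and the output file name are made up for illustration and do not come from the specification or any library.

```python
def stream_obj(data: bytes) -> bytes:
    """Wrap raw content in a stream object body with a matching /Length."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def build_pdf(objects: list, root_obj: int = 1) -> bytes:
    """Serialize object bodies (numbered 1, 2, ...) and append xref + trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)  # startxref must point at the "xref" keyword
    size = len(objects) + 1  # +1 for the free entry of object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root_obj)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# The seven objects of the "Hello World" example, one body per object.
HELLO_OBJECTS = [
    b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    b"<< /Type /Outlines /Count 0 >>",
    b"<< /Type /Pages /Kids [ 4 0 R ] /Count 1 >>",
    b"<< /Type /Page /Parent 3 0 R /MediaBox [ 0 0 612 792 ] /Contents 5 0 R"
    b" /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    stream_obj(b"BT\n/F1 24 Tf\n100 100 Td\n( Hello World ) Tj\nET"),
    b"[ /PDF /Text ]",
    b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica"
    b" /Encoding /MacRomanEncoding >>",
]

if __name__ == "__main__":
    # Hypothetical output name; open the result with evince or check it with qpdf.
    with open("hello_generated.pdf", "wb") as f:
        f.write(build_pdf(HELLO_OBJECTS))
```

Because each offset is recorded at the moment its object is appended, the xref table and the `startxref` value always agree with the bytes actually written, whatever whitespace is used inside the objects.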

Heck, I had copied only part of the code: what I posted in the question is the part on page 701; it is followed by a page footer, which confused me, but the code actually continues on page 702 :/

(EDIT: see also Introduction to PDF - GNUpdf ( archive ) for a similar, more detailed example)

So here is the complete code:

    %PDF-1.4
    1 0 obj
      << /Type /Catalog
         /Outlines 2 0 R
         /Pages 3 0 R
      >>
    endobj
    2 0 obj
      << /Type /Outlines
         /Count 0
      >>
    endobj
    3 0 obj
      << /Type /Pages
         /Kids [ 4 0 R ]
         /Count 1
      >>
    endobj
    4 0 obj
      << /Type /Page
         /Parent 3 0 R
         /MediaBox [ 0 0 612 792 ]
         /Contents 5 0 R
         /Resources << /ProcSet 6 0 R
                       /Font << /F1 7 0 R >>
                    >>
      >>
    endobj
    5 0 obj
      << /Length 73 >>
    stream
      BT
        /F1 24 Tf
        100 100 Td
        ( Hello World ) Tj
      ET
    endstream
    endobj
    6 0 obj
      [ /PDF /Text ]
    endobj
    7 0 obj
      << /Type /Font
         /Subtype /Type1
         /Name /F1
         /BaseFont /Helvetica
         /Encoding /MacRomanEncoding
      >>
    endobj
    xref
    0 8
    0000000000 65535 f
    0000000009 00000 n
    0000000074 00000 n
    0000000120 00000 n
    0000000179 00000 n
    0000000364 00000 n
    0000000466 00000 n
    0000000496 00000 n
    trailer
      << /Size 8
         /Root 1 0 R
      >>
    startxref
    625
    %%EOF

Indeed, as the error messages were saying, the trailer section was missing!

However, that is not the end of it: although this document now opens in evince, evince still complains:

    $ evince hello.pdf
    Error: PDF file is damaged - attempting to reconstruct xref table...

... and so does qpdf:

    $ qpdf --check hello.pdf
    WARNING: hello.pdf: file is damaged
    WARNING: hello.pdf (file position 625): xref not found
    WARNING: hello.pdf: Attempting to reconstruct cross-reference table
    checking hello.pdf
    PDF Version: 1.4
    File is not encrypted
    File is not linearized
    WARNING: hello.pdf (object 5 0, file position 436): attempting to recover stream length
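The last qpdf warning is about the content stream: the declared `/Length` has to equal the number of bytes between the end-of-line after `stream` and the end-of-line before `endstream`. A rough way to compare the two is sketched below. It assumes that `/Length` is a direct integer and the last key of the stream dictionary (as in this example), and the function name `check_stream_lengths` is hypothetical.

```python
import re


def check_stream_lengths(pdf: bytes) -> list:
    """Return (declared, actual) pairs for streams whose /Length is wrong."""
    mismatches = []
    # Assumption: "/Length N >>" directly precedes the "stream" keyword.
    pattern = re.compile(rb"/Length\s+(\d+)\s*>>\s*stream(?:\r\n|\n)")
    for m in pattern.finditer(pdf):
        declared = int(m.group(1))
        end = pdf.find(b"endstream", m.end())
        if end < 0:
            break  # unterminated stream, nothing more to measure
        data = pdf[m.end():end]
        # The end-of-line marker before "endstream" is not part of the data.
        if data.endswith(b"\r\n"):
            data = data[:-2]
        elif data.endswith((b"\n", b"\r")):
            data = data[:-1]
        if declared != len(data):
            mismatches.append((declared, len(data)))
    return mismatches


if __name__ == "__main__":
    with open("hello.pdf", "rb") as f:
        for declared, actual in check_stream_lengths(f.read()):
            print(f"/Length says {declared}, stream data is {actual} bytes")
```

The actual byte count depends on the exact whitespace and line endings in the saved file, which is why a value copied from the specification can stop matching after copy and paste.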

So, to get a proper example, the xref table has to be reconstructed so that it contains the correct byte offsets, just as the Adobe Forums thread "Simple Text String Example in specification broken" points out.
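To see which offsets are off before repairing anything, the xref entries can be compared against the places where the objects really start. This is a simplified sketch, not a full PDF parser: it assumes one classic xref table with a single subsection (no xref streams, no incremental updates), and `check_xref` is a name invented here.

```python
import re


def check_xref(pdf: bytes) -> list:
    """Return human-readable problems found in a classic xref table."""
    problems = []
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if not m:
        return ["startxref not found"]

    xref_pos = int(m.group(1))
    if not pdf[xref_pos:].startswith(b"xref"):
        problems.append(f"startxref points to {xref_pos}, but no 'xref' keyword is there")
        # Fall back to the last "xref" keyword located before "startxref".
        xref_pos = pdf.rfind(b"xref", 0, m.start())
        if xref_pos < 0:
            return problems

    # Subsection header: first object number and number of entries.
    header = re.compile(rb"\s+(\d+)\s+(\d+)\s+").match(pdf, xref_pos + 4)
    if not header:
        problems.append("malformed xref subsection header")
        return problems
    first, count = int(header.group(1)), int(header.group(2))

    entry = re.compile(rb"(\d{10}) (\d{5}) ([nf])\s+")
    pos = header.end()
    for num in range(first, first + count):
        e = entry.match(pdf, pos)
        if not e:
            problems.append(f"xref entry for object {num} is malformed")
            break
        pos = e.end()
        offset, gen, kind = int(e.group(1)), int(e.group(2)), e.group(3)
        # An in-use entry must point at the start of "num gen obj".
        starts_here = re.match(rb"%d\s+%d\s+obj" % (num, gen), pdf[offset:offset + 32])
        if kind == b"n" and not starts_here:
            problems.append(f"object {num}: no '{num} {gen} obj' at offset {offset}")
    return problems


if __name__ == "__main__":
    with open("hello.pdf", "rb") as f:
        for problem in check_xref(f.read()):
            print(problem)
```

An empty result only means the offsets are consistent with the file as saved; it does not validate the objects themselves.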

For this we can use pdftk, following its documented example "Repair a PDF's Corrupted XREF Table and Stream Lengths (If Possible)":

    $ pdftk hello.pdf output hello_repair.pdf

... and now hello_repair.pdf opens in evince without any problems, and qpdf reports:

    $ qpdf --check hello_repair.pdf
    checking hello_repair.pdf
    PDF Version: 1.4
    File is not encrypted
    File is not linearized
    No errors found

OK, hope this helps someone,
Cheers!


Source: https://habr.com/ru/post/1416613/

