Combining PDFs with Ghostscript: Using Original Bookmarks with Corrected Page Numbers

I use

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf 

to create a single PDF document from a series of PDF documents. I was going to include a newly prepared table of contents and enable it using the pdfmark mechanism. Then I noticed that the source files already contain bookmarks. However, they refer to the original page numbers, not to the page numbers in the merged document.

I can see two possible solutions: delete the original bookmarks, or keep the original bookmarks but somehow update their links to the correct pages ...

+7
2 answers

As so often happens, someone has already gone down the same road before you ...

Trevor King has developed a solution to this very problem. His Python script pdf-merge.py first calls pdftk with its dump_data switch to get at all the bookmark information. It then keeps track of the total number of pages of each merged document and does the math to shift the page number in each pdfmark instruction by the total page count of all PDF documents included before the current one. So it is close to, but not the same as, KenS's two-pass approach: first it extracts the bookmarks using pdftk, and then it creates a new bookmark file with the correct page numbers. It also manages to turn the original pdfmark statements (which gs would normally preserve) into a no-op. I will not pretend to understand how that last part works ...
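The page-shifting arithmetic described above can be sketched in a few lines of Python. This is not Trevor King's script, only a minimal illustration of the idea. It assumes the dump_data text for each input file has already been captured (for example with `pdftk file.pdf dump_data`), and that each bookmark appears as `BookmarkTitle`, `BookmarkLevel`, `BookmarkPageNumber` lines in that order. The function names `parse_dump_data` and `shift_bookmarks` are made up for this example.

```python
def parse_dump_data(text):
    """Return (number_of_pages, bookmarks) parsed from pdftk dump_data text.

    Assumes each bookmark is listed as BookmarkTitle, BookmarkLevel and
    BookmarkPageNumber lines, in that order.
    """
    pages = None
    bookmarks = []
    current = {}
    for line in text.splitlines():
        # Split on the first colon only, so titles may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "NumberOfPages":
            pages = int(value)
        elif key == "BookmarkTitle":
            current = {"title": value}
        elif key == "BookmarkLevel":
            current["level"] = int(value)
        elif key == "BookmarkPageNumber":
            current["page"] = int(value)
            bookmarks.append(current)
    if pages is None:
        raise ValueError("no NumberOfPages line found")
    return pages, bookmarks


def shift_bookmarks(dumps):
    """Shift each file's bookmark pages by the pages of all files before it.

    `dumps` is a list of dump_data texts, in merge order.
    """
    offset = 0
    merged = []
    for text in dumps:
        pages, bookmarks = parse_dump_data(text)
        for mark in bookmarks:
            merged.append({**mark, "page": mark["page"] + offset})
        offset += pages
    return merged
```

With a 4-page front-matter.pdf followed by fulltext-0.pdf, a bookmark on page 1 of fulltext-0.pdf ends up pointing at page 5 of the merged document.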

However, the script does everything I need, including the ability to customize the bookmark file before final writing. Very neat and hat tip to Trevor King.

+4

In general, the pdfwrite device does not know that you are merging files, so it preserves bookmarks and other metadata on the assumption that you want them in the output.

However, when you merge PDF files, preserving that information does not work, because the page numbers for the second and subsequent files will be incorrect.

So you need a two-pass approach: first merge all the files and drop the bookmarks, then run the combined file through pdfwrite again, adding pdfmarks to set the correct bookmarks.
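The second pass could look roughly like the following Python sketch, which turns a list of already corrected bookmarks into pdfmark statements and builds a gs command that reads the merged PDF followed by the pdfmark file. The helper names and the `(level, title, page)` tuple layout are assumptions made for this example. It only handles plain ASCII titles, and it does not remove bookmarks that the first pass carried over.

```python
def escape_ps(title):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return title.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmark(bookmarks):
    """Render (level, title, page) tuples as /OUT pdfmark statements.

    A parent entry gets /Count set to its number of direct children,
    which is how pdfmark expresses nesting in the outline.
    """
    lines = []
    for i, (level, title, page) in enumerate(bookmarks):
        children = 0
        for next_level, _, _ in bookmarks[i + 1:]:
            if next_level <= level:
                break  # reached a sibling or an ancestor's sibling
            if next_level == level + 1:
                children += 1
        count = f" /Count {children}" if children else ""
        lines.append(
            f"[ /Title ({escape_ps(title)}) /Page {page}{count} /OUT pdfmark"
        )
    return "\n".join(lines) + "\n"


def second_pass_command(merged_pdf, pdfmark_file, output_pdf):
    """Build the gs argument list for the second pass (not executed here)."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_pdf}", "-f", merged_pdf, pdfmark_file,
    ]
```

The text returned by `to_pdfmark` would be written to a file such as bookmarks.pdfmark (a hypothetical name), and the list from `second_pass_command` can be passed to `subprocess.run`.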

There is currently no option (with pdfwrite) to not preserve bookmarks. To get that behavior, you would need to modify the PostScript files of the Ghostscript PDF interpreter. You can try setting -dDOPDFMARKS=false, but I doubt it will work.

+2