Combining PDFs in Haskell

The Mac Preview application allows you to combine multiple PDF files, although the functionality is rather obscure. I am writing a utility in Haskell that should perform a similar task, that is, combine an arbitrary number of PDF files into one new file.

Does anyone have a suggestion on where to start? Obviously, a library on Hackage that does most of the work out of the box would be ideal, but if not, some pointers on where to start would be greatly appreciated.

2 answers

I am working on a PDF library that supports parsing and authoring. For now it is a low-level tool; the higher-level features are still on the to-do list (because it is difficult to design a good high-level API).

Here is an example of unpacking and decrypting a PDF file. PDF merging is easy to implement, but you should be familiar with PDF internals.

ADDED: I have put together a basic example of merging PDF files in Haskell. It is only 150 lines of code, but it lacks several features (see the comments at the top of the file). They are easy to add, so let me know if you are interested.


The PDF file format is not terribly complicated. Adobe has an official specification document for it. Essentially, a PDF file contains a set of numbered "objects." You will need to get all the objects out of each PDF file, renumber them so that they are unique, and then adjust the page index so that all the pages actually show up.
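To make the renumbering step concrete, here is a minimal sketch of it on an in-memory model. It does not parse or write real PDF files, and it is not taken from any existing library: the `Value`, `Document`, and `Input` types and the `merge` function are hypothetical names invented for this illustration. It also assumes that object numbers start at 1 and that you already know which object numbers are the pages of each input, in order.

```haskell
import qualified Data.Map.Strict as Map

-- A drastically simplified model of a PDF object body.
-- (Hypothetical types for illustration; real PDFs also have streams,
-- strings, reals, booleans, null, and generation numbers.)
data Value
  = Ref Int                 -- indirect reference to another object
  | Number Int
  | Name String
  | Array [Value]
  | Dict [(String, Value)]
  deriving (Eq, Show)

-- Object number -> object body. Assumes object numbers are >= 1.
type Document = Map.Map Int Value

-- One input file: its objects plus the object numbers of its pages, in order.
data Input = Input { objects :: Document, pages :: [Int] }

-- Add an offset to every reference found inside a value.
shiftValue :: Int -> Value -> Value
shiftValue off v = case v of
  Ref n    -> Ref (n + off)
  Array xs -> Array (map (shiftValue off) xs)
  Dict kvs -> Dict [(k, shiftValue off x) | (k, x) <- kvs]
  _        -> v

-- Renumber a whole document: shift the object numbers and the references.
shiftDocument :: Int -> Document -> Document
shiftDocument off = Map.map (shiftValue off) . Map.mapKeysMonotonic (+ off)

-- Point a page dictionary at a new parent node in the page tree.
setParent :: Int -> Value -> Value
setParent root (Dict kvs) =
  Dict (("Parent", Ref root) : filter ((/= "Parent") . fst) kvs)
setParent _ v = v

-- Merge all inputs. Returns the combined object table and the object
-- number of a freshly built flat page-tree root listing every page.
merge :: [Input] -> (Document, Int)
merge inputs = (Map.insert rootNum root reparented, rootNum)
  where
    (allObjs, allPages) = foldl step (Map.empty, []) inputs

    -- Shift each document past the highest number used so far.
    step (acc, ps) (Input doc pgs) =
      let off = maybe 0 fst (Map.lookupMax acc)
      in (Map.union acc (shiftDocument off doc), ps ++ map (+ off) pgs)

    rootNum = maybe 0 fst (Map.lookupMax allObjs) + 1

    reparented = foldr (Map.adjust (setParent rootNum)) allObjs allPages

    root = Dict
      [ ("Type", Name "Pages")
      , ("Kids", Array (map Ref allPages))
      , ("Count", Number (length allPages))
      ]
```

Calling `merge [Input docA pagesA, Input docB pagesB]` gives back the combined table and the number of the new page-tree root, which a document catalog would then reference. The old page-tree nodes of each input are left in the table unreferenced; a real tool would also have to deal with the catalog, the cross-reference table, and inherited page attributes.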

Hackage seems to have several packages for writing PDF files, but I don't see any for reading them. You could look at the source code of pdfsplit for ideas. Also see HPDF.
