Fast PDF Splitter Library

pyPdf is a great library for splitting and merging PDF files. I use it to split PDF documents into single-page documents. pyPdf is pure Python and spends quite a bit of time in the _sweepIndirectReferences() method of the PdfFileWriter object while saving the extracted page. I need something with better performance. I tried multithreading, but since most of the time is spent in Python code, the GIL prevented any speedup (it actually ran slower).

Is there any library written in C that provides the same functionality? Or does anyone have a good idea for improving performance (other than spawning a new process for every PDF file I want to split)?

Thanks in advance.

Follow-up. A couple of command-line solutions can sometimes be faster than pyPdf.

I changed the pyPdf PdfFileWriter class to track how much time is spent in the _sweepIndirectReferences() method. If it takes too long (right now I use a magic value of 3 seconds), I fall back to Ghostscript, calling it from Python.
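The fallback described above could look roughly like the following. This is only a sketch, not the actual patch to pyPdf: the names Deadline, TooSlow, extract_page and python_splitter are hypothetical, and it assumes the pure-Python splitter calls the supplied check() function periodically from inside its slow loop. The Ghostscript options used (-dFirstPage, -dLastPage, -sDEVICE=pdfwrite, -sOutputFile) are standard ones, and the executable is assumed to be on the PATH as gs.

```python
import subprocess
import time

TIME_BUDGET_SECONDS = 3.0  # the "magic value" from the post


class TooSlow(Exception):
    """Raised when the pure-Python path exceeds its time budget."""


class Deadline:
    """Tracks a time budget; check() raises TooSlow once it is used up."""

    def __init__(self, budget, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + budget

    def check(self):
        if self._clock() > self._expires:
            raise TooSlow()


def ghostscript_command(src, page, dst):
    """Build a Ghostscript command that writes one page (1-based) of src to dst."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
        "-dFirstPage=%d" % page, "-dLastPage=%d" % page,
        "-sOutputFile=%s" % dst, src,
    ]


def extract_page(src, page, dst, python_splitter,
                 run=subprocess.check_call,
                 budget=TIME_BUDGET_SECONDS,
                 clock=time.monotonic):
    """Try the pure-Python splitter first; fall back to Ghostscript if too slow.

    python_splitter(src, page, dst, check) is a hypothetical callable that
    must call check() from inside its slow loop so the budget is enforced.
    Returns the name of the path that produced the output.
    """
    deadline = Deadline(budget, clock)
    try:
        python_splitter(src, page, dst, deadline.check)
        return "python"
    except TooSlow:
        # Ghostscript rewrites dst from scratch, replacing any partial output.
        run(ghostscript_command(src, page, dst))
        return "ghostscript"
```

The run and clock parameters exist only so the behaviour can be exercised without Ghostscript installed; in normal use the defaults are enough.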

Thanks for all your answers. (The Xpdf link is the one that made me look for a different approach.)

+5
4 answers

mbtPdfAsm is a command-line tool for working with PDF files.

Xpdf is GPL-licensed and written in C++.

+3

Does it have to be Python? There is the pure-Perl CAM::PDF library for working with PDF files.

+2

pdfLaTeX can be used to work with PDF files.

This requires a TeX installation; a Python script can generate the LaTeX source.

+1

Have you tried Psyco with pyPdf?

+1