Is there a library that can open and search through a pdf file?

Question

Is there a library that can open and search through a PDF file? Preferably in C, Python, or Ruby ...

+4

chutsu Nov 11 '09 at 2:09

3 answers

The Ruby-GNOME2 project includes a Poppler binding for rendering PDF files: http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler

It can also extract parts of a PDF as text, and it can find the rectangles in a document that contain the text you are looking for. These methods are in the "Page" class.

http://ruby-gnome2.sourceforge.jp/hiki.cgi?Poppler%3A%3APage

Hope this helps

+1

Chase m gray Nov 11 '09 at 5:29

I looked into using Apache PDFBox for something similar, but never actually used it. It is a Java library, but Java interoperates well with other languages.

0

Ryan lynch Nov 11 '09 at 2:20

Mark · Accepted Answer · 2009-11-11T02:24:57+0000

There are various libraries for extracting text from PDF files. That is a step short of "search," but once you have the text, searching it should be easy.

For Ruby, try PDF::Toolkit.

For Python, try pyPdf:

import pyPdf

pdf = pyPdf.PdfFileReader(open(path, "rb"))
content = pdf.getPage(1).extractText()  # page numbers are zero-based, so this is the second page
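To get from extracted text to an actual search, something like the sketch below might do. It is only an illustration: find_in_pages and extract_pages are names made up for this example, not part of pyPdf, and the search is a plain case-insensitive substring match over whatever text the extractor returns (extracted text often has odd spacing, so matches can be missed). The pyPdf import sits inside extract_pages so the search part runs without the library installed.

```python
import re


def find_in_pages(pages, query, ignore_case=True):
    """Return (page_index, char_offset) for every match of query.

    pages is any iterable of page-text strings, in page order.
    Hypothetical helper for illustration, not a pyPdf function.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the query match literally, not as a regex.
    pattern = re.compile(re.escape(query), flags)
    hits = []
    for page_index, text in enumerate(pages):
        for match in pattern.finditer(text):
            hits.append((page_index, match.start()))
    return hits


def extract_pages(path):
    """Yield the text of each page, assuming pyPdf is installed."""
    import pyPdf  # assumption: imported lazily so the search code runs without it
    with open(path, "rb") as handle:
        reader = pyPdf.PdfFileReader(handle)
        for i in range(reader.getNumPages()):
            yield reader.getPage(i).extractText()


# Demonstration on plain strings standing in for extracted page text.
sample_pages = [
    "Intro to PDF parsing",
    "Nothing here",
    "pdf text and more PDF text",
]
print(find_in_pages(sample_pages, "pdf"))
# → [(0, 9), (2, 0), (2, 18)]
```

With a real file, find_in_pages(extract_pages(path), "some phrase") would give the zero-based page index and character offset of each hit.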
