Is there a library that can open and search through a pdf file?

Is there a library that can open and search through a PDF file? Preferably in C, Python, or Ruby.

+4
3 answers

There are various libraries for extracting text from PDF files. That is a little short of "search," but searching the extracted text should be easy to do.

For Ruby, try PDF::Toolkit.

For Python, try pyPdf:

import pyPdf
pdf = pyPdf.PdfFileReader(open(path, "rb"))
content = pdf.getPage(1).extractText()  # page indices are zero-based, so this is the second page
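Once the text is extracted, the "search" part can be plain string matching. The following is only a minimal sketch: `search_pages` and its `context` parameter are names made up for illustration, not part of pyPdf, and it assumes you have already collected one string per page (for example by calling `pdf.getPage(i).extractText()` for each `i` in `range(pdf.getNumPages())`).

```python
import re


def search_pages(page_texts, query, context=30):
    """Yield (page_number, snippet) for each case-insensitive match of query.

    page_texts is any iterable of strings, one per page.
    Page numbers in the results start at 1.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for page_number, text in enumerate(page_texts, start=1):
        for match in pattern.finditer(text):
            # Keep a little surrounding text so the hit is readable.
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            yield page_number, text[start:end]


if __name__ == "__main__":
    # Stand-in page texts; in practice these would come from the PDF library.
    pages = [
        "Introduction to PDF files.",
        "Nothing relevant here.",
        "Searching a pdf is easy.",
    ]
    for page_number, snippet in search_pages(pages, "pdf"):
        print(page_number, snippet)
```

Keeping the search separate from the extraction means the same function works with whatever library produced the text. Note that extracted text often has odd spacing or line breaks, so exact phrase matches across lines may be missed.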
+5

The Ruby-GNOME2 project has a sub-library called poppler for rendering PDF files: http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby%2FPoppler

It can also extract parts of a PDF as text, and it can find the rectangles in a PDF document that contain the text you are looking for. These methods are in the "Page" class.

http://ruby-gnome2.sourceforge.jp/hiki.cgi?Poppler%3A%3APage

Hope this helps

+1

I looked into using Apache PDFBox for something similar, but never actually used it. It is a Java library, but Java interoperates well with other languages.

0