Indexing PDFs with page numbers using Solr

I index PDF files in Solr using the ExtractingRequestHandler. I would like to display the page numbers along with the hits in a document, for example: "the term foo was found in bar.pdf on pages 2, 3 and 5."

Is it possible to include page numbers in the query results like this?

1 answer

This will require some development effort, but you can achieve it by indexing each page of each document as a separate Solr document and then using field collapsing to group the page hits that belong to the same document.

Note that when this answer was first written you needed a nightly build for this, because field collapsing was not implemented in any released version of Solr.

Update: field collapsing is implemented as of Solr 3.3. Further improvements are expected in the next major version (Solr 4.0).
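To make the page-per-document idea concrete, here is a minimal Python sketch. It assumes the text of each page has already been extracted separately (how you split the PDF is up to you and is not shown), and the field names `id`, `doc_name`, `page` and `text` are hypothetical; use whatever your schema defines. The `group`, `group.field` and `group.limit` parameters are the result grouping parameters available from Solr 3.3; the helper functions themselves are only illustrative.

```python
from urllib.parse import urlencode


def page_documents(doc_name, page_texts):
    """Build one Solr document per page (field names are assumptions)."""
    return [
        {
            "id": f"{doc_name}#page={number}",  # unique key per page
            "doc_name": doc_name,               # field to group on
            "page": number,                     # 1-based page number
            "text": text,
        }
        for number, text in enumerate(page_texts, start=1)
    ]


def grouped_query(term, max_pages=100):
    """Query string that groups page-level hits by their source document."""
    return urlencode({
        "q": f"text:{term}",
        "group": "true",
        "group.field": "doc_name",
        "group.limit": max_pages,  # max page hits returned per document
        "fl": "doc_name,page",
        "wt": "json",
    })


def pages_by_document(response):
    """Map each document name to the sorted page numbers that matched."""
    groups = response["grouped"]["doc_name"]["groups"]
    return {
        g["groupValue"]: sorted(d["page"] for d in g["doclist"]["docs"])
        for g in groups
    }
```

You would post the output of `page_documents` to your update handler, append the output of `grouped_query` to your select URL, and pass the parsed JSON response to `pages_by_document` to get something like "foo was found in bar.pdf on pages 2, 3 and 5".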
