Is there a way to index CHM files in Lucene?

Can someone suggest a way to index CHM files, the way PDFBox can be used for PDF?

+4
2 answers

If you also need to index other document formats, Apache Tika is a better and more general solution.

A CHM parser was recently added to it (for reference: CHM format support), and it will be included in the next release.

+3

If you are talking about Microsoft Compiled HTML Help files, you can simply extract the text from them with JChm and then index it in the usual way.
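
To make the "extract, then index" flow concrete, here is a minimal sketch in plain Java. It assumes the pages have already been pulled out of the .chm as HTML strings (that step would be JChm's job and is not shown, since I don't want to guess at its API). The class `ChmTextIndex` and its methods are hypothetical names of my own, and the tiny in-memory inverted index only stands in for Lucene: in a real setup you would pass the stripped text to a Lucene document field instead.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of JChm or Lucene.
class ChmTextIndex {
    // term -> names of the pages that contain it (a toy stand-in for a Lucene index)
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Crude HTML-to-text conversion: drop script/style blocks, drop remaining tags,
    // decode a few common entities, and collapse whitespace.
    static String stripHtml(String html) {
        String text = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1\\s*>", " ");
        text = text.replaceAll("(?s)<[^>]*>", " ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&"); // decoded last to avoid double-decoding
        return text.replaceAll("\\s+", " ").trim();
    }

    // Index one extracted page under the given name (e.g. its path inside the .chm).
    void addPage(String name, String html) {
        String text = stripHtml(html).toLowerCase(Locale.ROOT);
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(name);
            }
        }
    }

    // Return the names of all pages containing the term (case-insensitive).
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }
}
```

Usage would be one `addPage(name, html)` call per page extracted from the archive, followed by `search("install")` or similar. The regex-based tag stripping is deliberately simplistic; for messy real-world help files a proper HTML parser is the safer choice.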

+3
