Is there a way to index CHM files in Lucene?

Question

Can someone suggest a way to index CHM files, similar to how PDFBox is used for PDF files?

+4

2 answers

If you are talking about Microsoft Compiled HTML Help files, you can simply extract the text from them with JChm and then index it in the usual way.
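As a rough illustration of that "extract, then index" flow, here is a minimal sketch using only the Java standard library. The names `ChmIndexingSketch`, `TextExtractor`, `IndexSink`, and `indexChmFiles` are hypothetical and not part of JChm or Lucene: the extractor stands in for whatever CHM reader you use, and the sink stands in for the code that would build a Lucene document and add it to your index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

final class ChmIndexingSketch {

    /** Hypothetical hook: turns one CHM file into plain text (e.g. backed by a CHM reader). */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    /** Hypothetical hook: receives (path, text) pairs; a Lucene-backed version would index them. */
    interface IndexSink {
        void add(String path, String text) throws IOException;
    }

    /** Walks root, extracts text from every *.chm file, and passes it to the sink. Returns the file count. */
    static int indexChmFiles(Path root, TextExtractor extractor, IndexSink sink) throws IOException {
        List<Path> chmFiles = new ArrayList<>();
        // Collect matching files first so the directory stream is closed before extraction starts.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".chm"))
                 .forEach(chmFiles::add);
        }
        for (Path file : chmFiles) {
            sink.add(file.toString(), extractor.extract(file));
        }
        return chmFiles.size();
    }
}
```

You would call it with something like `ChmIndexingSketch.indexChmFiles(docsDir, myExtractor, mySink)`, where both arguments are your own implementations. Keeping extraction and indexing behind separate hooks means you can swap the CHM reader later without touching the indexing code.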

+3

ffriend Jun 10 '11 at 13:53

Cristian vat · Accepted Answer · 2011-06-10T16:06:40+0000

If you also need to index other document formats, Apache Tika offers a better, more general solution.

They recently added a CHM parser (for reference: CHM format support), and it will be in the next version.