Can someone suggest a way to index CHM files, similar to how PDFBox is used for PDF?
If you also have other document formats to index, Apache Tika is a better and more general solution.
A CHM parser was added recently (for reference: CHM format support), and it will be in the next version.
If you are talking about Microsoft Compiled HTML Help files, you can simply extract the text from them with JChm and then index it in the usual way.
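To make the "extract the text, then index it" step concrete, here is a minimal sketch of what could happen after the pages are out of the CHM container. It assumes the extraction tool has already handed you each page as an HTML string keyed by its file name; the extraction call itself is not shown. The class `ChmTextIndex` and its methods are hypothetical names made up for this illustration, and the plain in-memory inverted index only stands in for whatever search library you would really use.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of JChm or Tika.
class ChmTextIndex {

    // Crude tag stripper: drops script/style blocks, then every remaining tag,
    // and collapses whitespace. It does not decode entities such as &amp;.
    static String stripHtml(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Builds term -> set of page names from already-extracted HTML pages.
    static Map<String, Set<String>> buildIndex(Map<String, String> pages) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String text = stripHtml(page.getValue()).toLowerCase(Locale.ROOT);
            // Split on anything that is not a letter or a digit.
            for (String term : text.split("[^\\p{L}\\p{Nd}]+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(page.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in data; in practice these strings come from the CHM extractor.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("intro.htm", "<html><body><h1>Getting Started</h1><p>Install the tool.</p></body></html>");
        pages.put("usage.htm", "<html><body><p>Run the tool from a shell.</p></body></html>");

        Map<String, Set<String>> index = buildIndex(pages);
        System.out.println(index.get("tool"));
    }
}
```

For real help files a proper HTML parser is a safer choice than regular expressions, and the cleaned text would normally go into your search library's document fields instead of a map.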