Lucene is built to answer the opposite question: which documents contain a given term. So to get the number of terms in a document, you have to hack a bit.
The first method is to store the term vector for each field whose term count you need. A term vector is the list of terms in a field. At search time you can retrieve it with the getTermFreqVector method of IndexReader (provided term vectors were stored at index time). Once you have it, the length of the vector gives you the number of terms for that field. Note that the vector lists each distinct term once, so its length is the number of unique terms; if you want the total number of tokens, sum the term frequencies instead.
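To make the difference between the two counts concrete, here is a minimal sketch in plain Java. It does not call Lucene: the two parallel arrays are stand-ins for what the term vector's getTerms() and getTermFrequencies() return, and the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Lucene. The arrays model a term vector:
// terms[i] is a distinct term and freqs[i] is how often it occurs in the field.
final class TermVectorCounts {

    // Number of unique terms: simply the length of the vector.
    static int distinctTerms(String[] terms) {
        return terms.length;
    }

    // Total number of tokens: the sum of all term frequencies.
    static int totalTerms(int[] freqs) {
        int sum = 0;
        for (int f : freqs) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed vector for the field text "the fox and the dog".
        String[] terms = {"and", "dog", "fox", "the"};
        int[] freqs = {1, 1, 1, 2};
        System.out.println("distinct: " + distinctTerms(terms));
        System.out.println("total: " + totalTerms(freqs));
    }
}
```

With a real index you would pass in the arrays taken from the vector that getTermFreqVector returns for the document and field.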
Another method, if you stored the fields of your documents, is to retrieve the text of those fields and count the terms yourself by analyzing it (splitting the text into words).
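A rough sketch of that counting step, in plain Java. It assumes a crude tokenizer that splits on anything that is not a letter or digit; this is only a stand-in, and for numbers that match the index you would want to run the text through the same analyzer you used at index time (stop words, stemming and so on change the count). The class name is hypothetical.

```java
// Hypothetical helper, not part of Lucene. Counts the words in the stored
// text of a field using a simple split on non-letter, non-digit characters.
final class StoredTextTermCounter {

    static int countTerms(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty leading token; skip it.
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTerms("The quick, brown fox!"));
    }
}
```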
Finally, if an approximation of the number of terms in a field is enough for you, and you stored the norms at index time, you can apply the inverse of the function that was used to compute the field norms. If you look closely at the lengthNorm method of the Similarity class, you will notice that it takes the number of terms in the field. The result of this method is stored in the index using the encodeNorm method. At search time you can read the norms back with the norms method of IndexReader. With the norm in hand, apply the mathematical inverse of the function used in lengthNorm to get back the number of terms. As I said, this is only an approximation: some precision is lost when the norm is encoded, so you may not get exactly the number that went in.
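As an illustration of the inversion, here is a small sketch in plain Java. It assumes the default length normalization, norm = 1 / sqrt(numTerms), with no field or document boost folded into the stored value; if you use a custom Similarity, the inverse will be different. It also takes an already decoded float norm as input, so the byte encoding step (and its loss of precision) is not modeled. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Lucene.
final class NormInverter {

    // Assumed forward function: the default lengthNorm, 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Inverse of the function above: numTerms = 1 / norm^2, rounded to the
    // nearest whole number.
    static long estimateNumTerms(float norm) {
        if (norm <= 0f) {
            throw new IllegalArgumentException("norm must be positive");
        }
        return Math.round(1.0 / ((double) norm * norm));
    }

    public static void main(String[] args) {
        float norm = lengthNorm(100);
        System.out.println(estimateNumTerms(norm));
    }
}
```

With norms read from a real index the estimate will usually be off by some amount, because each norm is squeezed into a single byte before it is stored.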