I'm currently using the LDA topic modeling tool from the MALLET package to do topic detection on my documents. Everything was fine at first: I trained the model and got 20 topics. However, when I try to infer topics for a new document using that model, the results are puzzling.
For example, I deliberately ran the model on a document I created by hand that contains nothing but keywords from one of the topics ("FLU"), yet the topic distribution I got back was below 0.1 for every topic. I then tried the same thing on one of the documents already used in training, which had a high score of 0.7 for one of the topics. The same thing happened: every topic scored below 0.1.
Can someone explain why this happens?
I also asked on the MALLET mailing list, but so far no one has replied.