How to evaluate the best K for LDA with Mallet?

I use the Mallet API to extract topics from Twitter data, and I have already extracted topics that look reasonable. But I ran into a problem evaluating K.

For example, I varied K from 10 to 100, so I got a different set of topics from the data for each value. Now I would like to evaluate which K is best. These are the measures I know of:

  • Perplexity
  • Empirical likelihood
  • Marginal likelihood (harmonic mean method)
  • Silhouette

I found the model.estimate() method, which can be used to estimate a model with different values of K. But I don't see how to show which K is best suited to the data. Can anyone give me some ideas on this, with sample code? Thanks.

1 answer

I believe the best algorithm is human judgment. Build topic models with different numbers of topics, look at them, and pick the one you like. Sometimes you will want to fine-tune the number of topics (say, you don't want a particular topic to be split in two, or you want a specific topic to be present rather than merged with another).
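If you also want a number to look at alongside the topics themselves, the perplexity mentioned in the question can be computed from a held-out log-likelihood. Here is a minimal sketch, assuming you already have the total natural-log likelihood of a held-out set for each K (in Mallet, as far as I recall, this is the kind of value `MarginalProbEstimator.evaluateLeftToRight` returns, but treat that as an assumption and check the docs). `KSelection`, `perplexity` and `bestK` are hypothetical names of mine, not part of Mallet, and the numbers in `main` are placeholders, not real results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mallet: compares candidate values of K
// by held-out perplexity.
class KSelection {

    // Perplexity = exp(-logLikelihood / tokenCount).
    // Assumes the log-likelihood is a natural log summed over all held-out tokens.
    static double perplexity(double heldOutLogLikelihood, long heldOutTokens) {
        if (heldOutTokens <= 0) {
            throw new IllegalArgumentException("heldOutTokens must be positive");
        }
        return Math.exp(-heldOutLogLikelihood / heldOutTokens);
    }

    // Returns the K with the lowest perplexity; on ties the first K seen wins.
    // All models must have been evaluated on the same held-out set.
    static int bestK(Map<Integer, Double> logLikelihoodByK, long heldOutTokens) {
        if (logLikelihoodByK.isEmpty()) {
            throw new IllegalArgumentException("no candidate K values given");
        }
        Integer best = null;
        double bestPerplexity = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, Double> entry : logLikelihoodByK.entrySet()) {
            double p = perplexity(entry.getValue(), heldOutTokens);
            if (best == null || p < bestPerplexity) {
                best = entry.getKey();
                bestPerplexity = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only, not measured results.
        long heldOutTokens = 10_000;
        Map<Integer, Double> logLikelihoodByK = new LinkedHashMap<>();
        logLikelihoodByK.put(10, -82_000.0);
        logLikelihoodByK.put(50, -78_500.0);
        logLikelihoodByK.put(100, -79_300.0);

        for (Map.Entry<Integer, Double> entry : logLikelihoodByK.entrySet()) {
            System.out.printf("K=%d perplexity=%.2f%n",
                    entry.getKey(), perplexity(entry.getValue(), heldOutTokens));
        }
        System.out.println("Lowest perplexity at K=" + bestK(logLikelihoodByK, heldOutTokens));
    }
}
```

Lower perplexity does not always mean topics that a person finds easier to interpret, so I would use this to narrow the range of K and still read the topics before deciding.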
