How to evaluate the best K for LDA with Mallet?

Question

I use the MALLET API to extract topics from Twitter data, and I have already extracted topics that look reasonable. But I ran into a problem when evaluating K.

For example, I varied K from 10 to 100 and got a different set of topics from the data for each value. Now I would like to evaluate which K is better. These are the measures I know of:

Perplexity
Empirical likelihood
Marginal likelihood (harmonic mean method)
Silhouette

I found the model.estimate() method, which can be used to train a model for each value of K. But I don't think it shows which value of K is best suited to the data. Can anyone give me some ideas on this, with some sample code? Thanks.

cluster-analysis topic-modeling mallet lda

Khaled Jul 30 '15 at 16:26

1 answer

jknappen · Answer 1 · 2015-08-03T12:10:03+0000

I believe that the best algorithm is human judgment. Train topic models with different numbers of topics, look at them, and take the one you like. Sometimes you will want to fine-tune the number of topics (say, because you don't want a particular topic to be split in two, or you want a specific topic to be present and not merged with another).
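If you want a number to narrow down which models to look at first, perplexity (the first measure in the question) is simple to compute once you have a held-out log-likelihood per model. Below is a minimal Python sketch, not MALLET code. It assumes you have already obtained, by whatever means, the total natural-log likelihood of the same held-out documents under each model. The function names, the 6% tolerance, and all the numbers are made up for illustration.

```python
import math


def perplexity(log_likelihood, num_tokens):
    """Per-token perplexity from a total natural-log likelihood of held-out text."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(-log_likelihood / num_tokens)


def shortlist(heldout_ll_by_k, num_tokens, tolerance=0.06):
    """Return the K values whose perplexity is within `tolerance` (relative) of the best.

    The tolerance is an arbitrary, hypothetical choice: the idea is to keep
    several candidates for manual inspection, not to crown a single winner.
    """
    ppl = {k: perplexity(ll, num_tokens) for k, ll in heldout_ll_by_k.items()}
    best = min(ppl.values())
    return sorted(k for k, p in ppl.items() if p <= best * (1 + tolerance))


# Placeholder values, invented for illustration only (not real results):
# total held-out log-likelihood for each K, over the same 1000 held-out tokens.
heldout_ll_by_k = {10: -8200.0, 20: -7900.0, 50: -7850.0, 100: -8000.0}
num_tokens = 1000

candidates = shortlist(heldout_ll_by_k, num_tokens)
print(candidates)
```

Lower perplexity is better, but a lower score does not guarantee topics that read well, so I would still inspect the shortlisted models by eye before settling on K.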
