What is the best way to obtain the optimal number of topics for an LDA model using Gensim?

Question


I am trying to find the optimal number of topics for an LDA model in Gensim. One method I found is to calculate the log likelihood for each model and compare them against each other, e.g. in "The input parameters for using latent Dirichlet allocation".

Therefore, I looked into calculating the log likelihood of an LDA model with Gensim and came across the following post: "How do you estimate the α parameter of a latent Dirichlet allocation model?"

which basically states that the update_alpha() method implements the method described in Huang, Jonathan, "Maximum Likelihood Estimation of Dirichlet Distribution Parameters". However, I don't know how to obtain this parameter using the library without changing the code.

How can I obtain the log likelihood of an LDA model using Gensim?

Is there a better way to get the optimal number of topics with Gensim?


python topic-modeling text-mining gensim lda

Akantor Aug 31 '15 at 13:58


1 answer

Sjb · Answer 1 · 2015-10-14T11:19:48+0000

Although I cannot comment on Gensim in particular, I can weigh in with some general advice for optimizing your topics.

As you said, using log likelihood is one method. Another option is to hold out a set of documents from the model-building process, infer topics over them once the model is complete, and check whether the result makes sense.

A completely different method you could try is a hierarchical Dirichlet process (HDP), which can determine the number of topics in a corpus dynamically without it being specified in advance.

There are many papers on how best to choose the parameters and evaluate your topic model; depending on your level of experience, these may or may not be useful to you:

Rethinking LDA: Why Priors Matter, Wallach, H.M., Mimno, D. and McCallum, A.

Evaluation Methods for Topic Models, Wallach, H.M., Murray, I., Salakhutdinov, R. and Mimno, D.

In addition, here is the paper on the hierarchical Dirichlet process:

Hierarchical Dirichlet Processes, Teh, Y.W., Jordan, M.I., Beal, M.J. and Blei, D.M.
