How to determine the number of topics for LDA?

I am new to LDA and want to use it in my work, but I have run into a problem.

To get the best performance, I want to estimate the best number of topics. After reading "Finding Scientific Topics", I understand that I can first compute log P(w | z) and then use the harmonic mean of the series of P(w | z) values to estimate P(w | T).
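A minimal sketch of that estimator, assuming the "series" is the sequence of log P(w | z_m) values obtained from successive posterior samples z_1, ..., z_M of the topic assignments (for example, saved Gibbs-sampling iterations). This is how the estimator is commonly implemented, not code from the paper. The function name is hypothetical. Because log P(w | z) is usually a very large negative number, the harmonic mean is computed entirely in log space:

```python
import math


def harmonic_mean_log_likelihood(log_likelihoods):
    """Estimate log P(w | T) as the harmonic mean of the P(w | z_m) values.

    log_likelihoods: list of log P(w | z_m), one entry per posterior sample z_m
    (an assumption about what "the series" is; e.g. one saved Gibbs iteration each).
    Harmonic mean = M / sum_m 1 / P(w | z_m), so
    log HM = log M - log sum_m exp(-log P(w | z_m)).
    """
    m = len(log_likelihoods)
    if m == 0:
        raise ValueError("need at least one sample")
    # Negate the log-likelihoods, then apply a numerically stable log-sum-exp
    # so that values such as -1e6 do not underflow to zero.
    neg = [-l for l in log_likelihoods]
    top = max(neg)
    log_sum = top + math.log(sum(math.exp(x - top) for x in neg))
    return math.log(m) - log_sum
```

To compare topic numbers, you would run the sampler once per candidate number of topics T (typically discarding burn-in iterations), compute this estimate from each run's samples, and prefer the T with the largest value.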

My question is: what does "series" mean?

Sorry for my English and thank you for your attention.

nlp data-mining lda
2 answers

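Unfortunately, there is no hard science that gives a definitive answer to your question. As far as I know, the hierarchical Dirichlet process (HDP) is quite possibly the best way to arrive at an optimal number of topics, because it infers the number of topics from the data instead of requiring you to fix it in advance.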

If you are looking for a more in-depth analysis, this HDP paper discusses the advantages of HDP for determining the number of groups.
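HDP itself is too involved for a short example, but the property that makes it useful here can be illustrated with the Chinese restaurant process, the sequential view of a single Dirichlet process (HDP stacks Dirichlet processes hierarchically). This sketch only shows the prior, not an HDP topic model: each new token ("customer") joins an existing group ("table", playing the role of a topic) in proportion to its size, or opens a new one with weight alpha. The concentration parameter alpha and the function names are illustrative assumptions:

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Customer i (0-based) joins existing table k with probability
    size_k / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). The number of tables is not fixed in advance.
    """
    if n_customers < 1 or alpha <= 0:
        raise ValueError("need n_customers >= 1 and alpha > 0")
    rng = random.Random(seed)
    tables = []
    for i in range(n_customers):
        # Draw a point on [0, i + alpha); existing tables cover [0, i).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            tables.append(1)  # r fell in the alpha-sized slice: open a new table
    return tables


def expected_num_tables(n_customers, alpha):
    """Expected number of occupied tables: sum_i alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n_customers))
```

Comparing len(chinese_restaurant_process(n, alpha)) with expected_num_tables(n, alpha) for growing n shows the number of groups growing slowly with the data rather than being set by hand. For real corpora, gensim's HdpModel is one existing HDP implementation.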


First, some people use the harmonic mean to find the optimal number of topics. I tried it too, but the results were unsatisfactory. My suggestion: if you use R, the "ldatuning" package is useful. It provides four metrics for choosing the number of topics (Griffiths2004, CaoJuan2009, Arun2010 and Deveaud2014). Perplexity and likelihood-based V-fold cross-validation are also very good options for better topic modeling, although V-fold cross-validation is time-consuming on a large dataset. See also "A heuristic approach to determine an appropriate number of topics in topic modeling." Useful links:
https://cran.r-project.org/web/packages/ldatuning/vignettes/topics.html
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597325/
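The V-fold cross-validation idea can be sketched generically. This is not ldatuning's code and does not reproduce its four metrics; it only shows the cross-validation loop with held-out perplexity, exp(-log-likelihood / number of tokens), as the score (lower is better). fit_and_score is a hypothetical callable you supply: it should train LDA with the given number of topics on the training documents and return the held-out log-likelihood and token count on the test documents (for example, scikit-learn's LatentDirichletAllocation.score returns an approximate log-likelihood that could serve here):

```python
import math
import random


def perplexity(held_out_log_likelihood, n_tokens):
    """Perplexity = exp(-log p(held-out words) / number of held-out tokens)."""
    return math.exp(-held_out_log_likelihood / n_tokens)


def k_fold_splits(n_docs, k, seed=0):
    """Yield (train_idx, test_idx) pairs; the k test folds partition range(n_docs)."""
    if k < 2 or k > n_docs:
        raise ValueError("need 2 <= k <= n_docs")
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def select_num_topics(candidates, n_docs, fit_and_score, k=5):
    """Return (best number of topics, {num_topics: mean held-out perplexity}).

    fit_and_score(num_topics, train_idx, test_idx) -> (log_likelihood, n_tokens)
    is a hypothetical user-supplied function (see the lead-in above).
    """
    scores = {}
    for num_topics in candidates:
        fold_perplexities = []
        for train, test in k_fold_splits(n_docs, k):
            ll, n_tok = fit_and_score(num_topics, train, test)
            fold_perplexities.append(perplexity(ll, n_tok))
        scores[num_topics] = sum(fold_perplexities) / len(fold_perplexities)
    best = min(scores, key=scores.get)
    return best, scores
```

Averaging per-fold perplexities is one design choice; pooling log-likelihoods and token counts across folds before exponentiating is an equally reasonable alternative. Since every candidate is refit k times, this is where the cost on large datasets comes from.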
