Number of topics
Existing algorithms for determining the optimal number of topics (Griffiths & Steyvers, 2004; Cao et al., 2009; Arun et al., 2010; Deveaud et al., 2014), later consolidated in the ldatuning package (Nikita & Chaney, 2020), and the perplexity criterion (Blei et al., 2003) yield widely divergent and often inconclusive results. For the analyzed corpus, we obtained the following estimates: Cao: 40; Griffiths: 90; Arun: not significant; Deveaud: not significant; Blei (perplexity): 110.
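To show what these criteria actually measure, here is a minimal sketch of three of them computed from fitted LDA matrices. It assumes you already have a topic-word matrix `phi` (K × V, rows summing to 1), a document-topic matrix `theta` (D × K) and a document-term count matrix (D × V) from any LDA implementation. The function names are hypothetical, and this is not the ldatuning code. Here the Cao et al. (2009) criterion is taken as the mean pairwise cosine similarity between topics (minimize). The Deveaud et al. (2014) criterion is taken as the mean pairwise Jensen-Shannon divergence (maximize). Perplexity is a simple plug-in approximation, not the variational held-out estimate used by Blei et al. (2003). Normalizations and divergence details may differ from the package's. The Griffiths and Arun criteria are not covered.

```python
import numpy as np


def cao_juan_2009(phi):
    """Mean pairwise cosine similarity between topics (lower is better).

    phi: array of shape (K, V); row k is topic k's word distribution.
    """
    unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    sim = unit @ unit.T  # (K, K) matrix of cosine similarities
    n_topics = phi.shape[0]
    rows, cols = np.triu_indices(n_topics, k=1)  # each unordered pair once
    return float(sim[rows, cols].mean())


def _kl(a, b):
    """KL(a || b), treating 0 * log(0 / b) as 0."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log), bounded by log(2)."""
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def deveaud_2014(phi):
    """Mean pairwise Jensen-Shannon divergence between topics (higher is better).

    Requires at least two topics.
    """
    n_topics = phi.shape[0]
    divs = [jensen_shannon(phi[i], phi[j])
            for i in range(n_topics) for j in range(i + 1, n_topics)]
    return float(np.mean(divs))


def perplexity(counts, theta, phi):
    """Plug-in perplexity: exp(-log-likelihood / number of tokens).

    counts: (D, V) document-term counts
    theta:  (D, K) document-topic proportions
    phi:    (K, V) topic-word distributions
    """
    word_probs = theta @ phi  # (D, V): p(w | d) = sum_k theta[d, k] * phi[k, w]
    mask = counts > 0
    loglik = np.sum(counts[mask] * np.log(word_probs[mask]))
    return float(np.exp(-loglik / counts.sum()))


if __name__ == "__main__":
    # Toy matrices chosen for illustration only (not the analyzed corpus).
    phi = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.2, 0.7]])
    theta = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
    counts = np.array([[5, 1, 0],
                       [0, 2, 6]])
    print("Cao et al. (minimize):", cao_juan_2009(phi))
    print("Deveaud et al. (maximize):", deveaud_2014(phi))
    print("Perplexity (minimize):", perplexity(counts, theta, phi))
```

In practice each function would be evaluated on models fitted for every candidate number of topics, and the optimum read off each curve separately. Because the criteria reward different properties (topic distinctness versus fit to the data), their optima need not coincide.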
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl. 1), 5228-5235.
Cao, J., Xia, T., Li, J. T., Zhang, Y. D., & Tang, S. (2009). A density-based method for adaptive LDA model selection. Neurocomputing, 72(7-9), 1775-1781. https://doi.org/10.1016/j.neucom.2008.06.011
Arun, R., Suresh, V., Veni Madhavan, C., & Narasimha Murthy, M. (2010). On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Advances in Knowledge Discovery and Data Mining: 14th Pacific-Asia Conference, PAKDD 2010, Hyderabad, India, June 21-24, 2010, Proceedings, Part I.
Deveaud, R., SanJuan, E., & Bellot, P. (2014). Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique, 17(1), 61-84.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.