figshare

Number of topics.pdf

figure
Posted on 2023-06-03, 08:08, authored by Giuseppe Vito

Existing algorithms for determining the optimal number of topics [Griffiths and Steyvers (2004), Cao et al. (2009), Deveaud, SanJuan, and Bellot (2014), Arun et al. (2010)], later consolidated in the ldatuning package (Nikita & Chaney, 2020), and the perplexity measure of Blei et al. (2003) yield widely divergent and frequently non-significant results. For the analyzed corpus, the optimal numbers of topics were: Cao: 40; Griffiths: 90; Arun: not significant; Deveaud: not significant; Blei (perplexity): 110.
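Each of these criteria boils down to a score computed for every candidate number of topics K. The following is a minimal, illustrative Python sketch of four of them. It assumes each fitted model is summarised either by its topic-word matrix `beta` (K rows, each a word distribution) or, for Griffiths and Steyvers, by per-sample log-likelihoods from a Gibbs sampler. The function names are hypothetical, and this is not the ldatuning code, which may differ in detail (for example, in exactly which divergence it uses for the Deveaud criterion). Arun et al. (2010) is left out because it requires a singular value decomposition.

```python
import math
from itertools import combinations

# Illustrative sketch only: hypothetical helper names, not the ldatuning API.
# `beta` is a K x V topic-word matrix (K >= 2); each row is a probability
# distribution over the same V-word vocabulary.


def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def cao_juan_2009(beta):
    """Average pairwise cosine similarity between topics (lower is better)."""
    pairs = list(combinations(beta, 2))
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)


def deveaud_2014(beta):
    """Average pairwise Jensen-Shannon divergence between topics (higher is better)."""
    def jsd(p, q):
        m = [(a + b) / 2 for a, b in zip(p, q)]  # m > 0 wherever p > 0 or q > 0
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)
    pairs = list(combinations(beta, 2))
    return sum(jsd(x, y) for x, y in pairs) / len(pairs)


def griffiths_2004(log_likelihoods):
    """Harmonic-mean estimate of log p(w | K) from Gibbs-sample log-likelihoods
    log p(w | z_s) (higher is better), computed stably in log space."""
    neg = [-ll for ll in log_likelihoods]
    top = max(neg)
    log_sum = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(neg)) - log_sum


def perplexity(total_log_likelihood, n_tokens):
    """Held-out perplexity exp(-log p(w_test) / N) (lower is better)."""
    return math.exp(-total_log_likelihood / n_tokens)


if __name__ == "__main__":
    # Toy topic-word matrices: well-separated vs. heavily overlapping topics.
    sharp = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
    blurry = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]]
    for name, beta in (("sharp", sharp), ("blurry", blurry)):
        print(name, cao_juan_2009(beta), deveaud_2014(beta))
```

In practice each score would be computed for every candidate K. The chosen K is the argmin of the Cao and perplexity scores, or the argmax of the Deveaud and Griffiths scores. Because each criterion measures something different (topic overlap, marginal likelihood, or held-out fit), their optima need not coincide, as the results reported above show.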

   Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl. 1), 5228-5235.

   Cao, J., Xia, T., Li, J. T., Zhang, Y. D., & Tang, S. (2009). A density-based method for adaptive LDA model selection. Neurocomputing, 72(7-9), 1775-1781. https://doi.org/10.1016/j.neucom.2008.06.011

   Arun, R., Suresh, V., Veni Madhavan, C., & Narasimha Murthy, M. (2010). On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Advances in Knowledge Discovery and Data Mining: 14th Pacific-Asia Conference, PAKDD 2010, Hyderabad, India, June 21-24, 2010, Proceedings, Part I.

   Deveaud, R., SanJuan, E., & Bellot, P. (2014). Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique, 17(1), 61-84. 

   Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.


Funding

This work did not receive any funding.
