Latent Dirichlet Allocation: assigning words to latent topics

Latent Dirichlet Allocation (LDA) assigns the words occurring in one or more texts to a specified number of latent 'topics'. Applied to a single text, it allocates words to topics within the unit of context the user selects (for example, sentences). Applied to several texts, it produces profiles of the words allocated in each text, and these profiles can be compared across texts by singular value decomposition (SVD), correspondence analysis, or both.
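The comparison step can be illustrated with a minimal correspondence analysis sketch, assuming the profiles are arranged as a texts × topics table of word counts. HAMLET's own implementation is not described here; the function name and the counts below are hypothetical. The sketch takes the SVD of the standardized residuals, so texts with similar topic profiles receive nearby row coordinates. Rows or columns whose counts sum to zero must be removed first, since they would cause division by zero.

```python
import numpy as np

def correspondence_analysis(counts):
    """Hypothetical helper: principal row coordinates and singular values
    for a contingency table (rows = texts, columns = topics)."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (one per text)
    c = P.sum(axis=0)                    # column masses (one per topic)
    expected = np.outer(r, c)
    # Standardized residuals: (P - r c^T) / sqrt(r c^T)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows: D_r^(-1/2) U Sigma
    rows = (U * sv) / np.sqrt(r)[:, None]
    return rows, sv

# Hypothetical counts: words allocated to each of 3 topics in 4 texts.
counts = [[30, 5, 5],
          [28, 6, 6],
          [4, 25, 11],
          [5, 10, 25]]
rows, sv = correspondence_analysis(counts)
print(np.round(rows[:, :2], 3))   # texts on the first two dimensions
print(np.round(sv ** 2, 4))       # principal inertias
```

Distances between rows in these coordinates equal the chi-square distances between the texts' topic profiles, which is why texts 0 and 1 in this example plot close together.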

An appropriate stoplist should be used to exclude words that are irrelevant to identifying latent topics in the chosen text corpus. You may also choose to disregard all numerals and all words occurring only once. This speeds convergence of the allocation process but, depending on the nature of the text(s), risks missing significant content.
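The filtering options can be sketched as a small preprocessing step. This is a minimal sketch, not HAMLET's code: the function name `prepare_tokens`, the tokenization rule (lowercase letters, digits and apostrophes) and the choice to count "occurring only once" across the whole corpus are all assumptions.

```python
import re
from collections import Counter

def prepare_tokens(texts, stoplist, drop_numerals=True, drop_hapax=True):
    """Hypothetical helper: tokenize texts and filter them before allocation."""
    stop = {w.lower() for w in stoplist}
    tokenized = [re.findall(r"[a-z0-9']+", t.lower()) for t in texts]
    # Corpus-wide frequencies, used to find words occurring only once.
    freq = Counter(w for doc in tokenized for w in doc)

    def keep(w):
        if w in stop:
            return False
        if drop_numerals and w.isdigit():
            return False
        if drop_hapax and freq[w] == 1:
            return False
        return True

    return [[w for w in doc if keep(w)] for doc in tokenized]

texts = ["The cat sat on the mat in 2003.", "A cat and a dog shared the mat."]
stoplist = ["the", "a", "on", "in", "and"]
print(prepare_tokens(texts, stoplist))
print(prepare_tokens(texts, stoplist, drop_hapax=False))
```

With both options on, only "cat" and "mat" survive; switching off the single-occurrence filter keeps "sat", "dog" and "shared", which shows how that option can discard content.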


  • A Bayesian estimation process applies a generative model (Blei et al., 2003) to the text corpus, or to the set of sentences or other context units specified within a single text. The model treats each word in a document (a text or context unit) as generated by first sampling a topic from that document's topic distribution and then sampling the word from the chosen topic's word distribution. This routine in HAMLET II 3.0 for Windows allows a maximum of 25 topics.
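The text does not say which Bayesian estimation method HAMLET uses. As an illustration only, the sketch below fits the Blei et al. model by collapsed Gibbs sampling, one common approach: each word's topic is repeatedly resampled from its conditional distribution given all other assignments. The function names, the hyperparameters `alpha` and `beta`, `n_iter` and `seed` are hypothetical choices; `MAX_TOPICS` simply mirrors the stated 25-topic limit.

```python
import random

MAX_TOPICS = 25  # mirrors the limit stated for HAMLET II 3.0

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for LDA.
    docs: list of token lists (one per text or context unit)."""
    if not 1 <= n_topics <= MAX_TOPICS:
        raise ValueError(f"n_topics must be between 1 and {MAX_TOPICS}")
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total words per topic
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return z, nkw, vocab

def top_words(nkw, vocab, n=3):
    """Most frequent words allocated to each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]

docs = [["cat", "dog", "cat", "pet"], ["stock", "market", "price", "stock"],
        ["dog", "pet", "cat"], ["market", "price", "trade"]]
z, nkw, vocab = lda_gibbs(docs, n_topics=2, n_iter=100)
for k, words in enumerate(top_words(nkw, vocab)):
    print(f"topic {k}: {words}")
```

The per-topic counts `nkw` (or per-document counts derived from `z`) are the kind of allocation profiles that could then be compared across texts.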