Latent Dirichlet Allocation (LDA) assigns the words occurring in one or
more texts to a specified number of latent 'topics'. If applied to a
single text, it allocates words to topics according to the unit of
context selected by the user. When applied to a number of texts, the
results can be used to compare them by singular value decomposition
(SVD) and/or correspondence analysis of the profiles of words allocated.
An appropriate stoplist should be used to exclude all words regarded as
irrelevant to the task of identifying latent topics in the nominated
text corpus. Numerals and words occurring only once may also be
disregarded; this speeds up convergence of the allocation process but
risks missing significant content, depending on the nature of the
text(s).
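
The filtering options described above can be illustrated with a minimal
Python sketch. It is not HAMLET II's own code: the function name
filter_tokens, its parameters and the simple tokenizing rule are all
assumptions made for illustration.

```python
import re
from collections import Counter


def filter_tokens(texts, stoplist, drop_numerals=True, drop_hapax=True):
    """Tokenize texts, then drop stoplisted words, numerals and single-occurrence words.

    Hypothetical helper for illustration only; not part of HAMLET II.
    """
    stop = {w.lower() for w in stoplist}
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    tokenized = [re.findall(r"[a-z0-9']+", t.lower()) for t in texts]
    # Frequencies are counted over the whole corpus, before any filtering.
    counts = Counter(tok for doc in tokenized for tok in doc)

    def keep(tok):
        if tok in stop:
            return False
        if drop_numerals and tok.isdigit():
            return False
        if drop_hapax and counts[tok] == 1:
            return False
        return True

    return [[tok for tok in doc if keep(tok)] for doc in tokenized]


texts = ["The cat sat on the mat in 1999.", "The cat chased the dog."]
print(filter_tokens(texts, stoplist=["the", "on", "in"]))
```

With both options switched on, only words that recur in the corpus
survive, which shows how aggressively this setting can prune a small
corpus.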

A Bayesian estimation process applies a generative model (Blei et al.,
2003) to the text corpus, or to the set of sentences or other context
units specified within a single text. The model treats each of these as
the product of repeatedly sampling a topic from a topic distribution and
then sampling a word from that topic's word distribution. This routine
in HAMLET II 3.0 for Windows can allocate a maximum of 25 topics.
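
The generative model can be sketched in Python as follows. This only
illustrates the sampling scheme of Blei et al. (2003) in the forward
direction; it is not the estimation procedure used by HAMLET II. The
function names, the toy vocabulary, the topic-word probabilities and the
Dirichlet parameter alpha are all assumed for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_unit(topic_word, vocab, n_words, alpha, rng):
    """Generate one context unit: for each word, pick a topic, then a word from it."""
    n_topics = len(topic_word)
    # Topic mixture for this unit, drawn from a symmetric Dirichlet prior.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    words, topics = [], []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]  # sample a topic
        w = rng.choices(vocab, weights=topic_word[z])[0]    # sample a word from it
        topics.append(z)
        words.append(w)
    return words, topics, theta


# Toy example: two topics over a six-word vocabulary (all values assumed).
vocab = ["goal", "match", "team", "vote", "party", "election"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: sport
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # topic 1: politics
]
rng = random.Random(0)
words, topics, theta = generate_unit(topic_word, vocab, n_words=8, alpha=0.5, rng=rng)
print(words, topics)
```

Estimation works in the opposite direction: given only the observed
words, it infers the topic mixtures and topic-word distributions most
likely to have produced them.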