An appropriate stoplist should be used to exclude all words that may be regarded as irrelevant to the task of identifying latent topics in the selected text corpus. It is also possible to disregard all numerals, as well as words occurring only once. This speeds up convergence of the allocation process but, depending on the nature of the text(s), may risk missing significant content.
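The filtering described above can be sketched in Python as follows. This is an illustrative assumption, not HAMLET's own code. The function name `preprocess`, the tokeniser (lowercased ASCII letters, digits and apostrophes), and the treatment of a numeral as an all-digit token are all hypothetical choices.

```python
import re
from collections import Counter


def preprocess(texts, stoplist, drop_numerals=True, drop_hapax=True):
    """Tokenise texts, then remove stoplisted words, numerals and
    (optionally) words that occur only once in the whole corpus."""
    stop = {w.lower() for w in stoplist}
    # Hypothetical tokeniser: runs of ASCII letters, digits and apostrophes.
    docs = [re.findall(r"[a-z0-9']+", text.lower()) for text in texts]
    docs = [[w for w in doc if w not in stop] for doc in docs]
    if drop_numerals:
        # Assumption: a "numeral" is a token made up only of digits.
        docs = [[w for w in doc if not w.isdigit()] for doc in docs]
    if drop_hapax:
        # Count across the corpus, then keep words seen more than once.
        counts = Counter(w for doc in docs for w in doc)
        docs = [[w for w in doc if counts[w] > 1] for doc in docs]
    return docs
```

For example, with the stoplist `["the", "on", "in"]`, the texts `"The cat sat on the mat in 1999"` and `"The cat ate the rat"` reduce to `[["cat"], ["cat"]]`: the numeral goes, and only "cat" occurs more than once. Dropping singletons shrinks the vocabulary sharply, which is why it speeds up the allocation, and also why it can discard rare but meaningful words.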
A Bayesian estimation process applies a convenient generative model, Latent Dirichlet Allocation (Blei et al., 2003), to the text corpus, or to the sentences or other context units specified within a single text. The model treats each word in a text or context unit as generated by first sampling a topic from that unit's topic distribution and then sampling a word from the topic-specific word distribution. This routine in HAMLET II 3.0 for Windows can allocate a maximum of 25 topics. On a typical PC it can be very demanding of computing time and resources. To interrupt it at any time, press Escape or click on the display while the progress indicator is shown.
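The documentation does not say which Bayesian estimation method HAMLET uses. The sketch below uses collapsed Gibbs sampling, one common way of fitting this model, purely as an illustration. The function name `lda_gibbs` and the values of `alpha`, `beta` and `n_iter` are hypothetical defaults, not HAMLET's settings; only the 25-topic limit comes from the text above.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit an LDA topic model to tokenised documents by collapsed Gibbs sampling.

    docs: list of token lists (one per text or context unit).
    Returns (doc_topic, topic_word, vocab) as raw count tables.
    """
    if not 1 <= n_topics <= 25:  # mirrors HAMLET II 3.0's stated 25-topic limit
        raise ValueError("n_topics must be between 1 and 25")
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]       # tokens in unit d assigned to topic k
    topic_word = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                     # tokens assigned to topic k
    z = []                                           # current topic of every token

    def update(d, k, v, delta):
        doc_topic[d][k] += delta
        topic_word[k][v] += delta
        topic_total[k] += delta

    # Start from a random allocation of tokens to topics.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            update(d, k, index[w], +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                update(d, z[d][i], v, -1)  # withdraw this token's current topic
                # p(topic = k | all other assignments), up to a constant:
                # (unit's use of k) * (topic k's use of this word)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                update(d, k, v, +1)
    return doc_topic, topic_word, vocab
```

Each sweep revisits every token, so the cost grows with corpus size times the number of topics times the number of iterations. This is one reason the routine can be slow on large corpora, and why trimming the vocabulary beforehand helps.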
Click on the text file names in the selected directory to copy the names to the right-hand panel.
Click Stoplist to select and activate a stoplist, or to edit the currently selected stoplist of words to be disregarded when reading the selected text(s).
Click the menu item Clear selection to remove selected text file names and clear the current stoplist entries.
Word tokens appearing in the initially allocated topics that have no obvious relevance to the sense of the text, given the purpose of the investigation, may be added to the stoplist currently in use and the allocation process repeated. In this way the process can be refined by successively excluding superfluous tokens from consideration. The topics themselves need not be open to any useful interpretation: they serve only as a means of identifying word tokens for use in comparing the texts or context units under consideration. If, however, a set of topics is found that can plausibly be interpreted in the sense of the current investigation, these may also be saved as a vocabulary list for use in other procedures in HAMLET II 3.0.
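The refinement cycle can be sketched with three small helpers. These are hypothetical and are not part of HAMLET. `top_words` lists each topic's most frequent words from a topics-by-vocabulary count table. `extend_stoplist` adds the tokens judged irrelevant before the model is re-run. `save_vocabulary` writes retained topic words one per line, an assumed format that is not necessarily the one HAMLET II uses for vocabulary lists.

```python
def top_words(topic_word, vocab, n=10):
    """Return the n highest-count words of each topic.

    topic_word: one row of word counts per topic, aligned with vocab.
    """
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        result.append([vocab[v] for v in ranked[:n]])
    return result


def extend_stoplist(stoplist, irrelevant):
    """Add tokens judged irrelevant to the investigation to the stoplist."""
    return sorted(set(stoplist) | {w.lower() for w in irrelevant})


def save_vocabulary(topics, path):
    """Write the distinct words of the retained topics, one per line
    (assumed format, for illustration only)."""
    words = sorted({w for topic in topics for w in topic})
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
```

In use, one would inspect `top_words(...)`, pass any uninformative tokens (for instance "said") to `extend_stoplist`, filter the texts again with the enlarged stoplist, and refit. This repeats until the remaining topic words are useful for comparing the texts.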