The vocabulary list

The vocabulary list to be used by HAMLET II - Joint Frequencies can be entered directly in the first instance, using the vocabulary editor, and may then be saved in a named file for subsequent editing and re-use. This process can also be carried out in conjunction with Latent Dirichlet Allocation of individual words in one or more texts to an arbitrary number of latent topics.

Setting optional maximum sizes for words, and for the number of words in the search list for a file, helps to reduce unnecessary searching.

Words supplied to the vocabulary list can contain the 'wild card' characters '@' and '*', with the following effect: when comparing words in the text with the vocabulary list, individual letters corresponding to the position of the character '@', and all characters after, and including, the position of the character '*', will be ignored. This, for example, provides a way of treating as equivalent words which differ only in their suffixes. Care is needed that use of these characters does not create logical equivalences between two or more entries in the vocabulary list, as this may confuse the searching process.
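The effect of the two wild card characters can be illustrated with a short Python sketch. This is not HAMLET II's own code: the function names entry_to_regex and matches are hypothetical, and the sketch assumes that '@' stands for exactly one character at its position and that anything in an entry after a '*' is disregarded.

```python
import re

def entry_to_regex(entry, ignore_case=True):
    """Translate a vocabulary entry into a regular expression (illustrative only)."""
    parts = []
    for ch in entry:
        if ch == "@":
            # Assumption: '@' ignores exactly one character at this position.
            parts.append(".")
        elif ch == "*":
            # Everything from this position onwards is ignored.
            parts.append(".*")
            break
        else:
            parts.append(re.escape(ch))
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile("".join(parts), flags)

def matches(entry, word, ignore_case=True):
    """Return True if the whole word matches the vocabulary entry."""
    return entry_to_regex(entry, ignore_case).fullmatch(word) is not None

# 'organi@ation' treats the two spellings as equivalent.
print(matches("organi@ation", "organisation"), matches("organi@ation", "organization"))
# 'nation*' matches words that differ only in their suffixes.
print(matches("nation*", "national"), matches("nation*", "native"))
```

In this sketch the entries 'nation*' and 'national' would both match the word 'national', which is the kind of logical equivalence between entries that should be avoided.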

'Words' may also consist of significant combinations of words, such as 'United Nations', 'Prime Minister' or 'National Health Service', although care is required to avoid also specifying the same words individually in the search list, which can lead to confusion. Upper- and lower-case letters may be regarded separately, or treated as equivalent. The former option, of course, will normally regard words beginning sentences as different from the same words occurring later. Such words will have to be explicitly and separately specified in the vocabulary list if they are not to be missed when searching. This is why it is important to know the basic vocabulary of the text before considering the use of HAMLET II.
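The interaction between multi-word entries and the treatment of upper- and lower-case letters can be sketched as follows. The function count_phrase is a hypothetical helper written for illustration, and it assumes the text has already been split into words; it does not reproduce the program's own search routine.

```python
def count_phrase(tokens, phrase, case_sensitive=True):
    """Count occurrences of a (possibly multi-word) entry in a list of words."""
    target = phrase.split()
    if not target:
        return 0
    if not case_sensitive:
        # Treat upper- and lower-case letters as equivalent.
        tokens = [t.lower() for t in tokens]
        target = [t.lower() for t in target]
    n = len(target)
    # Slide a window of n words over the text and compare it with the entry.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)

tokens = ["The", "Prime", "Minister", "met", "the", "prime", "minister"]
print(count_phrase(tokens, "Prime Minister"))                        # 1
print(count_phrase(tokens, "Prime Minister", case_sensitive=False))  # 2
print(count_phrase(tokens, "the"))                                   # 1, misses 'The'
```

When case is regarded separately, the entry 'the' misses the sentence-initial 'The', which would need its own entry in the vocabulary list.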

To simplify searching for groups of words of equivalent meaning, each main entry in a vocabulary list may have an accompanying list of words which will be counted as if they were its 'synonyms'. These need not, of course, be literal synonyms, but can be any word strings found in the text which can conveniently be grouped together when calculating joint frequencies. Main entries and their associated synonyms can be saved for re-use, and edited as necessary to modify, delete or add main entries or associated synonyms.
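The way synonyms are counted under their main entry can be sketched in Python as below. The dictionary layout and the function name count_main_entries are assumptions made for this illustration, as is the choice to treat upper- and lower-case letters as equivalent; the file format actually used by HAMLET II is not shown here.

```python
from collections import Counter

def count_main_entries(tokens, vocabulary):
    """Count each word under the main entry it belongs to (illustrative only)."""
    # Build a lookup from every main entry and synonym to its main entry.
    lookup = {}
    for main, synonyms in vocabulary.items():
        for form in [main, *synonyms]:
            lookup[form.lower()] = main
    counts = Counter()
    for tok in tokens:
        main = lookup.get(tok.lower())
        if main is not None:
            counts[main] += 1
    return counts

# Hypothetical vocabulary: main entries with their associated 'synonyms'.
vocabulary = {"war": ["conflict", "battle"], "peace": ["truce"]}
tokens = "War and conflict ended with a truce".split()
print(count_main_entries(tokens, vocabulary))
```

Here 'War' and 'conflict' are both counted under the main entry 'war', and 'truce' under 'peace'.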

The vocabulary list editor is used to create and edit these lists.

Vocabulary editor

The vocabulary list editor offers two ways to speed up the process of developing a vocabulary list. 

1) Clicking on Open a text file ... from the Edit menu allows items to be dragged and dropped directly from a text file into the lists being edited. 

Clicking on List words in text ... will display the words in any selected text file which are not present in the current list or have not already been considered for inclusion in it. These appear in a window to the right of the editor. With the left mouse button held down, you can drag and drop words from this window into the list displayed, as new main entries or related items. Clicking on 'Words >' with the left mouse button reverses the order of the items listed, and clicking on 'Freq.' causes the columns to swap positions. Clicking on the column headers with the right mouse button switches the listing between alphabetical and frequency order.


2) If List words in text is applied to a sequence of texts, with the intention of developing a vocabulary list for the purpose of comparing them, words disregarded as having already been considered for inclusion may be viewed and edited by clicking on the button marked Stoplist. On closing the stoplist window, or on finishing work for the time being on a particular search list, you will be prompted to save the current stoplist for later reference.
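The listing produced by List words in text can be approximated by the following Python sketch. It is only an illustration under stated assumptions: the function candidate_words is hypothetical, words are taken to be runs of letters and apostrophes, and case is ignored. It returns the words of a text that are neither in the current vocabulary list nor in the stoplist, in frequency or alphabetical order.

```python
import re
from collections import Counter

def candidate_words(text, vocabulary, stoplist, by_frequency=True):
    """List words not yet in the vocabulary list or the stoplist, with counts."""
    # Assumption: a 'word' is a run of letters or apostrophes; case is ignored.
    words = re.findall(r"[A-Za-z']+", text.lower())
    known = {w.lower() for w in vocabulary} | {w.lower() for w in stoplist}
    counts = Counter(w for w in words if w not in known)
    if by_frequency:
        # Most frequent first; ties are broken alphabetically.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(counts.items())

text = "peace talks and peace plans ended the war"
vocabulary = {"war"}          # words already in the list being edited
stoplist = {"the", "and"}     # words already considered and disregarded
print(candidate_words(text, vocabulary, stoplist))
print(candidate_words(text, vocabulary, stoplist, by_frequency=False))
```

When working through a sequence of texts, the same stoplist set could be extended after each text and passed to the next call, so that words already considered are not offered again.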