
The main idea of HAMLET II 3.0(c) is to search text files for words or categories in a given vocabulary list, and to count their joint frequencies within any specified context unit, within sentences, or as collocations within a given span of words.
This procedure is applicable when there are good grounds for searching for inter-connections between a number of key words. The latest version includes a procedure to assist in identifying these in relation to potential latent topics, according to the generative model provided by Latent Dirichlet Allocation.
HAMLET combines the measurement of empirical properties of texts with graphical visualization of the results. Qualitative and quantitative analysis are integral parts of HAMLET's design. Unlike much other text analysis software, HAMLET II 3.0 provides maximum transparency of the processes involved in a single, user-friendly interface, leaving the user in complete control.
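The counting step described above can be illustrated with a minimal Python sketch. It is not HAMLET's own code: the function name `count_joint_frequencies`, the naive sentence splitting on `.`, `!` and `?`, and the sample text are all assumptions made for illustration. Each sentence is treated as one context unit, and a word or pair is counted at most once per unit.

```python
import re
from collections import Counter
from itertools import combinations


def count_joint_frequencies(text, vocabulary):
    """Count in how many sentences each word, and each pair of words, occurs."""
    vocab = {w.lower() for w in vocabulary}
    word_freq = Counter()   # f_i: number of sentences containing word i
    pair_freq = Counter()   # f_ij: number of sentences containing both i and j
    # Assumption: a sentence ends at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        present = sorted(tokens & vocab)
        word_freq.update(present)
        # Pairs are stored as alphabetically sorted tuples.
        pair_freq.update(combinations(present, 2))
    return word_freq, pair_freq


text = "The king saw the ghost. The queen saw the king. The ghost vanished."
word_freq, pair_freq = count_joint_frequencies(text, ["king", "queen", "ghost"])
```

Counting within a span of words rather than within sentences would replace the sentence loop with a sliding window over the token sequence.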
Individual word frequencies ($f_i$) and joint frequencies ($f_{ij}$) for pairs of words $(i, j)$, both expressed in terms of the chosen unit of context, together with the corresponding standardised joint frequencies $s_{ij} = f_{ij} / (f_i + f_j - f_{ij})$, are organised in a similarities matrix, which can be submitted to a combination of cluster analysis and multi-dimensional scaling to discover significant word-associations.
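As an illustration of the standardised joint frequency defined above, the following Python sketch builds a similarities matrix from word and pair counts. The function name `jaccard_matrix`, the dictionary layout (pairs keyed by alphabetically sorted tuples) and the sample counts are assumptions for this example, not part of HAMLET.

```python
def jaccard_matrix(words, word_freq, pair_freq):
    """Return the matrix s_ij = f_ij / (f_i + f_j - f_ij) for the given words."""

    def joint(a, b):
        # A word always co-occurs with itself, so f_ii = f_i.
        if a == b:
            return word_freq.get(a, 0)
        return pair_freq.get(tuple(sorted((a, b))), 0)

    matrix = []
    for a in words:
        row = []
        for b in words:
            f_ij = joint(a, b)
            denom = word_freq.get(a, 0) + word_freq.get(b, 0) - f_ij
            # If neither word occurs at all, define the similarity as 0.
            row.append(f_ij / denom if denom else 0.0)
        matrix.append(row)
    return matrix


words = ["king", "queen", "ghost"]
word_freq = {"king": 2, "queen": 1, "ghost": 2}
pair_freq = {("ghost", "king"): 1, ("king", "queen"): 1}
similarities = jaccard_matrix(words, word_freq, pair_freq)
```

With these counts, "king" and "queen" have a similarity of 1 / (2 + 1 - 1) = 0.5, while "queen" and "ghost" never co-occur and so score 0.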
In addition to the above (Jaccard) coefficient, it is possible to apply Sokal's 'matching coefficient', which also takes account of joint non-occurrences, and the measure of association strength of van Eck and Waltman (2009), otherwise known as the proximity or probabilistic affinity index. Word co-occurrences within specified context units can also be submitted to correspondence analysis, providing further information about usage within a text.
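The two alternative measures can be sketched in the same terms. This is a hedged illustration under common textbook definitions, not HAMLET's implementation: the matching coefficient is taken as the share of context units in which two words are either both present or both absent, and the association strength as $f_{ij} / (f_i f_j)$. Some formulations multiply the latter by a constant such as the number of context units; the function names and the parameter `n_units` are assumptions.

```python
def simple_matching(f_i, f_j, f_ij, n_units):
    """Share of context units where both words are present or both are absent."""
    joint_absences = n_units - f_i - f_j + f_ij
    return (f_ij + joint_absences) / n_units


def association_strength(f_i, f_j, f_ij):
    """Joint frequency divided by the product of the individual frequencies."""
    if f_i == 0 or f_j == 0:
        return 0.0
    return f_ij / (f_i * f_j)


# Two words occurring in 2 and 1 of 3 units, and together in 1 unit.
matching = simple_matching(2, 1, 1, 3)
strength = association_strength(2, 1, 1)
```

Unlike the Jaccard coefficient, the matching coefficient rises when both words are absent from many units, so it depends on the total number of context units.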
It then becomes possible to compare the results of applying multi-dimensional scaling to matrices of joint frequencies of equivalent vocabulary lists derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS), or to apply Individual Differences Scaling (INDSCAL) to the matrices themselves. Forrest Young's SUBJSTAT procedure, which transforms the resulting non-Euclidean 'subject spaces' into arc-distances, permits more rigorous analysis of their results. Alternatively, the profiles of occurrences of items of a given search list in a number of different texts can be compared directly by singular value decomposition or correspondence analysis.
Further procedures help to determine the broad characteristics of word usage in a text.
The unique graphics of HAMLET II(c) summarise the results of each of these analyses, for inclusion in other documents and reports. Numerical results can be saved, if necessary, in CSV format for further statistical analysis in Stata, Microsoft Excel or R.
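To show what a CSV file of numerical results might look like to downstream software, here is a small Python sketch that writes a similarities matrix with a header row and one labelled row per word. The layout, the function name `matrix_to_csv` and the four-decimal formatting are assumptions for illustration and do not describe HAMLET's actual export format.

```python
import csv
import io


def matrix_to_csv(words, matrix):
    """Serialise a labelled square matrix as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: an empty-meaning label column followed by the word labels.
    writer.writerow(["word"] + list(words))
    for word, row in zip(words, matrix):
        writer.writerow([word] + [f"{value:.4f}" for value in row])
    return buffer.getvalue()


words = ["king", "queen"]
matrix = [[1.0, 0.5], [0.5, 1.0]]
csv_text = matrix_to_csv(words, matrix)
```

A file in this shape can be opened directly in a spreadsheet or read into a statistics package as a table with row labels in the first column.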
HAMLET II 3.0 for Windows(c) is suitable for use with Microsoft Windows XP, Vista, 7, 8, 8.1 and 10. Full documentation is available here.
To run HAMLET II 3.0 for Windows using WINE on Debian GNU/Linux, consult our documentation about HAMLET II on Debian GNU/Linux.