This is the homepage of HAMLET II 3.0 - computer assisted text analysis


"Words, words, words." Hamlet (II,ii,194)

The main idea of HAMLET II 3.0(c) is to search text files for words or categories in a given vocabulary list, and to count their joint frequencies within any specified context unit, within sentences, or as collocations within a given span of words.
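As an illustration only, counting collocations within a span of words can be sketched in a few lines of Python. This is not HAMLET's own code: the function name `span_cooccurrences`, the whitespace tokenisation, and the reading of "span" as a window of `span` consecutive words are all assumptions made for the example.

```python
from collections import Counter


def span_cooccurrences(tokens, vocabulary, span=5):
    """Count pairs of vocabulary words that fall inside a window of `span` words.

    Hypothetical helper for illustration; two words count as a collocation
    when their positions differ by less than `span`.
    """
    vocab = set(vocabulary)
    # Keep only the positions at which a vocabulary word occurs.
    hits = [(pos, tok) for pos, tok in enumerate(tokens) if tok in vocab]
    counts = Counter()
    for a, (pos_a, word_a) in enumerate(hits):
        for pos_b, word_b in hits[a + 1:]:
            if pos_b - pos_a >= span:
                break  # hits are ordered by position, so later ones are farther away
            if word_a != word_b:
                counts[tuple(sorted((word_a, word_b)))] += 1
    return counts


text = "the king spoke and the queen heard the king while the ghost waited"
pairs = span_cooccurrences(text.split(), ["king", "queen", "ghost"], span=4)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

Counting within sentences or other context units works the same way, except that the window is replaced by the boundaries of each unit.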

This procedure is applicable when there are good grounds for searching for inter-connections between a number of key words. The latest version now includes a procedure to assist in identifying these in relation to potential latent topics, according to the generative model provided by Latent Dirichlet Allocation.

The benefit of measuring empirical properties of texts is nicely combined with HAMLET's features of graphical visualization. Qualitative and quantitative analysis are integral parts of HAMLET's design. Unlike much other text analysis software, HAMLET II 3.0 provides maximum transparency of the processes involved in a single, user-friendly, interface, leaving the user in complete control.

Individual word frequencies (f_i) and joint frequencies (f_ij) for pairs of words (i, j) are both expressed in terms of the chosen unit of context. Together with the corresponding standardised joint frequencies s_ij = f_ij / (f_i + f_j - f_ij), they are organised in a similarities matrix, which can be submitted to a combination of cluster analysis and multi-dimensional scaling to discover significant word-associations.
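The standardised joint frequency can be illustrated with a short Python sketch. It assumes that each context unit is already available as a list of words; the function name `jaccard_matrix` and the sample data are hypothetical and are not part of HAMLET.

```python
from itertools import combinations


def jaccard_matrix(units, vocabulary):
    """Return word frequencies and standardised joint frequencies per word pair.

    f[w] is the number of context units containing w; s[(i, j)] is
    fij / (fi + fj - fij), where fij counts units containing both words.
    """
    vocab = set(vocabulary)
    present = [set(unit) & vocab for unit in units]
    f = {w: sum(w in p for p in present) for w in vocabulary}
    s = {}
    for i, j in combinations(vocabulary, 2):
        fij = sum(i in p and j in p for p in present)
        denom = f[i] + f[j] - fij
        s[(i, j)] = fij / denom if denom else 0.0
    return f, s


# Four made-up context units (for example, sentences).
units = [
    ["king", "queen", "ghost"],
    ["king", "queen"],
    ["king"],
    ["ghost"],
]
f, s = jaccard_matrix(units, ["king", "queen", "ghost"])
for pair, value in s.items():
    print(pair, round(value, 3))
```

Here "king" occurs in three units and "queen" in two, both of them shared, so their coefficient is 2 / (3 + 2 - 2) = 2/3.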

In addition to the above (Jaccard) coefficient, it is possible to apply Sokal's 'matching coefficient', which also takes account of joint non-occurrences, and the measure of association strength of van Eck and Waltman (2009), otherwise known as the proximity or probabilistic affinity index. Word co-occurrences within specified context units can also be submitted to correspondence analysis, providing further information about usage within a text.
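A minimal Python sketch of the two alternative coefficients follows. It assumes that the 'matching coefficient' is the simple matching coefficient of Sokal and Michener, and that association strength takes the unscaled form fij / (fi * fj) given by van Eck and Waltman; any scaling that HAMLET itself applies is not stated here. The function names are hypothetical.

```python
def matching_coefficient(fi, fj, fij, n):
    """Simple matching coefficient over n context units.

    Counts units where both words occur (fij) plus units where neither
    occurs (n - fi - fj + fij), as a share of all units.
    """
    both_absent = n - fi - fj + fij
    return (fij + both_absent) / n


def association_strength(fi, fj, fij):
    """Association strength: joint frequency over the product of frequencies."""
    return fij / (fi * fj) if fi and fj else 0.0


# Example: 4 context units, word i in 3 of them, word j in 2, both in 2.
print(matching_coefficient(3, 2, 2, 4))
print(round(association_strength(3, 2, 2), 3))
```

Multiplying the association strength by the number of context units gives the ratio of observed joint occurrences to those expected if the two words occurred independently.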

It then becomes possible to compare the results of applying multi-dimensional scaling to matrices of joint frequencies of equivalent vocabulary lists derived from a number of texts, using Procrustean Individual Differences Scaling (PINDIS), or to apply Individual Differences Scaling (INDSCAL) to the matrices themselves. Forrest Young's SUBJSTAT procedure, which transforms the resulting non-Euclidean 'subject spaces' into arc-distances, permits more rigorous analysis of their results. Alternatively, the profiles of occurrences of items of a given search list in a number of different texts can be compared directly by singular value decomposition or correspondence analysis.

Further procedures help to determine the broad characteristics of word usage in a text:

HAMLET II(c) offers a unique Vocabulary Editor to speed up the development of vocabulary lists for use in researching co-occurrences and a new fast track procedure to apply them efficiently in comparing large numbers of texts in one operation, with PINDIS, INDSCAL, SVD or Correspondence Analysis.

The unique graphics of HAMLET II(c) summarise the results of each of these analyses, for inclusion in other documents and reports. Numerical results can be saved, if necessary, in CSV format for further statistical analysis in STATA, Microsoft Excel or R.

HAMLET II 3.0 for Windows(c) is suitable for use with Microsoft Windows XP, Vista, 7, 8, 8.1 and 10. Full documentation is available for download.

To run HAMLET II 3.0 for Windows under WINE on Debian GNU/Linux, consult our documentation about HAMLET II on Debian GNU/Linux.

Downloads: HAMLET II 3.0, documentation, HAMLET II 3.0 tutorial (HTML)

Originators and sole distributors:
Alan Brier Associate Member, ESRC-National Centre for Research Methods, Southampton, UK
Bruno Hopp GESIS - Leibniz-Institut für Sozialwissenschaften, Cologne, Germany

Please address all enquiries and report any problems to the originators listed above.