SUBJSTAT  -  statistics of vector spaces

An approach due to Forrest Young (Schiffman et al. (1981)) is used to assist in interpreting the "subject spaces" produced by PINDIS and INDSCAL. It is required because these "spaces" strictly consist of vectors, each representing an item in the subject space, which need to be compared in terms of their angular separation. It is a common mistake to regard these spaces as Euclidean. SUBJSTAT offers instead an appropriate arc-distance measure for the analysis of distances between items, using the subject space coordinates saved after applying PINDIS or INDSCAL. The vector coordinates created by Singular Value Decomposition, which are already expressed as unit length projections in the relevant number of dimensions, can be treated in a similar way.

Subject space groups

If the items in the subject space are assigned to a priori groups, SUBJSTAT provides a means of examining the differences between these groups in their weighting of the dimensions of the space. The mean direction of a set of vectors is the mean of the normalized subject weights which it contains, and is used to calculate the mean resultant for the set of vectors. Subtracting the mean resultant from 1.0 yields an index of angular variation of a set of directed vectors, with the important property that the total angular variation can be separated into within-group and between-group components, analogous to the partitioning of ordinary variance in analysis of variance. If the weights are all positive, however, the maximum value of this index depends upon the dimensionality of the subject space. Mardia (1972) proposed a transformation of the index of angular variation to yield values in the range 0 to infinity, which is called the circular standard deviation index.
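
The quantities described above can be illustrated with a minimal Python sketch. It is not SUBJSTAT's own code: the function name angular_summary and its layout are hypothetical, and the circular standard deviation is assumed to take the form usually attributed to Mardia, sqrt(-2 ln R), where R is the mean resultant length.

```python
import numpy as np

def angular_summary(weights):
    """Summarise a set of subject weight vectors (one row per subject).

    Returns the mean direction, the mean resultant length, the index of
    angular variation (1 - mean resultant) and the circular standard
    deviation, assumed here to be sqrt(-2 ln R).
    """
    w = np.asarray(weights, dtype=float)
    # Normalize each subject vector to unit length.
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    # The mean of the unit vectors; its length is the mean resultant.
    mean_vec = unit.mean(axis=0)
    r_bar = min(float(np.linalg.norm(mean_vec)), 1.0)
    # Rescale the mean vector to unit length to obtain the mean direction.
    # (Undefined when the vectors cancel out exactly, i.e. r_bar == 0.)
    mean_direction = mean_vec / np.linalg.norm(mean_vec)
    variation = 1.0 - r_bar
    circ_sd = float(np.sqrt(max(0.0, -2.0 * np.log(r_bar))))
    return mean_direction, r_bar, variation, circ_sd
```

For two subjects weighting orthogonal dimensions, such as (1, 0) and (0, 1), the mean resultant is about 0.707 and the index of angular variation about 0.293. For subjects whose weight vectors point in the same direction, the mean resultant is 1 and both measures of spread are 0.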

Where items are assigned to groups, SUBJSTAT produces descriptive statistics: the resultants, standard deviations and mean directions (in dimension coordinates), followed by an analysis of angular variation (ANAVA).

Unfortunately there is no ANAVA analogue of two-way or multiple-way analysis of variance. It is necessary to perform several separate one-way ANAVAs, assuming that there is no interaction between them, but there is no way to check the validity of this assumption. An F-test for ANAVA also assumes that angular variation within groups is homogeneous, and the significance level is accurate only when the observations (i.e. the subject weights) are independently distributed and in the range -1.0 to +1.0, which is not the case with these results. However, the weights entered can be considered conditionally independent, at least when the stimulus configuration is obtained from a hypothesized stimulus space which has been used to compute the weights.
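
The separation of total angular variation into within-group and between-group components in a one-way ANAVA can be sketched as follows. This is an illustration under stated assumptions, not a description of SUBJSTAT's internal calculation: the function name anava is hypothetical, the components are expressed in terms of resultant lengths (N - R for the total), and the degrees of freedom used for the F ratio follow the high-concentration approximation commonly given for directional data, which the source does not confirm.

```python
import numpy as np

def anava(weights, groups):
    """One-way analysis of angular variation for grouped subject weights."""
    w = np.asarray(weights, dtype=float)
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    labels = np.asarray(groups)
    n, p = unit.shape
    group_ids = np.unique(labels)
    q = len(group_ids)
    # Resultant length of all unit vectors pooled together.
    r_total = float(np.linalg.norm(unit.sum(axis=0)))
    # Sum of the resultant lengths computed separately within each group.
    r_groups = sum(
        float(np.linalg.norm(unit[labels == g].sum(axis=0))) for g in group_ids
    )
    total = n - r_total          # total angular variation
    within = n - r_groups        # within-group component
    between = r_groups - r_total # between-group component
    # ASSUMPTION: degrees of freedom from the usual high-concentration
    # approximation; the F ratio is undefined if `within` is zero.
    df_between = (q - 1) * (p - 1)
    df_within = (n - q) * (p - 1)
    f_ratio = (between / df_between) / (within / df_within)
    return {"total": total, "within": within, "between": between,
            "df": (df_between, df_within), "F": f_ratio}
```

By construction the within-group and between-group components add up to the total, and the between-group component cannot be negative, because the resultant of the pooled vectors is never longer than the sum of the group resultants. The cautions above about the accuracy of the significance level apply equally to any F ratio computed in this way.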

SUBJSTAT then calculates the arc-distances between the end points of the normalized subject vectors, which offer an appropriate measure for the analysis of the distances between subjects in the subject space. It also reports the arc-distances between individual subjects and the mean directions of the groups to which they may have been assigned. These can then be submitted to multidimensional scaling by MINISSA to obtain a graphical representation of the subjects in a Euclidean space, if desired.
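
As an illustration of the arc-distance calculation, the sketch below takes the arc-distance between two normalized vectors to be the angle between them, i.e. the arc cosine of their scalar product. The function names are hypothetical, and the use of radians is an assumption; the units reported by SUBJSTAT are not stated here.

```python
import numpy as np

def arc_distances(weights):
    """Matrix of arc-distances (in radians) between all pairs of subjects."""
    w = np.asarray(weights, dtype=float)
    # Normalize each subject vector to unit length.
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    # Clip to guard against rounding slightly outside [-1, 1].
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)
    dist = np.arccos(cosines)
    # A subject is at zero distance from itself.
    np.fill_diagonal(dist, 0.0)
    return dist

def arc_to_direction(weights, direction):
    """Arc-distance (in radians) from each subject to one direction,
    for example the mean direction of the subject's group."""
    w = np.asarray(weights, dtype=float)
    unit = w / np.linalg.norm(w, axis=1, keepdims=True)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    return np.arccos(np.clip(unit @ d, -1.0, 1.0))
```

Because only the direction of each vector matters, subjects with weights (1, 0) and (2, 2) are separated by an arc-distance of pi/4 radians (45 degrees), whatever the lengths of their original vectors. The resulting symmetric matrix is the kind of dissimilarity input that could then be passed on for scaling.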

Finally, SUBJSTAT offers to apply Mike Brusco's non-hierarchical clustering procedure to partition the subjects according to the similarities in their arc-distances. This may support, or suggest possible changes to, any a priori allocations to groups which may have been made on inputting the data.