8.1.1 Goodness of fit

The difference between a particular distance and its corresponding pseudo-distance, $d_{ij} - \hat{d}_{ij}$, serves as an index of how badly the distance between i and j in the solution configuration departs from the value required to preserve an ordinal relation with the data. If there is no inversion, the difference is zero. Alternatively, the difference can be viewed as the residual from monotone regression, i.e. an index of the difference between the solution distance and (an ordinal re-scaling of) the data.
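The monotone regression mentioned above can be illustrated with the pool-adjacent-violators algorithm. This is a minimal sketch, not the MDSCAL implementation: it assumes the configuration distances are already listed in order of increasing dissimilarity in the data and that there are no ties, and the function name `monotone_fit` and the sample values are hypothetical.

```python
def monotone_fit(distances):
    """Least-squares non-decreasing fit to `distances` (pool adjacent violators).

    `distances` must be ordered by increasing dissimilarity in the data
    (an assumption of this sketch; ties in the data are not handled).
    """
    blocks = []  # each block is [sum of values, number of values]
    for d in distances:
        blocks.append([d, 1])
        # Merge backwards while neighbouring block means are out of order.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical distances, listed from the most similar pair to the least similar.
d = [1.0, 3.0, 2.0, 4.0]
d_hat = monotone_fit(d)                         # pseudo-distances
residuals = [a - b for a, b in zip(d, d_hat)]   # zero wherever there is no inversion
```

Here the second and third distances are inverted relative to the data, so they are pooled to their common mean; the other two pairs keep a zero residual.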

The overall measure of how the distances in the configuration ordinally fit the data is called Stress – a measure of the badness of fit.

Raw Stress = $\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$

where

$d_{ij}$ = distance between variables i and j in the configuration

$\hat{d}_{ij}$ = those values which minimize the stress, subject to the constraint that the $\hat{d}_{ij}$'s have the same rank order as the input data

However, raw stress is an unsatisfactory measure of fit: configurations that are identical in all but size have different values of stress. Normalizing the raw stress makes it independent of the size or scale of the configuration and norms its value between 0 (perfect fit) and 1 (worst possible fit), so that configurations can be compared.

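The two most commonly used normalizing factors are:

Sum of squared distances: $\sum_{i<j} d_{ij}^2$

Sum of squared differences between the distances and their average: $\sum_{i<j} (d_{ij} - \bar{d})^2$

Thus we have two measures of normalized stress:

Stress SQDIST = $\sqrt{\dfrac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$

Stress SQDEV = $\sqrt{\dfrac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} (d_{ij} - \bar{d})^2}}$

where $\bar{d}$ = mean of all $d_{ij}$'s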
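The effect of normalization can be checked numerically. The sketch below assumes that each normalized measure is the square root of raw stress divided by the corresponding normalizing factor (Kruskal's form); the function name `stress_measures` and the sample values are hypothetical.

```python
import math


def stress_measures(d, d_hat):
    """Return (raw stress, Stress SQDIST, Stress SQDEV) for paired lists.

    Assumes the square-root form of the two normalized measures.
    """
    raw = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    mean_d = sum(d) / len(d)
    sq_dist = sum(a ** 2 for a in d)             # sum of squared distances
    sq_dev = sum((a - mean_d) ** 2 for a in d)   # squared deviations from the mean
    return raw, math.sqrt(raw / sq_dist), math.sqrt(raw / sq_dev)


# Hypothetical distances and pseudo-distances for four pairs.
d = [1.0, 3.0, 2.0, 4.0]
d_hat = [1.0, 2.5, 2.5, 4.0]
raw, sqdist, sqdev = stress_measures(d, d_hat)

# Doubling the size of the configuration doubles every distance and pseudo-distance.
raw_big, sqdist_big, sqdev_big = stress_measures(
    [2 * x for x in d], [2 * x for x in d_hat]
)
```

Raw stress changes by a factor of four when the configuration is doubled in size, while both normalized values stay the same.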

Evaluation and Interpretation

After the minimum stress solutions have been obtained for a range of dimensionalities, the dimensionality of the solution is selected. A common procedure for this is the scree test, in which the minimum stress is plotted against the number of dimensions.

We can, in principle, use the "elbow" in the scree plot as a guide to the dimensionality of the solution. In practice, however, such elbows are rarely obvious, and other theoretical criteria are necessary to determine dimensionality.
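One way to make the search for an elbow explicit is to look for the dimensionality at which the drop in stress slows down most sharply. The sketch below is only a heuristic under stated assumptions: the function name `elbow_dimension`, the use of the second difference of stress as the measure of the bend, and the stress values are all hypothetical, and the dimensionalities are assumed to be consecutive integers.

```python
def elbow_dimension(stress_by_dim):
    """Return the dimensionality where the stress curve bends most sharply.

    `stress_by_dim` maps dimensionality -> minimum stress. The bend is the
    drop in stress before a point minus the drop after it. Returns None if
    no interior point shows a positive bend.
    """
    dims = sorted(stress_by_dim)
    best_dim, best_bend = None, 0.0
    for prev, cur, nxt in zip(dims, dims[1:], dims[2:]):
        drop_before = stress_by_dim[prev] - stress_by_dim[cur]
        drop_after = stress_by_dim[cur] - stress_by_dim[nxt]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_dim, best_bend = cur, bend
    return best_dim


# Hypothetical minimum stress values for solutions in 1 to 5 dimensions.
stress = {1: 0.40, 2: 0.12, 3: 0.09, 4: 0.07, 5: 0.06}
chosen = elbow_dimension(stress)
```

With these values the sharpest bend is at two dimensions. When the curve declines smoothly the function returns None, which corresponds to the common case of no obvious elbow.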

Another criterion for deciding how many dimensions to interpret is the clarity of the final configuration. Sometimes the resultant dimensions can be easily interpreted. At other times, the points in the plot form a sort of random cloud, and there is no easy way to interpret the dimensions. In such a situation, try including more or fewer dimensions and examine the resultant final configurations. Often, more interpretable solutions emerge. However, if the data points in the plot do not follow any pattern, and if the stress plot does not show any clear elbow, then the data are most likely random noise, and there is no structure in the data.

The coordinates produced by MDSCAL can be plotted as two-dimensional plots, using the module CONFIG.

Interpretation of MDS configurations

From a mathematical standpoint, non-zero stress values occur for only one reason: insufficient dimensionality. That is, for any given data set, it may be impossible to represent the input data perfectly in two, or some other small number of, dimensions. On the other hand, any data set can be perfectly represented using n-1 dimensions, where n is the number of items scaled. As the number of dimensions used increases, the stress either decreases or remains the same. It can never increase.

Of course, it is not necessary that an MDS map should have zero stress in order to be useful. A certain amount of distortion is tolerable. The following tolerance levels of stress have been suggested in the literature.

Quality of configuration    Stress SQDIST    Stress SQDEV
Poor                        20.0%            40.0%
Fair                        10.0%            20.0%
Good                         5.0%            10.0%
Excellent                    2.5%             5.0%
Perfect                      0.0%             0.0%
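The tolerance levels can be applied mechanically, as in the sketch below. It rests on an assumption the table itself does not state: each tabulated value is read as an upper bound, so a stress value receives the label of the smallest benchmark it does not exceed. The function name `describe_fit` is hypothetical, and stress is expressed as a proportion rather than a percentage.

```python
# Benchmarks from the table, as proportions: (assumed upper bound, label).
BENCHMARKS = {
    "SQDIST": [(0.0, "Perfect"), (0.025, "Excellent"), (0.05, "Good"),
               (0.10, "Fair"), (0.20, "Poor")],
    "SQDEV": [(0.0, "Perfect"), (0.05, "Excellent"), (0.10, "Good"),
              (0.20, "Fair"), (0.40, "Poor")],
}


def describe_fit(stress, measure="SQDIST"):
    """Label a normalized stress value using the tabulated benchmarks.

    Returns None when the value lies beyond the last tabulated benchmark.
    """
    for bound, label in BENCHMARKS[measure]:
        if stress <= bound:
            return label
    return None


label_sqdist = describe_fit(0.04)           # read against the SQDIST column
label_sqdev = describe_fit(0.04, "SQDEV")   # same value, SQDEV column
```

The same numerical value is judged more favourably as Stress SQDEV than as Stress SQDIST, so the measure in use must always be stated alongside the value.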

Care must be exercised in interpreting any map that has non-zero stress since, by definition, non-zero stress means that some or all of the distances in the map are, to some degree, distortions of the input data. The distortions may be spread out over all pairwise relationships, or concentrated in just a few egregious pairs. In general, longer distances tend to be more accurate than shorter distances, because the goodness-of-fit measures tend to place greater emphasis on deviations corresponding to the larger dissimilarities. For this reason, the relative positions of objects that are close together should not be used to derive conclusions. Such objects should simply be viewed as a cluster of similar objects.
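Whether the distortion is spread out or concentrated can be checked by ranking the pairs by their share of raw stress. A minimal sketch, assuming the distances and pseudo-distances are available as dictionaries keyed by pair; the function name `worst_fitting_pairs`, the object labels and the values are hypothetical.

```python
def worst_fitting_pairs(d, d_hat, top=3):
    """Rank pairs by squared residual, largest first.

    `d` and `d_hat` map a pair (i, j) to its distance and pseudo-distance.
    Returns a list of (pair, share of raw stress) tuples.
    """
    contributions = {pair: (d[pair] - d_hat[pair]) ** 2 for pair in d}
    raw = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [(pair, sq / raw if raw else 0.0) for pair, sq in ranked[:top]]


# Hypothetical distances and pseudo-distances for four objects A to D.
d = {("A", "B"): 1.0, ("A", "C"): 4.0, ("A", "D"): 2.0,
     ("B", "C"): 5.0, ("B", "D"): 6.0, ("C", "D"): 5.5}
d_hat = {("A", "B"): 1.0, ("A", "C"): 3.0, ("A", "D"): 3.0,
         ("B", "C"): 5.0, ("B", "D"): 5.75, ("C", "D"): 5.75}

worst = worst_fitting_pairs(d, d_hat, top=2)
```

In this example two of the six pairs account for most of the raw stress, which suggests that the distortion is concentrated rather than spread over the whole map.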

The orientation of the reference axes is arbitrary, and the axes may not be easily interpretable. Therefore, one may apply an orthogonal or even oblique rotation so that the axes become interpretable. The IDAMS module CONFIG can be used for this purpose.

Interpretation of an MDS plot is a two-stage process.

In doing so, it is important that the patterns make use of the features of the configuration that are not simply the arbitrary artifacts of the scaling process. Which information is significant and which aspects are stable in a configuration obtained from the MDS model? Since our interest is in the inter-point distances, it should be noted that the configuration is invariant to rotation, reflection of axes and uniform scaling. These procedures, which can be performed using the IDAMS module CONFIG, can help in the interpretation of MDS results.
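The invariance properties can be verified directly on a small configuration. The sketch below uses plain Python rather than CONFIG; the helper names and the four points are hypothetical. Rotation and reflection leave every inter-point distance unchanged, while uniform scaling multiplies all distances by the same factor and so preserves their rank order.

```python
import math


def distances(points):
    """All inter-point distances of a two-dimensional configuration."""
    return [math.hypot(p[0] - q[0], p[1] - q[1])
            for i, p in enumerate(points) for q in points[i + 1:]]


def rotate(points, angle):
    """Rotate the configuration about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def reflect(points):
    """Reflect the first axis."""
    return [(-x, y) for x, y in points]


def scale(points, factor):
    """Stretch or shrink the configuration uniformly."""
    return [(factor * x, factor * y) for x, y in points]


def rank_order(values):
    """Indices of `values` from smallest to largest."""
    return sorted(range(len(values)), key=values.__getitem__)


# Hypothetical two-dimensional configuration of four points.
config = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
base = distances(config)

same_after_rotation = all(
    math.isclose(a, b) for a, b in zip(base, distances(rotate(config, 0.7)))
)
same_after_reflection = all(
    math.isclose(a, b) for a, b in zip(base, distances(reflect(config)))
)
same_order_after_scaling = rank_order(base) == rank_order(distances(scale(config, 2.0)))
```

All three checks evaluate to True, which is why the orientation and size of a configuration carry no information in themselves.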

The arrangement of points in a space does not normally exhibit any self-evident structure. Nor is there any procedure that will automatically detect structure in a configuration. We have to bring additional information to the task of interpretation. Two aspects are particularly important: pattern and meaning.

Some patterns are easy to see: the points may lie along a straight line, a circle, a parabola or even a set of discrete clusters. It is more difficult to pick out general directions, overlapping clumps or even a 2-dimensional plane in a 3-dimensional space. The meaning of a configuration is a more complex matter.

Aspects of the original data may be represented in a graph-theoretic way as line segments within the configuration. Alternatively, we may submit the same data to a clustering algorithm and use the resulting clusters to interpret the scaling solution.