28.1  General Description

MDSCAL is a non-metric multidimensional scaling program for the analysis of similarities. The program, which operates on a matrix of similarity or dissimilarity measures, is designed to find, for each dimensionality specified, the best geometric representation of the data in a space of that dimensionality. The uses of non-metric multidimensional scaling are similar to those of factor analysis: clusters of variables can be spotted, the dimensionality of the data can be discovered, and dimensions can sometimes be interpreted. The CONFIG program can be used to perform analysis on an MDSCAL output configuration.

Input configuration. Normally an internally created arbitrary starting configuration is used to begin the computation. The user may, however, supply an initial configuration. There are several possible reasons for doing so: the user may have theoretical reasons for beginning with a certain configuration; the user may wish to perform further iterations on a configuration which is not yet close enough to the best configuration; or, to save computing time, the user may wish to provide a higher dimensional configuration as a starting point for a lower dimensional configuration.

Scaling algorithm. The program starts with an initial configuration, either generated arbitrarily or supplied by the user, and iterates (using a procedure of the "steepest descent" type) over successive trial configurations, each time comparing the rank order of inter-point distances in the trial configuration with the rank order of the corresponding measures in the data. A "badness of fit" measure (stress coefficient) is computed after each iteration and the configuration is rearranged accordingly to improve the fit to the data, until, ideally, the rank order of distances in the configuration is perfectly monotonic with the rank order of dissimilarities given by the data; in that case, the "stress" will be zero. In practice, the scaling computation stops, in any given number of dimensions, when the stress reaches a sufficiently small value (STRMIN), the scale factor (magnitude) of the gradient reaches a sufficiently small value (SFGRMN), the stress has been improving too slowly (SRATIO), or the preset maximum number of iterations is reached (ITERATIONS), whichever condition comes first. The same procedure is then repeated for the next lower dimensionality, using the previous results as the initial configuration, until the specified minimum number of dimensions is reached. During computation, the cosine of the angle between successive gradients plays an important role in several ways; optionally, two internal weighting parameters may be specified (see parameters COSAVW and ACSAVW).

Dimensionality and metric. Solutions may be obtained in 2 to 10 dimensions. The user controls the dimensionality of the configurations obtained by specifying the maximum and minimum number of dimensions desired, and the difference between the dimensionalities of successive solutions (see parameters DMAX, DMIN, and DDIF). The user also specifies, using parameter R, whether the distance metric should be Euclidean (R=2), the usual case, or some other Minkowski r-metric.

Stress. Stress is a measure of how well the configuration matches the data. The user may choose between two alternative formulas for computing the stress coefficient: the stress is standardized either by the sum of the squared distances (SQDIST) or by the sum of the squared deviations of the distances from their mean (SQDEV). In many situations, the configurations reached by the two formulas will not be substantially different. For the same degree of fit, the second formula (SQDEV) yields larger values of stress.

Ties in input coefficients. There are two alternative methods for handling ties among the input data values: the corresponding distances can be required to be equal (TIES=EQUAL) or they can be allowed to differ (TIES=DIFFER). When there are few ties, it makes little difference which approach is used. When there are a great many ties it does make a difference, and the context must be considered in making the choice.
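The distance metric and the two stress standardizations described above can be illustrated with a minimal Python sketch. It assumes the usual Kruskal-style definitions (squared differences between distances and fitted values in the numerator, one of the two standardizing sums in the denominator). The function names `minkowski` and `stress` are hypothetical and are not part of MDSCAL.

```python
import math


def minkowski(p, q, r=2.0):
    """Minkowski r-metric distance between two points (r=2 is Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def stress(dist, dhat, formula="SQDIST"):
    """Stress for paired lists of distances (dist) and fitted values (dhat).

    Assumed definitions: SQDIST standardizes by the sum of squared
    distances, SQDEV by the sum of squared deviations from the mean distance.
    """
    numerator = sum((d - h) ** 2 for d, h in zip(dist, dhat))
    if formula == "SQDIST":
        denominator = sum(d ** 2 for d in dist)
    elif formula == "SQDEV":
        mean = sum(dist) / len(dist)
        denominator = sum((d - mean) ** 2 for d in dist)
    else:
        raise ValueError("formula must be 'SQDIST' or 'SQDEV'")
    return math.sqrt(numerator / denominator)


dist = [1.0, 2.0, 3.0]
dhat = [1.0, 2.5, 2.5]
s1 = stress(dist, dhat, "SQDIST")
s2 = stress(dist, dhat, "SQDEV")
```

In this small case `s2` is larger than `s1`, which matches the remark that the second formula gives larger stress for the same degree of fit.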
| Iteration | the iteration number |
| Stress | the current value of the stress |
| SRAT | the current value of the stress ratio |
| SRATAV | the current stress ratio average (it is an exponentially weighted average) |
| CAGRGL | the cosine of the angle between the current gradient and the previous gradient |
| COSAV | the current value of the average cosine of the angle between successive gradients (a weighted average) |
| ACSAV | the current value of the average absolute value of the cosine of the angle between successive gradients (a weighted average) |
| SFGR | the length (more properly, the scale factor) of the gradient |
| STEP | the step size. |
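The gradient-related quantities listed above can be sketched as follows. The cosine between two gradient vectors is standard. The form of the weighted average is an assumption (a simple exponential weighting in which the weight plays the role of COSAVW or ACSAVW); the exact update used by MDSCAL is not given here. The function names are hypothetical.

```python
import math


def gradient_cosine(g_now, g_prev):
    """Cosine of the angle between the current and previous gradient."""
    dot = sum(a * b for a, b in zip(g_now, g_prev))
    norm = math.sqrt(sum(a * a for a in g_now)) * math.sqrt(sum(b * b for b in g_prev))
    return dot / norm if norm > 0 else 0.0


def update_average(previous_avg, new_value, weight=0.66):
    """Assumed exponential weighting: the newest value gets `weight`."""
    return weight * new_value + (1.0 - weight) * previous_avg


cagrgl = gradient_cosine([1.0, 0.0], [-2.0, 0.0])   # gradient reversed direction
cosav = update_average(0.0, cagrgl, weight=0.66)     # analogue of COSAV
acsav = update_average(0.0, abs(cagrgl), weight=0.66)  # analogue of ACSAV
```

A cosine near -1 means the gradient has reversed direction, which usually indicates that the step size is too large; a cosine near +1 suggests that larger steps are possible.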
Reason for termination. When computation is terminated, the reason is indicated by one of the remarks: "Minimum was achieved", "Maximum number of iterations were used", "Satisfactory stress was reached", or "Zero stress was reached".
Final configuration. For each solution, the Cartesian coordinates of the final configuration are printed.
Sorted configuration. (Optional: see the parameter PRINT). For each solution, the projections of points of the final configuration are sorted separately on each dimension into ascending order and printed.
Summary. For each solution, the original data values are sorted and printed together with their corresponding final distances (DIST) and the hypothetical distances required for a perfect monotonic fit (DHAT).
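The DHAT values in the summary are the distances that a perfectly monotonic fit would require. A minimal sketch of how such values can be obtained is a monotone (isotonic) regression using the pool-adjacent-violators method, shown below for dissimilarities. This is an illustration, not MDSCAL's own code: the function name `monotone_fit` is hypothetical, and ordering tied data values by their distances is an assumed reading of TIES=DIFFER. For similarities the rank order of the data would be reversed first.

```python
def monotone_fit(dissimilarities, distances):
    """Return fitted values (DHAT), in the original pair order, that are
    non-decreasing in the dissimilarities and closest to the distances
    in the least-squares sense."""
    # Sort pairs by dissimilarity; tied data values are ordered by distance,
    # so they are allowed to keep different fitted values (assumption).
    order = sorted(range(len(distances)),
                   key=lambda i: (dissimilarities[i], distances[i]))
    blocks = []  # each block holds [sum of distances, number of pairs]
    for i in order:
        blocks.append([distances[i], 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    dhat = [0.0] * len(distances)
    position = 0
    for total, count in blocks:
        for i in order[position:position + count]:
            dhat[i] = total / count
        position += count
    return dhat


dhat = monotone_fit([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0])
```

Here the second and third distances are out of order with respect to the data, so they are pooled and both receive their mean as fitted value.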
| STAN | upper-right triangle, no diagonal |
| STAN, DIAG | upper-right triangle, with diagonal |
| LOWER, DIAG | lower-left triangle, with diagonal |
| LOWER | lower-left triangle, no diagonal |
| SQUARE | full square matrix with diagonal. |
The measures contained in the data matrix may either be measures of similarity (such as correlations) or dissimilarities. Although the input to MDSCAL is usually a matrix of correlation coefficients (e.g. a matrix of gammas or a matrix of Pearson r's), the input matrix may contain any measure that makes sense as a measure of proximity. Because non-metric scaling uses only ordinal properties of the data, nothing need be assumed about the quantitative or numerical properties of the data. There should be, at the very least, twice as many variables as dimensions.
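As an illustration of the triangular formats and of the rule of thumb on the number of variables, the sketch below expands an upper-right triangle into a full symmetric matrix. It assumes the triangle is listed row by row, which is an assumption about ordering and not a description of the actual IDAMS matrix file layout. Both function names are hypothetical.

```python
def expand_upper_triangle(values, n, diagonal=False, fill=0.0):
    """Build an n x n symmetric matrix from an upper-right triangle
    given row by row (assumed order), with or without the diagonal."""
    expected = n * (n + 1) // 2 if diagonal else n * (n - 1) // 2
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    matrix = [[fill] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        start = i if diagonal else i + 1
        for j in range(start, n):
            value = next(it)
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


def enough_variables(n_variables, dmax):
    """Rule of thumb from the text: at least twice as many variables as dimensions."""
    return n_variables >= 2 * dmax


full = expand_upper_triangle([0.5, 0.2, 0.9], n=3)
```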
28.6  Input Weight Matrix

If a weight matrix is supplied, it must be in exactly the same format as the input data matrix. The parameter INPUT=(STAN/LOWE/SQUA, DIAG) applies to the weight matrix as well as to the data matrix. The dictionary for the weight matrix should be the same as for the input data matrix. Means and standard deviations are not used, but corresponding "dummy" lines should be supplied.

This matrix contains values, in one-to-one correspondence with elements of the data matrix, which are to be used as weights for the data. These values are used in conjunction with the value of the parameter CUTOFF when applied to the data. If a data value is greater than the cutoff value, but the corresponding weight value is less than or equal to zero, an error condition is signaled. Likewise, if the data value is less than or equal to the cutoff value, and the corresponding weight value is greater than zero, an error condition is signaled. If either of these inconsistencies occurs, execution terminates.
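The consistency rule between data values, weights and CUTOFF can be checked before running the program. The sketch below applies the two conditions stated above to flat lists of corresponding elements; the function name `check_weights` and the returned messages are hypothetical.

```python
def check_weights(data, weights, cutoff=0.0):
    """Return (index, reason) pairs for elements that violate the rule."""
    problems = []
    for k, (value, weight) in enumerate(zip(data, weights)):
        if value > cutoff and weight <= 0:
            problems.append((k, "data above cutoff but weight not positive"))
        elif value <= cutoff and weight > 0:
            problems.append((k, "data at or below cutoff but weight positive"))
    return problems


problems = check_weights([0.5, 0.0, 0.3, -0.1], [1.0, 0.0, 0.0, 2.0])
```

An empty list means the weight matrix is consistent with the data matrix for the given cutoff; any entry corresponds to a case in which MDSCAL would terminate.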
$RUN MDSCAL
$FILES
File specifications
$SETUP
1. Label
2. Parameters
$MATRIX (conditional)
Data matrix
Weight matrix
Starting configuration matrix
(Note: Not all of the matrices need be included here; however, if
more than one matrix is included, they must be in the above order).
Files:
FT02 output configuration matrix
FT03 input weight matrix if INPUT=WEIGHTS specified (omit if $MATRIX used)
FT05 input starting configuration if INPUT=CONFIG specified
(omit if $MATRIX used)
FT08 input data matrix (omit if $MATRIX used)
PRINT results (default IDAMS.LST)
28.9  Program Control Statements

Refer to "The IDAMS Setup File" chapter for further descriptions of the program control statements, items 1-2 below.

1. Label
Example: MDSCAL EXECUTION ON DATASET X4952

2. Parameters
Example: DMAX=5 ITER=75 WRITE=CONFIG

INPUT=(STANDARD /LOWER/SQUARE, DIAGONAL, WEIGHTS, CONFIG)
VARS=(variable list)
FILE=(DATA, WEIGHTS, CONFIG)
COEFF=SIMILARITIES /DISSIMILARITIES
DMAX=2 /n
DMIN=2 /n
DDIF=1 /n
R=2.0 /n
CUTOFF=0.0 /n
TIES=DIFFER /EQUAL
ITERATIONS=50 /n
STRMIN=.01 /n
SFGRMN=0.0 /n
SRATIO=.999 /n
ACSAVW=.66 /n
COSAVW=.66 /n
STRESS=SQDIST /SQDEV
WRITE=CONFIG
PRINT=(MATRIX, SORTCONF, LONG /SHORT)
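To see which solutions a given combination of DMAX, DMIN and DDIF would produce, the sequence of dimensionalities can be sketched as below. With the parameters example DMAX=5 and the defaults DMIN=2 and DDIF=1, solutions would be computed in 5, 4, 3 and 2 dimensions. The function name is hypothetical, and the behavior when DDIF does not step exactly onto DMIN (stopping above DMIN) is an assumption; the limits on the number of dimensions are not checked.

```python
def dimension_sequence(dmax=2, dmin=2, ddif=1):
    """List the dimensionalities from DMAX down to DMIN in steps of DDIF."""
    if dmin > dmax:
        raise ValueError("DMIN must not exceed DMAX")
    if ddif < 1:
        raise ValueError("DDIF must be at least 1")
    # Assumption: a step that would go below DMIN simply ends the sequence.
    return list(range(dmax, dmin - 1, -ddif))


dims = dimension_sequence(dmax=5)
```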