Multiple discriminant analysis (MDA) is also termed *Discriminant Factor
Analysis* and *Canonical Discriminant Analysis*. It adopts a
perspective similar to *Principal Components Analysis*, but PCA and MDA
differ mathematically in what they maximize: MDA maximizes the
differences between the groups of the dependent variable, whereas PCA maximizes the variance in all the
variables accounted for by the factor.

Geometrically, the rows of the data matrix can be considered as points in a multidimensional space, as can the group mean vectors. Discriminating axes are determined in this space in such a way that optimal separation of the predefined groups is attained. The first discriminant function maximizes the differences between the groups of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between groups of the dependent variable while controlling for the first function, and so on. Each discriminant function is thus a dimension that differentiates a case into categories of the dependent variable based on its values on the independent variables. The first function is the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.
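The geometry above can be sketched numerically. The following is a minimal illustration, not any particular program's implementation: it builds synthetic three-group data, forms the total covariance matrix T and the between-groups covariance matrix B, and performs the eigenreduction of T⁻¹B, whose leading eigenvectors are the discriminant axes. All names and data here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three groups (g = 3) in four variables (m = 4): at most g - 1 = 2 axes.
groups = [rng.normal(loc=c, scale=1.0, size=(30, 4))
          for c in ([0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1])]
X = np.vstack(groups)
grand_mean = X.mean(axis=0)

# Total covariance matrix T (all cases, all groups pooled).
T = np.cov(X, rowvar=False)

# Between-groups covariance matrix B, weighted by group sizes.
B = np.zeros((4, 4))
for g in groups:
    d = (g.mean(axis=0) - grand_mean)[:, None]
    B += len(g) * (d @ d.T)
B /= len(X)

# Eigenreduction: eigenvalues measure the discriminating power of the
# associated eigenvectors.  (T^-1 B itself is not symmetric; production
# code usually solves an equivalent symmetric problem instead.)
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(T) @ B)
order = np.argsort(eigvals.real)[::-1]
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

# With 3 groups, only the first g - 1 = 2 eigenvalues are non-negligible.
print(eigvals)
```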

As in the case of Principal Components Analysis, mathematically the problem
is eigenreduction of a real, symmetric matrix. The eigenvalues represent the discriminating power of the associated
eigenvectors. The *g* groups lie in a space of at
most *g* –1 dimensions. This will be the number of discriminant axes or factors that can be obtained in a
common practical situation, when *n* > *m* > *g* (where *n* is the number of rows, and *m*
the number of columns of the input data matrix). There is one eigenvalue for each discriminant
function. The ratio of the eigenvalues indicates the
relative discriminating power of the discriminant
functions. For example, if the ratio of two eigenvalues
is 1.6, then the first discriminant function explains
60% more between-group variance in the dependent categories than does the
second discriminant function.

The relative percentage of a discriminant function equals the function's eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model. It is thus the percentage of the model's discriminating power associated with a given discriminant function, and it is used to decide how many functions are important. Usually, the first two or three eigenvalues are important.
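As a small worked example of the relative percentage and the eigenvalue ratio (the eigenvalues below are made up for illustration):

```python
# Illustrative eigenvalues for three discriminant functions.
eigenvalues = [2.4, 1.5, 0.1]

# Relative % = eigenvalue / sum of all eigenvalues.
total = sum(eigenvalues)
relative_pct = [100 * ev / total for ev in eigenvalues]
print(relative_pct)  # [60.0, 37.5, 2.5]

# Ratio of the first two eigenvalues: the first function explains 60%
# more between-group variance than the second.
print(round(eigenvalues[0] / eigenvalues[1], 2))  # 1.6
```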

The procedure for discrimination of three or more groups uses not only the total covariance matrix but also the between-groups covariance matrix. The criterion for selecting the next variable is the trace of a product of these two matrices (a generalization of the Mahalanobis distance for two groups). After selecting the new variable to be entered, discriminant factor analysis is performed and Discran provides the overall discriminant power and the discriminant power of the first three factors. Cases are classified according to their distances from the centroids of the groups. In each step, the program calculates and prints the classification table and the percentage of correctly classified cases for both the basic and test samples.

The distance of a case *x* from the centre of group *g* in
the step *q* is defined as the linear function

$$d_g(x) = \bar{x}_g^{\top} T_q^{-1} \bar{x}_g \; - \; 2\,\bar{x}_g^{\top} T_q^{-1} x$$

where $\bar{x}_g$ is the mean vector of group *g* and $T_q$
is the total covariance matrix (calculated for the cases from all groups) for
the variables included in step *q*. The quadratic term $x^{\top} T_q^{-1} x$ is the same for every group, so minimizing $d_g(x)$ is equivalent to minimizing the generalized Mahalanobis distance of *x* from the group centre.

A case is assigned to the group for which $d_g(x)$ has the smallest value (the smallest distance).

The classification table and the percentage of cases correctly classified are derived in the same way as for discrimination between two groups.
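A minimal sketch of this classification step, assuming the linear distance $d_g(x) = \bar{x}_g^{\top}T^{-1}\bar{x}_g - 2\,\bar{x}_g^{\top}T^{-1}x$ from group centroids; the data layout and helper names are illustrative, not Discran's actual routine:

```python
import numpy as np

rng = np.random.default_rng(1)
centers = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
X = np.vstack([rng.normal(c, 1.0, size=(40, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 40)

# Group centroids and the inverse of the total covariance matrix.
means = np.array([X[labels == g].mean(axis=0) for g in range(3)])
T_inv = np.linalg.inv(np.cov(X, rowvar=False))

def distances(x):
    """Linear distance of case x from each group centroid.

    The quadratic term x' T^-1 x is common to all groups, so the
    smallest d_g still identifies the nearest centroid."""
    return np.array([m @ T_inv @ m - 2 * m @ T_inv @ x for m in means])

assigned = np.array([np.argmin(distances(x)) for x in X])

# Classification table: rows = true group, columns = assigned group.
table = np.zeros((3, 3), dtype=int)
for t, a in zip(labels, assigned):
    table[t, a] += 1
print(table)
print("correctly classified:", round(100 * (assigned == labels).mean(), 1), "%")
```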

The variable selected in step *q* is the one which maximizes the
value of the trace of the matrix $T_q^{-1} B_q$,
where $T_q$ is
the total covariance matrix used in step *q* and $B_q$
is the matrix of covariances between groups, with
elements

$$b_{jk} = \frac{1}{n} \sum_{g=1}^{G} n_g \,(\bar{x}_{gj} - \bar{x}_j)(\bar{x}_{gk} - \bar{x}_k)$$

where $n_g$ is the number of cases in group *g*, $\bar{x}_{gj}$ is the mean of variable *j* in group *g*, and $\bar{x}_j$ is the grand mean of variable *j*.
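The selection criterion can be sketched as follows; the synthetic data and the `trace_criterion` helper are assumptions made for illustration, not Discran's actual routine. At each step, the candidate variable that maximizes trace(T⁻¹B) over the variables retained so far is entered.

```python
import numpy as np

rng = np.random.default_rng(2)
# Variable 0 separates group 1, variable 1 separates group 2,
# variable 2 is pure noise.
centers = np.array([[0, 0, 0], [3, 0, 0], [0, 2, 0]])
groups = [rng.normal(c, 1.0, size=(50, 3)) for c in centers]
X = np.vstack(groups)

def trace_criterion(cols):
    """trace(inv(T) @ B) restricted to the candidate column set."""
    Xs = X[:, cols]
    grand = Xs.mean(axis=0)
    T = np.atleast_2d(np.cov(Xs, rowvar=False))
    B = np.zeros((len(cols), len(cols)))
    for g in groups:
        d = (g[:, cols].mean(axis=0) - grand)[:, None]
        B += len(g) * (d @ d.T)
    B /= len(X)
    return np.trace(np.linalg.inv(T) @ B)

# Forward selection: at each step enter the variable that maximizes
# the trace criterion together with those already selected.
selected, remaining = [], [0, 1, 2]
while remaining:
    best = max(remaining, key=lambda j: trace_criterion(selected + [j]))
    selected.append(best)
    remaining.remove(best)
print(selected)  # discriminating variables enter before the noise variable
```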

The following part of the analysis is performed in one of three circumstances:

- when the step precedes a decrease in the percentage of correctly
classified cases,
- when the percentage of correctly classified cases reaches 100,
- when the step is the last one.

The distances from each group are calculated using the variables retained in the step. The assignment of cases to the groups is done according to the above criterion.

The matrix $T_q^{-1} B_q$ is analyzed. The two eigenvectors corresponding to the two highest eigenvalues of this matrix are the two discriminant factorial axes. The discriminant power of the factors is measured by the corresponding eigenvalues. Since the program provides the discriminant power for only the first three factors, the sum of the eigenvalues allows the estimation of the magnitude of the remaining eigenvalues, i.e. those which are not printed.

For a case, the value of a discriminant factor is calculated as the scalar product of the case vector (containing the variables retained in the step) with the eigenvector corresponding to that factor. These values are not printed, but they are used in a graphical representation of the cases in the space of the first two factors.
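A minimal sketch of the scalar-product calculation; the eigenvectors, case values, and group mean below are made up for illustration:

```python
import numpy as np

# Suppose the eigenreduction has already produced the first two
# eigenvectors for three retained variables (values invented here).
axes = np.array([[0.8, -0.1],
                 [0.1,  0.9],
                 [0.3,  0.2]])          # columns: factor 1, factor 2

case = np.array([1.2, 0.4, 2.0])        # one case (retained variables)
group_mean = np.array([1.0, 0.5, 1.8])  # one group mean vector

# Factor scores are scalar products with each eigenvector.
case_scores = case @ axes
mean_scores = group_mean @ axes
print(case_scores)                      # coordinates in the factor plane
print(mean_scores)
```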

For a group mean, the value of a discriminant factor is calculated in the same way, replacing the case vector by the group mean vector.

The distances from each group are calculated in the same way, and assignment of cases to the groups is done following the same rules as for the basic sample.
