### 9.4 Multiple Discriminant Analysis

Multiple discriminant analysis (MDA) is also termed Discriminant Factor Analysis and Canonical Discriminant Analysis. It adopts a perspective similar to that of Principal Components Analysis (PCA), but the two methods maximize different quantities: MDA maximizes the differences between the categories of the dependent variable, whereas PCA maximizes the variance in all the variables accounted for by each factor.

Geometrically, the rows of the data matrix, and likewise the group mean vectors, can be considered as points in a multidimensional space. Discriminating axes are determined in this space in such a way that optimal separation of the predefined groups is attained. The first discriminant function maximizes the differences between the categories of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between the categories of the dependent variable after controlling for the first function, and so on. Though mathematically different, each discriminant function is a dimension that differentiates cases into categories of the dependent variable based on their values on the independent variables. The first function is the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.

As with Principal Components Analysis, the problem is mathematically the eigenreduction of a real, symmetric matrix. The eigenvalues represent the discriminating power of the associated eigenvectors. The g groups lie in a space of at most g − 1 dimensions, so this is the number of discriminant axes or factors that can be obtained in the common practical situation where n > m > g (n being the number of rows and m the number of columns of the input data matrix). There is one eigenvalue for each discriminant function. The ratio of two eigenvalues indicates the relative discriminating power of the corresponding discriminant functions. For example, if the ratio of the first eigenvalue to the second is 1.6, then the first discriminant function explains 60% more between-group variance in the dependent categories than does the second.

The relative percentage of a discriminant function equals its eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model. It is thus the percentage of the model's discriminating power associated with that function. The relative percentage is used to decide how many functions are important. Usually, only the first two or three eigenvalues are important.
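The arithmetic behind the eigenvalue ratio and the relative percentage can be sketched in a few lines of Python. The function name `relative_percentages` and the three eigenvalues are invented for illustration; they are not output from any real analysis.

```python
def relative_percentages(eigenvalues):
    """Return each eigenvalue as a percentage of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical eigenvalues of three discriminant functions (illustrative only).
eigenvalues = [3.2, 2.0, 0.3]

# Relative percentage of each function; the values sum to 100.
shares = relative_percentages(eigenvalues)

# Ratio of the first eigenvalue to the second: 1.6 here, i.e. the first
# function carries 60% more discriminating power than the second.
ratio_first_to_second = eigenvalues[0] / eigenvalues[1]
```

With these made-up values the third function contributes only a small share, which is the kind of pattern that leads an analyst to retain two functions.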

The procedure for discrimination of three or more groups uses not only the total covariance matrix but also the between-groups covariance matrix. The criterion for selecting the next variable is the trace of a product of these two matrices (a generalization of the Mahalanobis distance used for two groups). After the new variable to be entered has been selected, discriminant factor analysis is performed, and Discran provides the overall discriminant power and the discriminant power of the first three factors. Cases are classified according to their distances from the centroids of the groups. At each step, the program calculates and prints the classification table and the percentage of correctly classified cases for both the basic and the test samples.

### Classification table for basic sample

The distance of a case x from the centre of group g in step q is defined as a linear function of x that involves $T_q$, the total covariance matrix (calculated for the cases from all groups) for the variables included in step q.

A case is assigned to the group for which this function has the smallest value (the smallest distance).
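The formulas themselves are not reproduced in this text. As a hedged sketch only, and not necessarily the exact expression used by the program, one linear function consistent with the description follows from the squared Mahalanobis distance computed with the total covariance matrix. The symbols $\bar{y}_g$ (mean vector of group g), $\bar{x}$ (overall mean vector), $v_{qg}$ and the divisor $n$ are assumptions introduced here.

```latex
\begin{aligned}
d_{qg}^2(x) &= (x - \bar{y}_g)^\top T_q^{-1} (x - \bar{y}_g) \\
            &= x^\top T_q^{-1} x \;-\; 2\,\bar{y}_g^\top T_q^{-1} x \;+\; \bar{y}_g^\top T_q^{-1} \bar{y}_g , \\
v_{qg}(x)   &= \bar{y}_g^\top T_q^{-1} \bar{y}_g \;-\; 2\,\bar{y}_g^\top T_q^{-1} x , \\
t_{ij}      &= \frac{1}{n} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j) .
\end{aligned}
```

The term $x^\top T_q^{-1} x$ is the same for every group, so dropping it leaves a function $v_{qg}(x)$ that is linear in x and is smallest for the same group as the full distance.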

The classification table and the percentage of cases correctly classified are derived in the same way as for discrimination between two groups.
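A minimal sketch of how such a classification table could be built is given below. It assumes NumPy, uses the full squared Mahalanobis distance with the total covariance matrix (an assumption; the program's own linear form may differ by a term that does not affect the assignment), and runs on a small invented data set. The function names are hypothetical.

```python
import numpy as np


def classify_by_centroid(X, means, T):
    """Assign each row of X to the group whose mean is nearest (Mahalanobis, metric T)."""
    T_inv = np.linalg.inv(T)
    diffs = X[:, None, :] - means[None, :, :]            # shape (n cases, g groups, m vars)
    d2 = np.einsum("ngi,ij,ngj->ng", diffs, T_inv, diffs)  # squared distances
    return d2.argmin(axis=1)


def classification_table(actual, assigned, n_groups):
    """Cross-tabulate actual (rows) against assigned (columns) groups."""
    table = np.zeros((n_groups, n_groups), dtype=int)
    for a, p in zip(actual, assigned):
        table[a, p] += 1
    percent_correct = 100.0 * np.trace(table) / table.sum()
    return table, percent_correct


# Invented toy data: three well-separated groups of three cases, two variables.
X = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
              [0.1, 5.0], [-0.1, 5.2], [0.0, 4.8]])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

means = np.array([X[labels == g].mean(axis=0) for g in range(3)])
T = np.cov(X, rowvar=False, bias=True)   # total covariance, divisor n (an assumption)

assigned = classify_by_centroid(X, means, T)
table, percent_correct = classification_table(labels, assigned, 3)
```

The diagonal of `table` counts the correctly classified cases, so `percent_correct` is the diagonal sum divided by the number of cases.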

### Stepwise Discriminant Analysis

The procedure for discrimination of three or more groups uses the total covariance matrix as well as the between-groups covariance matrix. The criterion for variable selection is the trace of the product of these two covariance matrices (a generalization of the Mahalanobis distance used for two groups). After the new variable to be entered has been selected, discriminant factor analysis is performed, and the program provides the overall discriminant power and the discriminant power of the first three factors. Cases are classified according to their distances from the centres of the groups.

At each step, the program calculates and prints the classification table and the percentages of cases correctly classified.

### Criterion for selecting the next variable

The variable selected in step q is the one that maximizes the trace of the matrix $T_q^{-1} B_q$, where $T_q$ is the total covariance matrix used in step q and $B_q$ is the matrix of covariances between groups.
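The selection rule can be sketched as follows, assuming NumPy and assuming that the trace is taken of the inverse total covariance matrix times the between-groups matrix. Dividing both matrices by n is also an assumption, since the element formulas are not reproduced in this text; the function names and the data are invented.

```python
import numpy as np


def total_and_between(X, labels):
    """Total covariance T and between-groups covariance B (both with divisor n; an assumption)."""
    n = X.shape[0]
    grand_mean = X.mean(axis=0)
    centered = X - grand_mean
    T = centered.T @ centered / n
    B = np.zeros_like(T)
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += Xg.shape[0] * (d @ d.T) / n
    return T, B


def next_variable(X, labels, selected):
    """Return (index, score) of the unselected column maximizing trace(T^{-1} B)."""
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j in selected:
            continue
        cols = list(selected) + [j]
        T, B = total_and_between(X[:, cols], labels)
        score = np.trace(np.linalg.solve(T, B))   # trace of T^{-1} B without forming the inverse
        if score > best_score:
            best, best_score = j, score
    return best, best_score


# Invented data: only column 1 separates the three groups.
labels = np.repeat([0, 1, 2], 4)
X = np.column_stack([
    [1, 2, 3, 4] * 3,                                        # same mean in every group
    [0.0, 0.1, -0.1, 0.0, 3.0, 3.1, 2.9, 3.0, 6.0, 6.1, 5.9, 6.0],
    [1, -1, 1, -1] * 3,                                      # same mean in every group
]).astype(float)

first, first_score = next_variable(X, labels, [])
```

For a single variable the criterion reduces to the ratio of between-groups variance to total variance, which is why the column with distinct group means is chosen first.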

The part of the analysis described below is performed in any one of the following three circumstances:

• when the step precedes a decrease of the percentage of correctly classified cases,
• when the percentage of correctly classified cases is equal to 100,
• when the step is the last one.
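The three conditions above can be written as a small predicate. This is only an illustrative sketch: the function name is hypothetical, and it assumes that the percentages of correctly classified cases for all steps are already available, which means the percentage for the following step has been computed before the decision is taken.

```python
def detailed_analysis_due(percent_correct_by_step, q):
    """True if the detailed analysis should be produced at step q (0-based index)."""
    is_last = q == len(percent_correct_by_step) - 1
    is_perfect = percent_correct_by_step[q] == 100
    # "Precedes a decrease": the next step classifies fewer cases correctly.
    precedes_decrease = (not is_last) and (
        percent_correct_by_step[q + 1] < percent_correct_by_step[q]
    )
    return precedes_decrease or is_perfect or is_last


# Invented percentages for four steps.
percentages = [70, 85, 80, 90]
due_steps = [q for q in range(len(percentages))
             if detailed_analysis_due(percentages, q)]
```

Here the second step qualifies because the third step lowers the percentage, and the fourth step qualifies because it is the last.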

#### Allocation and distances of cases in the basic sample

The distances from each group are calculated using the variables retained in the step. The assignment of cases to the groups is done according to the above criterion.

#### Discriminant factor analysis

The matrix $T_q^{-1} B_q$ is analyzed. The eigenvectors corresponding to the two largest eigenvalues of this matrix are the two discriminant factorial axes. The discriminant power of the factors is measured by the corresponding eigenvalues. Since the program provides the discriminant power for only the first three factors, the sum of the eigenvalues allows the level of the remaining eigenvalues, i.e. those which are not printed, to be estimated.
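One way to carry out such an eigenanalysis is sketched below, assuming NumPy and assuming the analyzed matrix is the inverse total covariance matrix times the between-groups matrix. That product is not symmetric in general, so the sketch first reduces the problem to a real, symmetric one through a Cholesky factor of the total covariance matrix. The function name and the two small matrices are invented.

```python
import numpy as np


def discriminant_factors(T, B):
    """Eigenvalues (descending) and eigenvectors (columns) of T^{-1} B.

    T must be symmetric positive definite and B symmetric.
    """
    L = np.linalg.cholesky(T)          # T = L @ L.T
    L_inv = np.linalg.inv(L)
    S = L_inv @ B @ L_inv.T            # symmetric, same eigenvalues as T^{-1} B
    vals, U = np.linalg.eigh(S)        # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vecs = L_inv.T @ U[:, order]       # map back to eigenvectors of T^{-1} B
    return vals[order], vecs


# Invented 2 x 2 matrices standing in for the total and between-groups covariances.
T = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 0.3]])

eigenvalues, eigenvectors = discriminant_factors(T, B)

# The sum of all eigenvalues equals the trace of T^{-1} B, so the level of
# any unprinted eigenvalues can be obtained by subtraction.
total_power = eigenvalues.sum()
```

The first two columns of `eigenvectors` would then play the role of the two discriminant factorial axes.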

#### Values of discriminant factors for all cases and group means

For a case, the value of a discriminant factor is calculated as the scalar product of the case vector, containing the variables retained in the step, with the eigenvector corresponding to the factor. Note that these values are not printed, but they are used in a graphical representation of cases in the space of the first two factors.

For a group mean, the value of a discriminant factor is calculated in the same way, replacing the case vector with the group mean vector.
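The scalar products described here amount to a matrix product, as in the following sketch. It assumes NumPy; the function name, the two cases and the eigenvector matrix are made up for illustration.

```python
import numpy as np


def factor_scores(vectors, eigvecs, n_factors=2):
    """Scalar product of each row vector with the first n_factors eigenvectors."""
    vectors = np.atleast_2d(vectors)
    return vectors @ eigvecs[:, :n_factors]


# Invented example: two cases on two retained variables, and a hypothetical
# matrix whose columns are the eigenvectors of the first two factors.
cases = np.array([[1.0, 2.0], [3.0, 4.0]])
eigvecs = np.array([[0.6, -0.8], [0.8, 0.6]])

case_scores = factor_scores(cases, eigvecs)               # one row per case
mean_scores = factor_scores(cases.mean(axis=0), eigvecs)  # same rule for a group mean
```

Because the operation is linear, the factor value of a group mean equals the mean of the factor values of the cases in that group.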

#### Allocation and distances of cases in the test sample

The distances from each group are calculated in the same way, and assignment of cases to the groups is done following the same rules as for the basic sample.

#### Allocation and distances of cases in the anonymous sample

The distances from each group are calculated in the same way, and assignment of cases to the groups is done following the same rules as for the basic sample.