9.3 Two-group Discriminant Analysis

Mahalanobis’ Distance

Mahalanobis distance is a measure of the distance between two points in the space defined by two or more correlated variables. For example, if two variables are uncorrelated and standardized to unit variance, the points (cases) can be plotted in a standard two-dimensional scatter plot, and the Mahalanobis distances between the points are identical to the Euclidean distances. When the variables are correlated, the axes of the plot are effectively non-orthogonal. In that case the simple Euclidean distance is not an appropriate measure, whereas the Mahalanobis distance takes the correlations into account. The squared Mahalanobis distance, D², is also a generalized measure of the distance between two groups. The distance between groups 1 and 2 is defined as

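$$D^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} w^{*}_{ij} \, (\bar{X}_{i1} - \bar{X}_{i2})(\bar{X}_{j1} - \bar{X}_{j2})$$

where p is the number of variables in the model, $\bar{X}_{i1}$ is the mean of the ith variable in Group 1, $\bar{X}_{i2}$ is the mean of the ith variable in Group 2, and $w^{*}_{ij}$ is an element of the inverse of the within-groups covariance matrix.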
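The definition above can be tried out on a small data set. The following is a minimal sketch in plain Python, not Discran's own code. It assumes that the within-groups covariance matrix is the pooled one with divisor n1 + n2 - 2 and that this matrix is non-singular. The function names (mean_vector, pooled_covariance, solve, mahalanobis_d2) and the sample data are illustrative only.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows (one row per case)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(group1, group2):
    """Pooled within-groups covariance matrix (assumed divisor: n1 + n2 - 2)."""
    p = len(group1[0])
    sscp = [[0.0] * p for _ in range(p)]
    for rows in (group1, group2):
        m = mean_vector(rows)
        for row in rows:
            d = [x - mi for x, mi in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    sscp[i][j] += d[i] * d[j]
    dof = len(group1) + len(group2) - 2
    return [[v / dof for v in r] for r in sscp]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix a is square and non-singular.
    """
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_d2(group1, group2):
    """Squared Mahalanobis distance between the two group centroids."""
    diff = [a - b for a, b in zip(mean_vector(group1), mean_vector(group2))]
    s = pooled_covariance(group1, group2)
    # d' S^{-1} d, computed as d . (S^{-1} d) without forming the inverse
    z = solve(s, diff)
    return sum(d * zi for d, zi in zip(diff, z))


# Illustrative data: two variables, four cases per group
group1 = [[2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 6.0]]
group2 = [[6.0, 9.0], [7.0, 8.0], [8.0, 10.0], [9.0, 11.0]]
print(mahalanobis_d2(group1, group2))
```

Solving the linear system instead of inverting the matrix gives the same double sum as in the definition, because the solution vector already contains the products of the inverse elements with the mean differences.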

When Mahalanobis’ distance is the criterion for variable selection, the Mahalanobis’ distances between all pairs of groups are calculated first. The variable that gives the largest D² for the two groups that are closest (that is, have the smallest D² initially) is selected for inclusion.

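A test of the null hypothesis that the two sets of population means are equal can be based on Mahalanobis’ distance. The corresponding F statistic is

$$F = \frac{(n - p - 1)\, n_1 n_2}{p\,(n - 2)(n_1 + n_2)} \, D^2$$

where n1 and n2 are the numbers of cases in Groups 1 and 2, and n = n1 + n2. Under the null hypothesis, F has p and n - p - 1 degrees of freedom.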

This F value can also be used for variable selection. At each step the variable chosen for inclusion is the one with the largest F value.
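As a rough illustration of the test, the sketch below converts a given D² into an F value. It assumes the usual two-group relation F = (n - p - 1) n1 n2 D² / (p (n - 2)(n1 + n2)) with p and n - p - 1 degrees of freedom, where n = n1 + n2. The function name f_from_d2 and the input values are hypothetical.

```python
def f_from_d2(d2, n1, n2, p):
    """Return (F, (df1, df2)) for a two-group squared Mahalanobis distance.

    d2     : squared Mahalanobis distance between the two group centroids
    n1, n2 : number of cases in Group 1 and Group 2
    p      : number of variables in the model
    """
    n = n1 + n2
    if n - p - 1 <= 0:
        raise ValueError("need n1 + n2 > p + 1 for the F test")
    f = (n - p - 1) * n1 * n2 / (p * (n - 2) * n) * d2
    return f, (p, n - p - 1)


# Illustrative values: D^2 = 15 from two variables, four cases per group
f_value, dof = f_from_d2(15.0, 4, 4, 2)
print(f_value, dof)
```

The resulting F value would then be compared with the F distribution on the returned degrees of freedom.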

Classification of Cases

Classification is the process of deciding to which group a particular case belongs.

For each group, we can determine the location of the point that represents the means of all variables in the multivariate space defined by the variables in the model. These points are called group centroids. For each case we can then compute its Mahalanobis distance from each of the group centroids. The case is classified as belonging to the group to which it is closest, that is, the group for which the Mahalanobis distance is smallest.
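The nearest-centroid rule can be sketched as follows. This is an illustration only: it assumes that the group centroids and the inverse of the within-groups covariance matrix have already been computed, and the names squared_distance and classify, as well as the numbers, are made up for the example.

```python
def squared_distance(x, centroid, inv_cov):
    """Squared Mahalanobis distance of case x from a group centroid."""
    d = [xi - ci for xi, ci in zip(x, centroid)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p))


def classify(x, centroids, inv_cov):
    """Return the label of the group whose centroid is closest to x."""
    return min(centroids, key=lambda g: squared_distance(x, centroids[g], inv_cov))


# Illustrative inputs: two variables, two groups
centroids = {"Group 1": [1.0, 1.0], "Group 2": [4.0, 5.0]}
inv_cov = [[0.75, 0.0], [0.0, 0.75]]  # assumed inverse within-groups covariance

print(classify([1.5, 2.0], centroids, inv_cov))
print(classify([4.0, 4.0], centroids, inv_cov))
```

Because only the ordering of the distances matters, the squared distances can be compared directly without taking square roots.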

A classification table can be constructed as follows:

                              Predicted group membership
Actual group   No. of cases   Group 1        Group 2
Group 1        n1             n11            n12
Group 2        n2             n21            n22

Here n1 denotes the number of cases that truly belong to Group 1 and n2 the number of cases that truly belong to Group 2.

n11 = number of cases that belong to Group 1 and are assigned to Group 1 (i.e. correctly classified)

n12 = number of cases that belong to Group 1 but are assigned to Group 2 (i.e. incorrectly classified)

n21 = number of cases that belong to Group 2 but are assigned to Group 1 (i.e. incorrectly classified)

n22 = number of cases that belong to Group 2 and are assigned to Group 2 (i.e. correctly classified)

Total number of cases correctly classified = n11 + n22

Percentage of cases correctly classified = 100 × (n11 + n22) / n

where n = n1 + n2 is the total number of cases.
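The counts in the classification table and the percentage of correctly classified cases can be obtained from two lists of group labels, as in the following sketch. The function names classification_table and percent_correct and the label lists are illustrative assumptions, not part of the program.

```python
def classification_table(actual, predicted, groups=(1, 2)):
    """Cross-tabulate actual against predicted group membership.

    table[a][g] is the number of cases that belong to group a
    and were assigned to group g.
    """
    table = {a: {g: 0 for g in groups} for a in groups}
    for a, g in zip(actual, predicted):
        table[a][g] += 1
    return table


def percent_correct(table):
    """Percentage of cases on the diagonal of the classification table."""
    n = sum(sum(row.values()) for row in table.values())
    correct = sum(table[g][g] for g in table)
    return 100.0 * correct / n


# Illustrative labels for ten cases
actual = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 2, 2, 1, 2]

table = classification_table(actual, predicted)
print(table[1][1], table[1][2], table[2][1], table[2][2])
print(percent_correct(table))
```

In this example three of the four Group 1 cases and five of the six Group 2 cases lie on the diagonal, so eight of the ten cases are correctly classified.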

Stepwise Discriminant Analysis

Discran uses a stepwise procedure for including variables in the linear discriminant model. The procedure begins by selecting the single variable that provides the greatest univariate discrimination (i.e. the largest value of the acceptance criterion). After the first variable is entered, the criterion is re-evaluated for all remaining variables, and the variable with the largest criterion value is entered into the model. This procedure is repeated until the number of steps specified by the researcher has been carried out.
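The stepwise loop described above can be sketched generically. This is not Discran's implementation: the acceptance criterion is passed in as a function, and the toy criterion used here assumes uncorrelated variables, so that D² is simply the sum of squared mean differences divided by the within-groups variances. The names forward_select and toy_d2, the variable names and the summary statistics are all hypothetical.

```python
def forward_select(candidates, criterion, max_steps):
    """Forward stepwise selection.

    At each step the criterion is re-evaluated for every remaining
    variable, and the variable giving the largest value is entered.
    Stops after max_steps steps or when no candidates remain.
    """
    selected = []
    remaining = list(candidates)
    for _ in range(max_steps):
        if not remaining:
            break
        best = max(remaining, key=lambda v: criterion(selected + [v]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical summary statistics for three variables
mean_diff = {"age": 2.0, "income": 5.0, "score": 1.0}    # Group 1 mean - Group 2 mean
within_var = {"age": 1.0, "income": 10.0, "score": 4.0}  # within-groups variances


def toy_d2(variables):
    """Toy criterion: D^2 under the assumption of uncorrelated variables."""
    return sum(mean_diff[v] ** 2 / within_var[v] for v in variables)


print(forward_select(["age", "income", "score"], toy_d2, max_steps=2))
```

With correlated variables the criterion would have to be recomputed from the full within-groups covariance matrix at each step, which is why the loop calls it on the whole candidate set and not on single variables.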