Discriminant analysis finds a set of linear combinations of the variables whose values are as close as possible within groups and as far apart as possible between groups. These linear combinations are called discriminant functions. Thus, a discriminant function is a linear combination of the discriminating variables. It has the following mathematical form:

$$f_{km} = u_0 + u_1 X_{1km} + u_2 X_{2km} + \dots + u_p X_{pkm} \qquad \dots (1)$$

where
$f_{km}$ = the value (score) on the canonical discriminant function for case $m$ in group $k$;
$X_{ikm}$ = the value on discriminating variable $X_i$ for case $m$ in group $k$; and
$u_i$ = the coefficients which produce the desired characteristics in the function.
The coefficients for the first discriminant function are derived so as to maximize the differences between the group means. The coefficients for the second discriminant function are derived to maximize the differences between the group means, subject to the constraint that the values on the second discriminant function are not correlated with the values on the first, and so on. In other words, the second discriminant function is orthogonal to the first, the third is orthogonal to the first two, and so on. The maximum number of unique functions that can be derived is equal to the number of groups minus one or to the number of discriminating variables, whichever is less.
The discriminant functions are generated from a sample of individuals (or cases), for which group membership is known. The functions can then be applied to new cases with measurements on the same set of variables, but unknown group membership.
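This fit-then-classify workflow can be sketched with scikit-learn's `LinearDiscriminantAnalysis`. The library choice and all data values below are assumptions for illustration, not part of the method description above:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up training sample: group membership is known for these six cases,
# each measured on the same two variables.
X_train = np.array([[2.0, 3.0], [3.0, 4.0], [4.0, 5.0],
                    [7.0, 6.0], [8.0, 8.0], [9.0, 7.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Derive the discriminant function(s) from the cases with known groups.
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Scores on the discriminant function(s): with two groups and two
# variables there is min(p, q - 1) = 1 function, so shape is (6, 1).
scores = lda.transform(X_train)

# New cases measured on the same variables, group membership unknown.
X_new = np.array([[3.0, 3.5], [8.5, 7.5]])
predicted = lda.predict(X_new)
print(predicted)
```

The same function derived from the training sample thus assigns each new case to the group whose profile it most resembles.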
Computationally, discriminant function analysis is similar to the analysis of variance (ANOVA).
Let X = [xij] be the data matrix with n rows (individuals or observations) indexed by i and p columns (variables) indexed by j.
The overall mean of variable $j$ is written as:

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
The n rows of X are partitioned a priori into q groups. Group k is characterized by a set $I_k$ of $n_k$ values of the index $i$, with

$$\sum_{k=1}^{q} n_k = n$$
Let $\bar{x}_j^{(k)}$ be the mean of variable $j$ in group $k$:

$$\bar{x}_j^{(k)} = \frac{1}{n_k}\sum_{i \in I_k} x_{ij}$$
For every variable $j$ we have the equation

$$x_{ij} - \bar{x}_j = \left(x_{ij} - \bar{x}_j^{(k)}\right) + \left(\bar{x}_j^{(k)} - \bar{x}_j\right)$$
The total covariance between two variables $j$ and $j'$ can be written as:

$$\operatorname{cov}(j, j') = \frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)\left(x_{ij'} - \bar{x}_{j'}\right) \qquad \dots (2)$$
As in the analysis of variance, $\operatorname{cov}(j, j')$ can be partitioned into the sum of the within-group covariances and between-group covariances. It can easily be seen that Equation 2 reduces to

$$\frac{1}{n}\sum_{k=1}^{q}\sum_{i \in I_k}\left(x_{ij} - \bar{x}_j\right)\left(x_{ij'} - \bar{x}_{j'}\right) = \frac{1}{n}\sum_{k=1}^{q}\sum_{i \in I_k}\left(x_{ij} - \bar{x}_j^{(k)}\right)\left(x_{ij'} - \bar{x}_{j'}^{(k)}\right) + \frac{1}{n}\sum_{k=1}^{q} n_k\left(\bar{x}_j^{(k)} - \bar{x}_j\right)\left(\bar{x}_{j'}^{(k)} - \bar{x}_{j'}\right)$$

This means that the total covariance ($T$) equals the sum of the covariance within the groups ($W$) and the covariance between the groups ($B$):

$$T = W + B$$
The matrices W and B contain all the information about the relationships within the groups and between the groups. The size of the elements of B relative to those in W provides a measure of how distinct the groups are.
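The partition $T = W + B$ can be verified numerically. Below is a minimal numpy sketch with made-up data; the matrices hold sums of squares and cross-products, since the common $1/n$ scaling cancels from both sides of the identity:

```python
import numpy as np

# Toy data: n = 6 observations, p = 2 variables, q = 2 groups
# (hypothetical values chosen only for illustration).
X = np.array([[2.0, 3.0], [3.0, 4.0], [4.0, 5.0],
              [7.0, 6.0], [8.0, 8.0], [9.0, 7.0]])
groups = np.array([0, 0, 0, 1, 1, 1])

grand_mean = X.mean(axis=0)

# Total sum-of-squares and cross-products matrix T.
D = X - grand_mean
T = D.T @ D

# Within-groups (W) and between-groups (B) matrices.
W = np.zeros_like(T)
B = np.zeros_like(T)
for k in np.unique(groups):
    Xk = X[groups == k]
    mk = Xk.mean(axis=0)
    Dk = Xk - mk                        # deviations from the group mean
    W += Dk.T @ Dk
    dk = (mk - grand_mean).reshape(-1, 1)
    B += Xk.shape[0] * (dk @ dk.T)      # n_k times centroid deviations

# The partition T = W + B holds exactly.
print(np.allclose(T, W + B))
```

The larger the elements of B relative to those of W for a given data set, the more distinct the groups are.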
The discriminant function with the desired properties can be derived by solving the following simultaneous equations:

$$(b_{11} - \lambda w_{11})v_1 + (b_{12} - \lambda w_{12})v_2 + \dots + (b_{1p} - \lambda w_{1p})v_p = 0$$
$$(b_{21} - \lambda w_{21})v_1 + (b_{22} - \lambda w_{22})v_2 + \dots + (b_{2p} - \lambda w_{2p})v_p = 0$$
$$\vdots$$
$$(b_{p1} - \lambda w_{p1})v_1 + (b_{p2} - \lambda w_{p2})v_2 + \dots + (b_{pp} - \lambda w_{pp})v_p = 0$$

where
$\lambda$ is a constant, called the eigenvalue;
the $v$'s are a set of $p$ coefficients;
the $b$'s are the between-groups sums of squares and cross-products; and
the $w$'s are the within-groups sums of squares and cross-products.
The values of b’s and w’s are known, since these can be easily calculated from the sample.
To obtain unique solutions of the above equations, it is necessary to impose the following restriction:

$$\sum_{i=1}^{p} v_i^2 = 1$$
The maximum number of unique solutions is equal to the number of groups minus one or the number of discriminating variables, whichever is less.
Each solution, which yields its own $\lambda$ and set of $v$'s, corresponds to one discriminant function. However, the $v$'s cannot be interpreted as coefficients, since the solution does not place a logical constraint on the origin or the metric units of the discriminant space. The coefficients $v_i$ can be transformed into the $u_i$ of the discriminant function (Equation 1) as follows:

$$u_i = v_i\sqrt{n - q}, \qquad u_0 = -\sum_{i=1}^{p} u_i \bar{x}_i$$

where $\bar{x}_i$ is the grand mean of variable $i$.
The discriminant function with the largest $\lambda$ value is the most powerful discriminator, while the function with the smallest $\lambda$ value is the weakest. $\lambda = 0$ implies no difference between the groups.
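The simultaneous equations above are the generalized eigenproblem $Bv = \lambda Wv$, which can be solved numerically by forming $W^{-1}B$. A minimal numpy sketch with made-up data (six cases, two variables, two groups; all values illustrative):

```python
import numpy as np

# Made-up data: n = 6 cases, p = 2 variables, q = 2 groups.
X = np.array([[2.0, 3.0], [3.0, 4.0], [4.0, 5.0],
              [7.0, 6.0], [8.0, 8.0], [9.0, 7.0]])
groups = np.array([0, 0, 0, 1, 1, 1])

# Within-groups (W) and between-groups (B) sums of squares
# and cross-products.
grand_mean = X.mean(axis=0)
W = np.zeros((2, 2))
B = np.zeros((2, 2))
for k in np.unique(groups):
    Xk = X[groups == k]
    mk = Xk.mean(axis=0)
    W += (Xk - mk).T @ (Xk - mk)
    d = (mk - grand_mean).reshape(-1, 1)
    B += Xk.shape[0] * (d @ d.T)

# Solve B v = lambda W v via the eigen-decomposition of inv(W) @ B.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals = eigvals[order].real
eigvecs = eigvecs[:, order].real

# With two groups, min(p, q - 1) = 1: only one nonzero eigenvalue.
print(eigvals)
```

Note that `np.linalg.eig` returns unit-length eigenvectors, which matches the restriction $\sum v_i^2 = 1$ imposed above.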
The most common test for the statistical significance of a discriminant function is based on the residual discrimination in the system prior to deriving that function. If the residual discrimination is too small, it is meaningless to derive any more functions. The most appropriate statistic in this context is Wilks' lambda, $\Lambda$, which is defined as follows:

$$\Lambda = \prod_{i=k+1}^{s} \frac{1}{1 + \lambda_i}$$

where $k$ denotes the number of functions already derived and $s$ is the maximum number of derivable functions.
$\Lambda$ is an 'inverse' measure:

$\Lambda \to 1.0$: no discrimination, i.e., the group centroids are identical.
$\Lambda \to 0$: maximum discrimination, i.e., the group centroids are far apart.
The statistical significance of $\Lambda$ can be tested by first converting it into an approximation of the chi-square distribution:

$$\chi^2 = -\left[n - 1 - \frac{p + q}{2}\right]\ln\Lambda$$

with $(p - k)(q - k - 1)$ degrees of freedom.
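As a numeric sketch of this test: all inputs below are made-up (n = 6 cases, p = 2 variables, q = 2 groups, and a single hypothetical eigenvalue of 69/7), and scipy is assumed available for the chi-square tail probability:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical inputs, chosen only for illustration.
n, p, q = 6, 2, 2
eigenvalues = [69.0 / 7.0]   # assumed eigenvalue of the single function

# Wilks' lambda before deriving any function (k = 0): the product of
# 1 / (1 + lambda_i) over the functions not yet derived.
k = 0
Lambda = np.prod([1.0 / (1.0 + lam) for lam in eigenvalues[k:]])

# Chi-square approximation and its degrees of freedom.
chi_sq = -(n - 1 - (p + q) / 2.0) * np.log(Lambda)
df = (p - k) * (q - k - 1)
p_value = chi2.sf(chi_sq, df)

print(Lambda, chi_sq, df, p_value)
```

A small $\Lambda$ (here well below 1) yields a large $\chi^2$ and hence a small p-value, indicating significant residual discrimination.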