Discriminant analysis finds a set of linear combinations of the variables whose values are as close as possible within groups and as far apart as possible between groups. These linear combinations are called discriminant functions; thus, a discriminant function is a linear combination of the discriminating variables. It has the following mathematical form:

$$f_{km} = u_0 + u_1 X_{1km} + u_2 X_{2km} + \dots + u_p X_{pkm} \qquad (1)$$

where

$f_{km}$ = the value (score) on the canonical discriminant function for case $m$ in group $k$

$X_{ikm}$ = the value on discriminating variable $X_i$ for case $m$ in group $k$

$u_i$ = coefficients which produce the desired characteristics in the function.

The coefficients for the first discriminant function are derived so as to maximize the differences between the group means. The coefficients for the second discriminant function are derived to maximize the difference between the group means, subject to the constraint that the values on the second discriminant function are not correlated with the values on the first discriminant function, and so on. In other words, the second discriminant function is orthogonal to the first, and the third discriminant function is orthogonal to the second, and so on. The maximum number of unique functions that can be derived is equal to the number of groups minus one or equal to the number of discriminating variables, whichever is less.

The discriminant functions are generated from a sample of individuals (or cases), for which group membership is known. The functions can then be applied to new cases with measurements on the same set of variables, but unknown group membership.
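This fit-then-apply workflow can be sketched for the simplest case, two groups and a single discriminant function, using plain NumPy (the simulated training sample and the new case are illustrative assumptions, not data from the text):

```python
import numpy as np

# Training sample: two groups with known membership.
rng = np.random.default_rng(0)
g1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
g2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))

m1, m2 = g1.mean(axis=0), g2.mean(axis=0)

# Pooled within-groups sums of squares and cross-products.
W = (g1 - m1).T @ (g1 - m1) + (g2 - m2).T @ (g2 - m2)

# Discriminant direction v = W^{-1}(m1 - m2) (Fisher's two-group rule).
v = np.linalg.solve(W, m1 - m2)

# Apply the function to a new case of unknown membership:
# assign it to the group whose mean score is closer.
x_new = np.array([0.5, 0.2])
score = x_new @ v
group = 1 if abs(score - m1 @ v) < abs(score - m2 @ v) else 2
print(group)  # 1: the new case lies near group 1's centroid
```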

Computationally, discriminant function analysis is similar to the analysis of variance (ANOVA).

Let $\mathbf{X} = [x_{ij}]$ be the $n \times p$ data matrix with $n$ cases (rows) and $p$ discriminating variables (columns).

The overall mean of variable $j$ is written as:

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$

The $n$ rows of $\mathbf{X}$ are partitioned *a priori* into $g$ groups. Group $k$ is characterized by a set $I_k$ of $n_k$ cases, with $\sum_{k=1}^{g} n_k = n$.

Let $\bar{x}_j^{(k)}$ be the mean of variable $j$ in group $k$:

$$\bar{x}_j^{(k)} = \frac{1}{n_k} \sum_{i \in I_k} x_{ij}$$

For every variable $j$ we have the decomposition

$$x_{ij} - \bar{x}_j = \left(x_{ij} - \bar{x}_j^{(k)}\right) + \left(\bar{x}_j^{(k)} - \bar{x}_j\right), \qquad i \in I_k$$

The total covariance between two variables $j$ and $j'$ can be written as:

$$\text{covariance}(j, j') = \frac{1}{n} \sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)\left(x_{ij'} - \bar{x}_{j'}\right)$$

or, summing group by group,

$$\text{covariance}(j, j') = \frac{1}{n} \sum_{k=1}^{g} \sum_{i \in I_k} \left(x_{ij} - \bar{x}_j\right)\left(x_{ij'} - \bar{x}_{j'}\right) \qquad (2)$$

As in the analysis of variance, covariance$(j, j')$ can be partitioned into the sum of the within-group covariances and the between-group covariances.

It can easily be seen that Equation 2 reduces to

$$\text{covariance}(j, j') = \frac{1}{n} \sum_{k=1}^{g} \sum_{i \in I_k} \left(x_{ij} - \bar{x}_j^{(k)}\right)\left(x_{ij'} - \bar{x}_{j'}^{(k)}\right) + \frac{1}{n} \sum_{k=1}^{g} n_k \left(\bar{x}_j^{(k)} - \bar{x}_j\right)\left(\bar{x}_{j'}^{(k)} - \bar{x}_{j'}\right) \qquad (3)$$

since the cross-product terms vanish when summed within each group.

This means that the total covariance (**T**) equals the sum of the covariance within the groups (**W**) and the covariance between the groups (**B**):

**T = W + B**

where

$$w_{jj'} = \frac{1}{n} \sum_{k=1}^{g} \sum_{i \in I_k} \left(x_{ij} - \bar{x}_j^{(k)}\right)\left(x_{ij'} - \bar{x}_{j'}^{(k)}\right) \quad \text{and} \quad b_{jj'} = \frac{1}{n} \sum_{k=1}^{g} n_k \left(\bar{x}_j^{(k)} - \bar{x}_j\right)\left(\bar{x}_{j'}^{(k)} - \bar{x}_{j'}\right)$$

The matrices **W** and **B** contain all the information about the
relationships within the groups and between the groups. The size of the
elements of **B** relative to those in **W** provides a measure of how
distinct the groups are.
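The decomposition $\mathbf{T} = \mathbf{W} + \mathbf{B}$ is easy to verify numerically. A minimal sketch (the data, group sizes, and labels below are illustrative assumptions, not from the text) computes the three matrices exactly as defined above and checks that they add up:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))          # 30 cases, p = 3 variables
labels = np.repeat([0, 1, 2], 10)     # a priori partition into g = 3 groups
n, p = X.shape

xbar = X.mean(axis=0)                 # overall means of each variable
T = (X - xbar).T @ (X - xbar) / n     # total covariance matrix

W = np.zeros((p, p))                  # within-groups covariance
B = np.zeros((p, p))                  # between-groups covariance
for k in np.unique(labels):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)              # group means
    W += (Xk - mk).T @ (Xk - mk) / n
    B += len(Xk) * np.outer(mk - xbar, mk - xbar) / n

print(np.allclose(T, W + B))          # True: T = W + B
```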

The discriminant functions with the desired properties can be derived by solving the following simultaneous equations:

$$b_{11} v_1 + b_{12} v_2 + \dots + b_{1p} v_p = \lambda \left(w_{11} v_1 + w_{12} v_2 + \dots + w_{1p} v_p\right)$$

$$\vdots$$

$$b_{p1} v_1 + b_{p2} v_2 + \dots + b_{pp} v_p = \lambda \left(w_{p1} v_1 + w_{p2} v_2 + \dots + w_{pp} v_p\right)$$

or, in matrix form, $\mathbf{B}\mathbf{v} = \lambda \mathbf{W}\mathbf{v}$,

where

$\lambda$ is a constant, called the eigenvalue,

the $v$'s are a set of $p$ coefficients,

the $b$'s are the between-groups sums of squares and cross-products, and

the $w$'s are the within-groups sums of squares and cross-products.

The values of *b’*s and *w*’s are known, since these
can be easily calculated from the sample.

To obtain a unique solution of the above equations, it is necessary to impose the following restriction:

$$\sum_{i=1}^{p} v_i^2 = 1$$

The maximum number of unique solutions is equal to the number of groups minus one or the number of discriminating variables, whichever is less.

Each solution, which yields its own $\lambda$ and set of $v$'s, corresponds to one discriminant function. However, the $v$'s cannot be interpreted directly as coefficients, since the solution does not impose a logical constraint on the origin or the metric units of the discriminant space. The coefficients $v_i$ can be transformed into the $u_i$'s of the discriminant function (Equation 1) as follows:

$$u_i = v_i \sqrt{n - g}, \qquad u_0 = -\sum_{i=1}^{p} u_i \bar{x}_i$$

The discriminant function with the largest $\lambda$ value is the most powerful discriminator, while the function with the smallest $\lambda$ value is the weakest; $\lambda = 0$ implies no difference between the groups.
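The simultaneous equations above form a generalized eigenvalue problem, equivalent to finding the eigenvalues of $\mathbf{W}^{-1}\mathbf{B}$. A sketch with simulated data (the group centroids and sample sizes are made-up assumptions) shows that only $\min(g-1, p)$ eigenvalues are nonzero, matching the stated maximum number of unique functions:

```python
import numpy as np

rng = np.random.default_rng(2)
p, g, n_k = 4, 3, 40                                      # p variables, g groups
centers = rng.normal(scale=3.0, size=(g, p))              # simulated group centroids
X = np.vstack([rng.normal(c, 1.0, size=(n_k, p)) for c in centers])
labels = np.repeat(np.arange(g), n_k)

xbar = X.mean(axis=0)
W = np.zeros((p, p))   # within-groups sums of squares and cross-products
B = np.zeros((p, p))   # between-groups sums of squares and cross-products
for k in range(g):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)
    W += (Xk - mk).T @ (Xk - mk)
    B += n_k * np.outer(mk - xbar, mk - xbar)

# Eigenvalues of W^{-1} B solve B v = lambda W v.
lam = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]
print(np.sum(lam > 1e-8))  # 2, i.e. min(g - 1, p) = g - 1 nonzero eigenvalues
```

Each nonzero eigenvalue, with its eigenvector, corresponds to one discriminant function; the remaining eigenvalues are zero because $\mathbf{B}$ has rank at most $g-1$.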

The most common test for the statistical significance of a discriminant function is based on the residual discrimination in the system prior to deriving that function. If the residual discrimination is too small, then it is meaningless to derive any more functions. The most appropriate statistic in this context is Wilks' lambda $\Lambda$, which is defined as follows:

$$\Lambda_k = \prod_{i=k+1}^{s} \frac{1}{1 + \lambda_i}$$

where $k$ denotes the number of functions already derived and $s$ is the maximum number of derivable functions.

$\Lambda$ is an ‘inverse’ measure:

$\Lambda \to 1.0$: no discrimination, *i.e*., the group centroids are identical.

$\Lambda \to 0$: maximum discrimination, *i.e*., the group centroids are far apart.

The statistical significance of $\Lambda$ can be tested by first converting it into an approximation of the chi-square distribution:

$$\chi^2 = -\left[n - \frac{p + g}{2} - 1\right] \ln \Lambda_k$$

with $(p - k)(g - k - 1)$ degrees of freedom.
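As a numerical sketch of this test, the residual $\Lambda_k$, the chi-square statistic, and the degrees of freedom can be computed directly from the eigenvalues (the simulated data below are made-up assumptions; in practice the statistic would be compared with the chi-square critical value at the stated degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(2)
p, g, n_k = 4, 3, 40
centers = rng.normal(scale=3.0, size=(g, p))  # simulated group centroids
X = np.vstack([rng.normal(c, 1.0, size=(n_k, p)) for c in centers])
labels = np.repeat(np.arange(g), n_k)
n = len(X)

xbar = X.mean(axis=0)
W = np.zeros((p, p))
B = np.zeros((p, p))
for k in range(g):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)
    W += (Xk - mk).T @ (Xk - mk)
    B += n_k * np.outer(mk - xbar, mk - xbar)

# Eigenvalues of W^{-1} B; only the g - 1 nonzero ones are kept.
lam = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1][:g - 1]

# Residual Wilks' lambda after deriving the first k functions,
# its chi-square approximation, and the degrees of freedom.
for k in range(g - 1):
    Lam = np.prod(1.0 / (1.0 + lam[k:]))
    chi2 = -(n - (p + g) / 2 - 1) * np.log(Lam)
    df = (p - k) * (g - k - 1)
    print(k, Lam, chi2, df)
```

At $k = 0$ the test asks whether any discrimination exists at all; at $k = 1$ it asks whether a second function is worth deriving.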