Agnes proceeds by a series of fusions. Initially (at step 0), all objects
are apart– each object forms a small cluster by itself. At the first
step, two closest or minimally dissimilar objects are merged to form a cluster.
Then the algorithm finds a pair of objects with minimal dissimilarity. If there
are several pairs with minimal dissimilarity, the algorithm picks a pair of
objects at random. Picking of pairs of objects, which are minimally dissimilar
is quite straightforward, but how to select pairs of clusters with minimal
dissimilarity? It is necessary to define a measure of dissimilarity between
clusters. Average pair group method (UPGMA) is used to compute the
dissimilarity between a pair of clusters. Dissimilarity between clusters *R*
and *Q *is defined as the average of all dissimilarities:

*d* (*i** ,j*)*
, *where* i* is any object of *R* and *j*
is any object of *Q*. More formally

*d* ( *R*, *Q*)
= [1/ (|*R*||*Q*|)]

where |*R*| and |*Q*| denote the number
of objects in clusters *R* and *Q* respectively.

Agnes computes a coefficient,
called *Agglomerative Coefficient, *which measures the clustering
structure of the data set. Agglomerative coefficient is defined as follows:

Let

d(i) denote the dissimilarity of objectito the first cluster it is merged with, divided by the dissimilarity of the merger in the last step of the algorithm.The agglomerative coefficient (

AC)is defined as the average of all [ 1-d(i)]

*Note* that the agglomerative coefficient (*AC*) defined above can
also be defined as the average width (or the percentage filled) of the banner
plot described below.

Because* AC* grows with the number of objects, this measure should not
be used to compare clusters of data sets of very different sizes.

*Dissimilarity Banner*: The hierarchy obtained by Agnes can be graphically displayed by
means of a banner, which shows the successive mergers from left to right A
banner consists of stars and stripes. The stars indicate linking of objects and
stripes are identifiers of these objects. A banner is always read from left to
right. Each line with stars starts at the dissimilarity between the clusters
being merged. There are fixed scales above and below the banner, ranging from
0.00 (Dissimilarity = 0) and 1.00 (Dissimilarity = highest dissimilarity
encountered). The actual highest dissimilarity corresponding to 1.00 in the banner
is shown just below the banner.

The overall width of the banner gives an idea of the amount of structure found by the algorithm. When the data possesses a clear cluster structure, the between - cluster dissimilarities would become larger than the within cluster dissimilarities. As a consequence, the black lines in the banner would become larger.

The average width of the banner is called the agglomerative coefficient (*AC*).
It describes the strength of the clustering structure that has been found.

*C*
=

where *l _{i}*
is the length of the line containing the identifier of object