DIANA (DIvisive ANAlysis) is a hierarchical clustering technique. Its main difference from the agglomerative method (AGNES) is that it constructs the hierarchy in the inverse order.
Initially (Step 0), there is one large cluster consisting of all n objects. At each subsequent step, the largest available cluster (the one with the largest diameter) is split into two clusters, until finally all clusters consist of single objects. Thus, the hierarchy is built in n-1 steps.
In the first step of an agglomerative method, all possible fusions of two objects are considered, leading to $\binom{n}{2} = n(n-1)/2$ combinations. In the divisive method based on the same principle, there are $2^{n-1} - 1$ possibilities to split the data into two clusters. This number is considerably larger than that in the case of an agglomerative method.
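To give a feel for the gap, the short sketch below evaluates both counts for a few values of n. It assumes the standard combinatorial formulas: n(n-1)/2 possible fusions of two objects, and 2^(n-1) - 1 possible splits into two non-empty clusters. The function names are illustrative only.

```python
from math import comb


def agglomerative_first_step(n: int) -> int:
    """Number of ways to choose two of n objects to fuse: n(n-1)/2."""
    return comb(n, 2)


def divisive_first_step(n: int) -> int:
    """Number of ways to split n objects into two non-empty clusters."""
    return 2 ** (n - 1) - 1


for n in (5, 10, 20):
    print(n, agglomerative_first_step(n), divisive_first_step(n))
```

For n = 20 this already gives 190 possible fusions against 524287 possible splits, which is why an exhaustive divisive search is impractical.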
To avoid considering all possibilities, the algorithm proceeds as follows.
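One common description of the procedure is the splinter-group heuristic of Kaufman and Rousseeuw. The object with the largest average dissimilarity to the others starts a new group, and objects keep moving to it while they are, on average, closer to it than to what remains. The Python sketch below illustrates that idea under stated assumptions. The names `split_cluster`, `diameter` and `diana` are hypothetical, ties are broken by taking the first maximum, and it is not the reference implementation.

```python
import numpy as np


def diameter(D, members):
    """Largest dissimilarity between any two members (0 for a singleton)."""
    return float(D[np.ix_(members, members)].max())


def split_cluster(D, members):
    """Split a cluster of two or more objects into (remaining, splinter)."""
    rest = list(members)
    # Seed the splinter group with the object that has the largest
    # average dissimilarity to the other members.
    avg = [D[i, [j for j in rest if j != i]].mean() for i in rest]
    splinter = [rest.pop(int(np.argmax(avg)))]
    while len(rest) > 1:
        # Average dissimilarity to the remaining objects minus average
        # dissimilarity to the splinter group, for each remaining object.
        gain = [
            D[i, [j for j in rest if j != i]].mean() - D[i, splinter].mean()
            for i in rest
        ]
        best = int(np.argmax(gain))
        if gain[best] <= 0:
            break  # no remaining object is closer to the splinter group
        splinter.append(rest.pop(best))
    return rest, splinter


def diana(D):
    """Return the splits as (diameter, remaining, splinter) tuples."""
    clusters = [list(range(len(D)))]
    splits = []
    while any(len(c) > 1 for c in clusters):
        # Select the non-singleton cluster with the largest diameter.
        target = max((c for c in clusters if len(c) > 1),
                     key=lambda c: diameter(D, c))
        clusters.remove(target)
        rest, splinter = split_cluster(D, target)
        splits.append((diameter(D, target), rest, splinter))
        clusters += [rest, splinter]
    return splits


# Illustrative data: five points on a line, absolute differences as dissimilarities.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
D = np.abs(points[:, None] - points[None, :])
splits = diana(D)
for diam, rest, splinter in splits:
    print(diam, rest, splinter)
```

On this toy data the first split separates the objects at 10 and 11 from the other three, and the whole hierarchy takes n-1 = 4 splits.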
Divisive Coefficient (DC)
For each object i, let d(i) denote the diameter of the last cluster to which it belongs (before being split off as a single object), divided by the diameter of the whole data set.
The divisive coefficient (DC), given by
$$DC = \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - d(i)\bigr)$$
indicates the strength of the clustering structure found by the algorithm.
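As a small worked example, the sketch below computes the coefficient from a list of last-cluster diameters. It assumes that DC is the average of 1 - d(i), which is the convention documented for R's cluster package. The function name and the input values are hypothetical.

```python
def divisive_coefficient(last_diameters, total_diameter):
    """Average of 1 - d(i), with d(i) = last cluster diameter / total diameter.

    last_diameters[i] is the diameter of the last cluster that contained
    object i before it was split off as a single object.
    """
    if total_diameter <= 0:
        raise ValueError("total_diameter must be positive")
    d = [diam / total_diameter for diam in last_diameters]
    return sum(1.0 - di for di in d) / len(d)


# Hypothetical values for five objects in a data set of diameter 11.
dc = divisive_coefficient([2.0, 1.0, 1.0, 1.0, 1.0], total_diameter=11.0)
print(round(dc, 4))
```

Here every object stays in a tight cluster until late in the hierarchy, so each d(i) is small and the coefficient is 49/55, roughly 0.89.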
The hierarchy obtained from DIANA can be represented graphically by the dissimilarity banner.
The dissimilarity banner consists of lines of stars, and of stripes that repeat the identifiers of the objects. The banner is read from left to right, and the fixed scales above and below it range from 1.00 (corresponding to the diameter of the entire data set) to 0.00 (corresponding to the diameter of the singletons).
Each line with stars ends at the diameter at which the cluster is split. The actual diameter of the data set (corresponding to 1.00 in the banner) is displayed just below the banner.
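The divisive coefficient defined above can be seen as the average width of the divisive banner:
$$DC = \frac{1}{n} \sum_{i=1}^{n} l_i$$
where $l_i$ is the length of the line containing the identifier of object i, measured on the banner's normalized scale, so that $l_i = 1 - d(i)$. Thus it can be seen as the counterpart of the agglomerative coefficient (AC). Its extreme values are interpreted as follows: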
DC = 0: no cluster structure
DC = 1: clear cluster structure