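DIANA is a hierarchical clustering technique. Its main difference from the agglomerative method (Agnes) is that it constructs the hierarchy in the inverse order: it starts from a single cluster and splits it, rather than starting from single objects and merging them.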

*Initially* (Step 0), there is one large cluster consisting of all *n*
objects. At each subsequent step, the largest available cluster is split into
two clusters, until finally all clusters consist of single objects. Thus, the
hierarchy is built in *n*-1 steps.

In the first step of an agglomerative method, all possible fusions of two objects are considered, leading to $n(n-1)/2$ combinations. In the divisive method based on the same principle, there are $2^{n-1} - 1$ possibilities to split the data into two clusters. This number is considerably larger than that in the case of an agglomerative method.
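To get a feel for the gap between the two counts, the short sketch below tabulates them for a few values of *n*. It assumes the standard counts: pairs of objects, n(n-1)/2, for the agglomerative first step, and splits of *n* objects into two non-empty clusters, 2^(n-1) - 1, for the divisive first step. The function names are illustrative only.

```python
from math import comb


def agglomerative_first_step(n: int) -> int:
    """Number of possible fusions of two objects among n: n(n-1)/2."""
    return comb(n, 2)


def divisive_first_step(n: int) -> int:
    """Number of ways to split n objects into two non-empty clusters: 2^(n-1) - 1."""
    return 2 ** (n - 1) - 1


# Compare the two counts for a few data set sizes.
for n in (5, 10, 20):
    print(n, agglomerative_first_step(n), divisive_first_step(n))
```

Already at *n* = 20 the divisive count exceeds half a million, while the agglomerative count is 190, which is why an exhaustive search over all splits is not practical.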

To avoid considering all possibilities, the algorithm proceeds as follows.

1. Find the object that has the highest average dissimilarity to all other objects. This object initiates a new cluster, a sort of *splinter group*.
2. For each object *i* outside the *splinter group*, compute $D_i = [\text{average } d(i,j),\ j \notin R_{\text{splinter group}}] - [\text{average } d(i,j),\ j \in R_{\text{splinter group}}]$.
3. Find an object *h* for which the difference $D_h$ is the largest. If $D_h$ is positive, then *h* is, on average, closer to the splinter group than to the remaining objects, so *h* is moved to the splinter group.
4. Repeat *Steps* 2 and 3 until all differences $D_h$ are negative. The data set is then split into two clusters.
5. Select the cluster with the largest diameter. The diameter of a cluster is the largest dissimilarity between any two of its objects. Then divide this cluster, following *Steps* 1-4.
6. Repeat *Step* 5 until all clusters contain only a single object.
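The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes a symmetric dissimilarity matrix `d` with a zero diagonal held in a NumPy array, and the names `split_cluster`, `diameter` and `diana` are hypothetical. As a further assumption, the splinter loop stops as soon as the largest difference is zero or negative.

```python
import numpy as np


def diameter(d, members):
    """Largest dissimilarity between any two objects of a cluster."""
    return d[np.ix_(members, members)].max()


def split_cluster(d, members):
    """Steps 1-4: split one cluster into (splinter group, remaining objects)."""
    rest = list(members)
    sub = d[np.ix_(rest, rest)]
    # Step 1: the object with the highest average dissimilarity to the others.
    avg = sub.sum(axis=1) / (len(rest) - 1)
    seed = rest[int(np.argmax(avg))]
    splinter = [seed]
    rest.remove(seed)
    # Steps 2-4: keep moving the object with the largest positive difference.
    while len(rest) > 1:
        diffs = []
        for i in rest:
            others = [j for j in rest if j != i]
            # D_i = average to the non-splinter objects minus average to the splinter group.
            diffs.append(d[i, others].mean() - d[i, splinter].mean())
        k = int(np.argmax(diffs))
        if diffs[k] <= 0:  # assumption: stop when no difference is positive
            break
        splinter.append(rest.pop(k))
    return splinter, rest


def diana(d):
    """Steps 5-6: repeatedly split the cluster with the largest diameter."""
    clusters = [list(range(len(d)))]
    splits = []  # one (diameter of split cluster, splinter, rest) per step
    while any(len(c) > 1 for c in clusters):
        candidates = [c for c in clusters if len(c) > 1]
        target = max(candidates, key=lambda c: diameter(d, c))
        clusters.remove(target)
        a, b = split_cluster(d, target)
        splits.append((diameter(d, target), a, b))
        clusters += [a, b]
    return splits


# Illustrative data: five points on a line, dissimilarity = absolute difference.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
d = np.abs(points[:, None] - points[None, :])
splits = diana(d)
for diam, a, b in splits:
    print(diam, a, b)
```

For *n* = 5 objects the loop performs *n*-1 = 4 splits, and the first split separates the two points near 10 from the three points near 0.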

*Divisive Coefficient (DC)*

For each object *i*, let *d(i)* denote the diameter of the last cluster to which it
belongs (before being split off as a single object), divided by the diameter of
the whole data set.

The divisive coefficient *(DC)*, given by

$$DC = \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - d(i)\bigr)$$

indicates the strength of the clustering structure found by the algorithm.
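As a small worked example, the sketch below computes the coefficient, taking it to be the average of 1 - *d(i)* over all objects. The function name `divisive_coefficient` and the input values are illustrative assumptions: five hypothetical objects whose last clusters had diameters 2, 1, 1, 1 and 1, in a data set of diameter 11.

```python
def divisive_coefficient(last_diameters, total_diameter):
    """Average of 1 - d(i), where d(i) is the diameter of the last cluster
    containing object i divided by the diameter of the whole data set."""
    scaled = [x / total_diameter for x in last_diameters]
    return sum(1 - v for v in scaled) / len(scaled)


# Hypothetical values: diameter of the last cluster each object belonged to
# before it was split off as a single object.
last_diameters = [2.0, 1.0, 1.0, 1.0, 1.0]
dc = divisive_coefficient(last_diameters, total_diameter=11.0)
print(round(dc, 3))
```

Here the coefficient is 49/55, about 0.89. The objects stay in tight clusters until late in the hierarchy, so each *d(i)* is small relative to the diameter of the whole data set.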

*Graphical display*

The hierarchy obtained from Diana can be represented graphically by the
dissimilarity banner.

The *dissimilarity banner* consists of lines of stars and stripes that repeat
the identifiers of the objects. The banner is read from left to right, and the
fixed scales above and below the banner range from 1.00 (corresponding to the
diameter of the entire data set) to 0.00 (corresponding to the diameter of the
singletons).

Each line with stars ends at the diameter at which the cluster is split. The actual diameter of the data set (corresponding to 1.00 in the banner) is displayed just below the banner.

The divisive coefficient defined above can be seen as the average width of the divisive banner:

$$DC = \frac{1}{n} \sum_{i=1}^{n} l_i$$

where $l_i$ is the length of the line containing the identifier of object *i*.

DC = 0: no cluster structure

DC = 1: clear cluster structure