Cluster Analysis




Cluster analysis is concerned with the following general problem: given a set of objects, find subsets, called clusters, that are both homogeneous and well separated. Clustering is thus a bicriterion problem: objects within the same cluster should resemble one another, and objects in different clusters should differ from one another.
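
The two criteria can be made concrete in several ways. The sketch below uses one common pair of measures, chosen here for illustration only: the diameter of a cluster (largest dissimilarity between two of its objects) for homogeneity, and the split of a partition (smallest dissimilarity between objects in different clusters) for separation. The function names and the `dist` argument are hypothetical and not part of IDAMS.

```python
from itertools import combinations

def cluster_diameter(cluster, dist):
    """Largest dissimilarity between two objects of one cluster (0.0 for a singleton)."""
    return max((dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)

def partition_split(clusters, dist):
    """Smallest dissimilarity between objects that lie in different clusters."""
    return min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
```

A small diameter in every cluster indicates homogeneity, and a large split indicates good separation; the two goals generally pull against each other.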

In the classification literature, two kinds of clustering algorithms are reported, namely partitioning and hierarchical methods.

A partitioning method classifies objects into a specified number of groups (say k), which together satisfy the following criteria: each group contains at least one object, and each object belongs to exactly one group.

These conditions imply that k ≤ n, where n is the total number of objects.
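
As a minimal sketch, the check below tests the two conditions usually required of a partition: no group is empty, and every object belongs to exactly one group. It assumes the objects are labelled 0 to n - 1; the function name is hypothetical.

```python
def is_valid_partition(groups, n):
    """Return True if `groups` splits objects 0..n-1 into non-empty, disjoint groups."""
    # Condition 1: every group contains at least one object.
    if any(len(g) == 0 for g in groups):
        return False
    # Condition 2: every object appears exactly once across all groups.
    members = [obj for g in groups for obj in g]
    return sorted(members) == list(range(n))
```

Any list of groups that passes this check has at most n groups, since each group needs at least one object of its own.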

Hierarchical algorithms do not construct a single partition with k clusters; instead, they deal with all values of k from 1 to n. Hierarchical clustering methods fall into two categories - Agglomerative Hierarchical Clustering and Hierarchical Divisive Clustering.

Hierarchical Agglomeration: Initially each object forms a cluster of its own; then, iteratively, two clusters are chosen according to some criterion and merged into a new cluster. The procedure continues until all the objects belong to the same cluster. The number of iterations equals the number of objects minus 1.
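
The agglomerative procedure can be sketched as follows. The merging criterion used here, single linkage (merge the two clusters whose closest members are nearest), is an assumption made for illustration; the text does not fix a criterion. The function name and the `dist` argument are hypothetical.

```python
from itertools import combinations

def agglomerate(points, dist):
    """Merge clusters pairwise until one remains; return the list of merges."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    merges = []

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = clusters[pair[0]], clusters[pair[1]]
        return min(dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

For n objects the loop runs n - 1 times, one merge per iteration. This naive version recomputes all linkages at every step, so it is suitable only for small examples.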

Hierarchical Divisive Clustering: Initially all objects belong to the same cluster; then, iteratively, a cluster of the current partition is chosen according to a selection criterion and split into two according to a local criterion (i.e., a criterion based only on information concerning the objects of the chosen cluster). The procedure continues until every cluster consists of a single object.
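
A minimal sketch of the divisive procedure is given below. Both criteria are assumptions chosen for illustration: the cluster with the largest diameter is selected, and it is split by taking its two most distant objects as seeds and assigning every other object to the nearer seed. The function names and the `dist` argument are hypothetical.

```python
from itertools import combinations

def diameter(cluster, points, dist):
    """Largest distance between two objects of a cluster (0.0 for a singleton)."""
    return max(
        (dist(points[x], points[y]) for x, y in combinations(cluster, 2)),
        default=0.0,
    )

def divide(points, dist):
    """Split clusters in two until all are singletons; return the list of splits."""
    clusters = [list(range(len(points)))]  # all objects start together
    splits = []
    while any(len(c) > 1 for c in clusters):
        # Selection criterion (assumed): the splittable cluster of largest diameter.
        target = max(
            (c for c in clusters if len(c) > 1),
            key=lambda c: diameter(c, points, dist),
        )
        # Local criterion (assumed): seed with the two most distant objects.
        a, b = max(
            combinations(target, 2),
            key=lambda p: dist(points[p[0]], points[p[1]]),
        )
        left = [
            x for x in target
            if x != b and (x == a or dist(points[x], points[a]) <= dist(points[x], points[b]))
        ]
        right = [x for x in target if x not in left]
        clusters.remove(target)
        clusters += [left, right]
        splits.append((left, right))
    return splits
```

Because each split uses only the objects of the chosen cluster, the criterion is local in the sense described above. Like the agglomerative case, the procedure takes n - 1 steps for n objects.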

IDAMS contains two modules for cluster analysis: Clusfind and Typol. Clusfind comprises a library of six algorithms - three for partitioning a set of objects into a pre-assigned number of clusters, and three for hierarchical cluster analysis. Typol classifies a large number of objects into a pre-assigned number of clusters and creates a classification variable that summarizes a large number of variables, both interval-scale and categorical.