### 7(1) Example of Cluster Analysis by Partitioning Around Medoids

Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.

Methodology: Cluster analysis using the PAM (Partitioning Around Medoids) algorithm.

Dataset: COOP.DAT
##### SYNTAX*
```
$RUN CLUSFIND
$FILES
PRINT = PAM.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING PAM
IDVAR=V1 -
VARS=(V2-V12) -
ANALYSIS=PAM -
CMIN=4 -
PRINT=(DICT,DISS,GRAPH,TRACE)
```

-----------
Note: All options set at default values

##### EXTRACT FROM COMPUTER OUTPUT

After filtering       35 cases read from the input data file
MIN clusters= 4
Number of variables: 11
Number of objects: 35

*** Dissimilarity matrix ***

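| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .00 |
| 2 | 24.03 | .00 |
| 3 | 25.63 | 4.58 | .00 |
| 4 | 26.31 | 8.56 | 6.14 | .00 |
| 5 | 29.48 | 8.44 | 5.50 | 5.86 | .00 |
| 6 | 28.95 | 7.60 | 4.56 | 5.77 | 3.01 | .00 |
| 7 | 31.17 | 9.86 | 6.81 | 6.38 | 2.90 | 4.10 | .00 |
| 8 | 32.56 | 10.94 | 7.97 | 7.58 | 3.65 | 4.48 | 2.49 | .00 |
| 9 | 31.58 | 9.58 | 6.89 | 6.87 | 4.01 | 3.98 | 3.49 | 3.11 | .00 |
| 10 | 32.56 | 10.89 | 8.02 | 7.31 | 4.16 | 4.87 | 2.21 | 1.92 | 2.80 | .00 |
| 11 | 31.79 | 10.68 | 7.80 | 6.14 | 3.83 | 4.52 | 2.45 | 2.27 | 2.76 | 1.73 | .00 |
| 12 | 32.65 | 10.82 | 8.16 | 7.66 | 4.16 | 4.68 | 2.94 | 1.88 | 2.82 | 1.26 | 2.05 | .00 |
| 13 | 33.08 | 11.32 | 8.57 | 7.94 | 4.03 | 5.06 | 2.52 | 1.82 | 3.26 | 1.48 | 2.29 | 1.55 | .00 |
| 14 | 33.19 | 11.39 | 8.59 | 7.90 | 4.30 | 5.11 | 2.74 | 1.71 | 3.05 | 1.15 | 2.03 | 1.20 | .81 |
| 15 | 32.87 | 11.24 | 8.40 | 7.54 | 4.14 | 4.64 | 2.90 | 1.60 | 2.88 | 1.51 | 1.80 | 1.18 | 1.26 |
| 16 | 33.10 | 11.51 | 8.63 | 7.75 | 4.28 | 5.02 | 2.82 | 1.69 | 3.06 | 1.26 | 1.90 | 1.21 | .96 |
| 17 | 33.30 | 11.60 | 8.77 | 7.91 | 4.51 | 5.20 | 3.01 | 1.63 | 3.13 | 1.28 | 1.98 | 1.17 | 1.14 |
| 18 | 33.00 | 11.44 | 8.62 | 7.52 | 4.62 | 5.15 | 3.00 | 2.21 | 2.92 | 1.10 | 1.75 | 1.34 | 1.49 |
| 19 | 33.06 | 11.60 | 8.77 | 7.81 | 4.42 | 5.04 | 3.28 | 2.26 | 3.19 | 1.79 | 2.19 | 1.36 | 1.47 |
| 20 | 33.41 | 11.69 | 8.90 | 8.13 | 4.67 | 5.22 | 3.19 | 2.03 | 3.15 | 1.47 | 2.25 | 1.19 | 1.11 |
| 21 | 33.69 | 11.96 | 9.13 | 8.32 | 4.79 | 5.56 | 3.25 | 2.05 | 3.37 | 1.55 | 2.35 | 1.47 | 1.07 |
| 22 | 33.30 | 11.67 | 8.86 | 7.92 | 4.60 | 5.21 | 3.18 | 2.13 | 3.07 | 1.48 | 2.03 | 1.25 | 1.18 |
| 23 | 33.49 | 11.76 | 8.99 | 8.23 | 4.55 | 5.33 | 3.20 | 1.98 | 3.17 | 1.67 | 2.37 | 1.46 | .88 |
| 24 | 33.47 | 11.69 | 8.92 | 8.13 | 4.73 | 5.40 | 3.36 | 2.23 | 2.94 | 1.58 | 2.17 | 1.38 | 1.33 |
| 25 | 33.16 | 11.17 | 8.57 | 8.07 | 4.88 | 5.44 | 3.99 | 3.09 | 2.05 | 2.52 | 2.73 | 2.36 | 2.56 |
| 26 | 33.66 | 11.96 | 9.16 | 8.27 | 4.80 | 5.53 | 3.31 | 2.01 | 3.33 | 1.60 | 2.27 | 1.42 | 1.13 |
| 27 | 33.49 | 11.79 | 9.00 | 8.14 | 4.70 | 5.40 | 3.28 | 2.09 | 3.19 | 1.52 | 2.21 | 1.32 | 1.11 |
| 28 | 33.55 | 11.83 | 9.06 | 8.19 | 4.79 | 5.44 | 3.39 | 2.03 | 3.25 | 1.55 | 2.32 | 1.26 | 1.28 |
| 29 | 33.47 | 11.85 | 9.07 | 8.08 | 4.69 | 5.47 | 3.31 | 2.19 | 3.25 | 1.58 | 2.17 | 1.38 | 1.16 |
| 30 | 33.53 | 11.92 | 9.11 | 8.18 | 4.75 | 5.49 | 3.38 | 2.14 | 3.35 | 1.62 | 2.26 | 1.34 | 1.23 |
| 31 | 33.51 | 11.74 | 8.98 | 8.17 | 4.74 | 5.35 | 3.40 | 2.04 | 3.13 | 1.58 | 2.26 | 1.22 | 1.29 |
| 32 | 33.48 | 11.77 | 9.05 | 8.17 | 4.75 | 5.49 | 3.42 | 2.27 | 3.08 | 1.61 | 2.24 | 1.27 | 1.31 |
| 33 | 33.65 | 12.02 | 9.22 | 8.27 | 4.82 | 5.60 | 3.41 | 2.19 | 3.42 | 1.71 | 2.32 | 1.46 | 1.22 |
| 34 | 33.69 | 12.01 | 9.19 | 8.28 | 4.90 | 5.64 | 3.39 | 2.20 | 3.39 | 1.55 | 2.39 | 1.51 | 1.27 |
| 35 | 33.81 | 12.09 | 9.30 | 8.41 | 4.92 | 5.70 | 3.41 | 2.16 | 3.44 | 1.69 | 2.42 | 1.56 | 1.18 |

| | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | .00 |
| 15 | .85 | .00 |
| 16 | .53 | .60 | .00 |
| 17 | .54 | .73 | .49 | .00 |
| 18 | 1.09 | 1.16 | .94 | .92 | .00 |
| 19 | 1.22 | .97 | .85 | 1.19 | 1.37 | .00 |
| 20 | .86 | .93 | .67 | .75 | .96 | .93 | .00 |
| 21 | .65 | 1.07 | .64 | .63 | 1.14 | 1.11 | .63 | .00 |
| 22 | .78 | .79 | .53 | .73 | .91 | .70 | .49 | .60 | .00 |
| 23 | .87 | 1.08 | .76 | .87 | 1.20 | 1.09 | .55 | .63 | .71 | .00 |
| 24 | .91 | 1.15 | .92 | .87 | 1.02 | 1.23 | .77 | .70 | .65 | .80 | .00 |
| 25 | 2.23 | 2.33 | 2.32 | 2.31 | 2.23 | 2.40 | 2.27 | 2.26 | 2.10 | 2.19 | 1.69 | .00 |
| 26 | .75 | 1.01 | .66 | .57 | 1.08 | 1.10 | .53 | .28 | .56 | .55 | .65 | 2.25 | .00 |
| 27 | .75 | .99 | .67 | .62 | .90 | 1.08 | .50 | .45 | .50 | .54 | .47 | 2.10 | .35 |
| 28 | .83 | .98 | .70 | .54 | .93 | 1.07 | .58 | .57 | .64 | .70 | .77 | 2.26 | .47 |
| 29 | .81 | 1.01 | .64 | .73 | .96 | .87 | .61 | .49 | .40 | .60 | .60 | 2.13 | .43 |
| 30 | .82 | 1.00 | .63 | .67 | 1.05 | .86 | .60 | .44 | .47 | .68 | .67 | 2.24 | .39 |
| 31 | .80 | .90 | .71 | .56 | .95 | 1.04 | .57 | .57 | .56 | .69 | .60 | 2.08 | .46 |
| 32 | .95 | 1.16 | .86 | .90 | 1.08 | .94 | .67 | .67 | .54 | .71 | .51 | 1.89 | .60 |
| 33 | .84 | 1.07 | .71 | .75 | 1.19 | .91 | .67 | .37 | .50 | .69 | .71 | 2.25 | .36 |
| 34 | .83 | 1.18 | .75 | .69 | .96 | 1.16 | .73 | .44 | .70 | .76 | .79 | 2.30 | .49 |
| 35 | .80 | 1.16 | .78 | .72 | 1.19 | 1.16 | .68 | .20 | .65 | .64 | .71 | 2.26 | .24 |

| | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |
|---|---|---|---|---|---|---|---|---|---|
| 27 | .00 |
| 28 | .43 | .00 |
| 29 | .34 | .50 | .00 |
| 30 | .38 | .43 | .24 | .00 |
| 31 | .36 | .25 | .48 | .43 | .00 |
| 32 | .53 | .65 | .41 | .50 | .57 | .00 |
| 33 | .48 | .58 | .31 | .23 | .55 | .51 | .00 |
| 34 | .47 | .39 | .47 | .44 | .51 | .71 | .52 | .00 |
| 35 | .46 | .55 | .46 | .42 | .56 | .63 | .31 | .43 | .00 |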

Number of representative objects: 4

Average distance, initial build : 36.916
Final average distance : 33.190

| Cluster | Medoid | Size | Objects |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 3 | 2 | 2 3 |
| 3 | 5 | 5 | 4 5 6 7 8 |
| 4 | 22 | 27 | 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |

Coordinates of medoids

```
 1   97.00 832.00 290.00 120.00 132.00  58.00 305.00 313.00 188.00  49.00  78.00
 3   11.00 387.00  93.00  49.00  39.00  23.00  79.00  94.00  52.00   4.00  16.00
 5    4.00 224.00  47.00  15.00  26.00   7.00  21.00  40.00  13.00   6.00  20.00
22    2.00  17.00   7.00   5.00    .00   1.00   7.00   4.00   2.00   1.00    .00
```

Clustering vector

```
1   2   2   3   3   3   3   3   4   4   4   4   4   4   4   4   4   4   4   4   4
4   4   4   4   4   4   4   4   4   4   4   4   4   4
```

Clustering characteristics

Cluster 1 is an isolated singleton object 1 , separation 618.50

Cluster 2 an isolated L*-Cluster with diameter 119.02 and separation 188.28

Number of isolated clusters: 2

| Cluster | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Diameter of each cluster | .00 | 119.02 | 144.86 | 112.66 |
| Separation of each cluster | 618.50 | 188.28 | 39.08 | 39.08 |
| Average distance to each medoid | .00 | 59.51 | 67.14 | 26.00 |
| Maximum distance to each medoid | .00 | 119.02 | 106.94 | 93.75 |
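The diameter and separation reported for Clusters 1 and 2 can be checked by hand. The sketch below is illustrative only and is not part of IDAMS: it uses the distances among objects 1 to 8 as they appear in the matrix of unstandardized Euclidean distances printed in Section 7(2), and it assumes (as that matrix shows) that objects 9 to 35 lie farther from objects 1, 2 and 3 than the values used here. The function names are hypothetical. An L*-cluster is taken here to mean a cluster whose diameter is smaller than its separation.

```python
# Distances among objects 1-8, copied from the unstandardized distance
# matrix printed in Section 7(2). Objects 9-35 are omitted on the
# assumption that they are farther away from objects 1, 2 and 3.
d = {
    (1, 2): 622.58, (1, 3): 618.50, (1, 4): 820.37, (1, 5): 808.24,
    (1, 6): 857.14, (1, 7): 802.80, (1, 8): 910.02,
    (2, 3): 119.02,
    (2, 4): 231.17, (2, 5): 223.47, (2, 6): 263.25, (2, 7): 224.00, (2, 8): 311.13,
    (3, 4): 222.52, (3, 5): 195.28, (3, 6): 265.64, (3, 7): 188.28, (3, 8): 299.44,
}

def dist(i, j):
    """Symmetric lookup in the lower-triangular table."""
    return 0.0 if i == j else d[(min(i, j), max(i, j))]

def diameter(cluster):
    """Largest dissimilarity between two members of the cluster."""
    return max(dist(i, j) for i in cluster for j in cluster)

def separation(cluster, others):
    """Smallest dissimilarity between a member and a non-member."""
    return min(dist(i, j) for i in cluster for j in others)

cluster2 = [2, 3]
diam2 = diameter(cluster2)
sep2 = separation(cluster2, [1, 4, 5, 6, 7, 8])
sep1 = separation([1], [2, 3, 4, 5, 6, 7, 8])

print("Cluster 1 separation:", sep1)
print("Cluster 2 diameter:", diam2, "separation:", sep2)
print("Cluster 2 is an L*-cluster:", diam2 < sep2)
```

The diameter of Cluster 2 is simply the distance between its two members, and its separation is the distance between objects 3 and 7.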

Silhouettes

```
0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
1    2   .00      1|                                                 |
|                                                 |
2    3   .53      2|**************************                       |
2    3   .49      3|************************                         |
|                                                 |
3    4   .59      5|*****************************                    |
3    4   .50      4|*************************                        |
3    2   .49      7|************************                         |
3    4   .27      6|*************                                    |
3    4  -.04      8|                                                 |
|                                                 |
4    3   .85     22|******************************************       |
4    3   .85     30|******************************************       |
4    3   .85     26|******************************************       |
4    3   .85     33|******************************************       |
4    3   .85     27|******************************************       |
4    3   .85     29|******************************************       |
4    3   .85     34|******************************************       |
4    3   .85     24|******************************************       |
4    3   .84     35|******************************************       |
4    3   .84     23|******************************************       |
4    3   .84     20|*****************************************        |
4    3   .83     28|*****************************************        |
4    3   .83     32|*****************************************        |
4    3   .83     19|*****************************************        |
4    3   .83     18|*****************************************        |
4    3   .83     31|*****************************************        |
4    3   .81     21|****************************************         |
4    3   .80     15|****************************************         |
4    3   .80     25|****************************************         |
4    3   .78     17|***************************************          |
4    3   .76     16|*************************************            |
4    3   .72     12|************************************             |
4    3   .62      9|******************************                   |
4    3   .60     14|******************************                   |
4    3   .56     13|****************************                     |
4    3   .50     11|*************************                        |
4    3   .20     10|**********                                       |
+----+----+----+----+----+----+----+----+----+----+
0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1

Cluster 1 has average silhouette width     1.00
Cluster 2 has average silhouette width      .51
Cluster 3 has average silhouette width      .36
Cluster 4 has average silhouette width      .76

For the entire data set, the average silhouette width is .70
which indicates a strong structure was found.
```
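The overall silhouette width is the size-weighted mean of the per-cluster averages, so the figure printed above can be reproduced approximately from the rounded cluster values. The sketch below is not IDAMS code. The function names are hypothetical, and the cut-off values in `describe` are an assumption based on the commonly cited Kaufman and Rousseeuw guideline (above 0.70 strong, 0.51 to 0.70 reasonable, 0.26 to 0.50 weak, otherwise no substantial structure), which may differ from the message the program prints.

```python
def overall_silhouette(sizes, cluster_means):
    """Size-weighted mean of the per-cluster average silhouette widths."""
    total = sum(sizes)
    return sum(n * s for n, s in zip(sizes, cluster_means)) / total

def describe(sc):
    """Verbal reading of a silhouette coefficient (assumed guideline)."""
    if sc > 0.70:
        return "strong structure"
    if sc > 0.50:
        return "reasonable structure"
    if sc > 0.25:
        return "weak structure"
    return "no substantial structure"

# Cluster sizes and average silhouette widths as printed in the PAM output.
pam_sizes = [1, 2, 5, 27]
pam_means = [1.00, 0.51, 0.36, 0.76]

pam_sc = overall_silhouette(pam_sizes, pam_means)
print(round(pam_sc, 2), describe(pam_sc))
```

Because the per-cluster averages are themselves rounded to two decimals, the result agrees with the printed value only to about two decimals.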
##### INTERPRETATION
IDAMS first reports the analysis specification:

- No. of objects = 35
- No. of variables = 11
- No. of clusters = 4

Dissimilarity matrix: a matrix of unstandardized Euclidean distances among the objects.

No. of medoids (representative objects) = No. of clusters = 4.

- Average dissimilarity for the solution found in the first part of the algorithm (BUILD) = 36.916
- Average dissimilarity for the solution found in the second part of the algorithm (SWAP) = 33.190

These two values are the average dissimilarity between each object and its most similar representative object (i.e., its medoid).

The following information is given for each cluster: its medoid (i.e., the most representative object), the cluster size (i.e., the number of objects in the cluster) and the members of the cluster.

- Cluster 1: only one member, the object with Idcode (1), which is naturally its medoid.
- Cluster 2: only two objects, (2) and (3). Object (3) is the medoid of this cluster.
- Cluster 3: 5 objects, with Idcodes (4), (5), (6), (7) and (8). Object (5) is the medoid of this cluster.
- Cluster 4: 27 objects, with Idcodes (9) to (35). Its medoid is object (22).

Clustering vector: the jth element of this vector is the number of the cluster to which object j belongs. Clusters are numbered in the order in which they are encountered from left to right: Cluster 1 is encountered first, followed by Cluster 2, and so on. Here the clustering vector is 1 22 33333 444444444444444444444444444. Thus:

- Object (1) belongs to Cluster 1.
- Objects (2) and (3) belong to Cluster 2.
- Objects (4), (5), (6), (7) and (8) belong to Cluster 3.
- The remaining objects belong to Cluster 4.

Clustering characteristics:

- Cluster 1 is an isolated singleton object, with separation = 618.50.
- Cluster 2 is an isolated L*-cluster, with diameter = 119.02 and separation = 188.28.
- The diameter and separation of each cluster are then listed.

Silhouette plot: the following information is given for each object.

- CLU: the number of the cluster to which the object belongs.
- NEIG: the number of the neighboring cluster, i.e., the second best cluster for the object. For example, the second best choice for object (3) in Cluster 2 is Cluster 3. The second best choice for object (7) in Cluster 3 is Cluster 2. The second best choice for object (5) in Cluster 3 is Cluster 4.
- S(I): the silhouette width of the object. A value close to 1 implies that the object is well classified. A value close to 0 implies that the object is arbitrarily assigned to its cluster. A value close to -1 indicates that the object is poorly classified.
- I: the Idcode of the object.
- A line whose length is proportional to S(I), if the value is positive.

The proportion of the plot that is filled (its "blackness") is an indicator of the quality of the clustering.

The average silhouette width is also given for each cluster. The higher the value, the tighter the cluster.

It can easily be seen from the silhouette plot that objects (1) and (8) have silhouette widths close to zero, which implies that these objects have been arbitrarily assigned to their clusters. Leaving aside the isolated clusters, Cluster 4 is tighter than the other clusters.

Silhouette coefficient (i.e., average silhouette width for the entire data set) = 0.70, which indicates a reasonable structure.
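The silhouette widths of objects (2) and (3) can be recomputed from the definition s(i) = (b - a) / max(a, b), where a is the average dissimilarity of the object to the other members of its own cluster and b is the smallest average dissimilarity to the members of any other cluster. The sketch below is illustrative, not IDAMS code. The distances are copied from the unstandardized matrix printed in Section 7(2), and Cluster 4 is left out on the assumption (visible in that matrix) that all its members are farther from objects (2) and (3) than the Cluster 3 average.

```python
from statistics import mean

def silhouette_width(a, b):
    """s(i) = (b - a) / max(a, b)."""
    return (b - a) / max(a, b)

# Objects 2 and 3 form Cluster 2, so a is their mutual distance.
WITHIN = 119.02

# Distances from each object to the members of Clusters 1 and 3.
# Cluster 4 is omitted: its members are assumed to be farther away.
TO_OTHER = {
    2: {1: [622.58], 3: [231.17, 223.47, 263.25, 224.00, 311.13]},
    3: {1: [618.50], 3: [222.52, 195.28, 265.64, 188.28, 299.44]},
}

results = {}
for obj, by_cluster in TO_OTHER.items():
    # The neighbor is the other cluster with the smallest average distance.
    neighbor, b = min(
        ((c, mean(ds)) for c, ds in by_cluster.items()),
        key=lambda pair: pair[1],
    )
    results[obj] = (neighbor, round(silhouette_width(WITHIN, b), 2))

print(results)  # {2: (3, 0.53), 3: (3, 0.49)}
```

Both values, and the neighbor cluster, agree with the NEIG and S(I) columns of the silhouette plot.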

### 7(2) Example of Fuzzy Cluster Analysis

Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.

Methodology: Cluster analysis using the FANNY (Fuzzy Analysis) algorithm.

Dataset: COOP.DAT
##### SYNTAX
```
$RUN CLUSFIND
$FILES
PRINT = FANNY.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING FANNY
IDVAR=V1 -
VARS=(V2-V12) -
ANALYSIS=FANNY -
CMIN=4 -
PRINT=(DICT,DISS,GRAPH,TRACE)
```

##### EXTRACT FROM COMPUTER OUTPUT

After filtering      35 cases read from the input data file

Number of variables: 11

Number of objects: 35

Number of clusters: 4

*** Dissimilarity matrix ***

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .00 |
| 2 | 622.58 | .00 |
| 3 | 618.50 | 119.02 | .00 |
| 4 | 820.37 | 231.17 | 222.52 | .00 |
| 5 | 808.24 | 223.47 | 195.28 | 73.98 | .00 |
| 6 | 857.14 | 263.25 | 265.64 | 89.49 | 106.94 | .00 |
| 7 | 802.80 | 224.00 | 188.28 | 96.90 | 47.83 | 144.86 | .00 |
| 8 | 910.02 | 311.13 | 299.44 | 106.02 | 106.93 | 94.87 | 127.87 | .00 |
| 9 | 959.66 | 355.54 | 360.37 | 153.35 | 177.16 | 109.10 | 202.75 | 85.49 | .00 |
| 10 | 924.78 | 320.30 | 317.46 | 114.90 | 130.26 | 101.26 | 147.72 | 39.08 | 66.00 | .00 |
| 11 | 945.90 | 342.23 | 341.09 | 131.88 | 152.95 | 109.59 | 174.29 | 55.20 | 47.46 | 33.08 | .00 |
| 12 | 974.73 | 366.41 | 376.05 | 169.22 | 192.46 | 129.64 | 215.55 | 97.19 | 39.89 | 70.55 | 49.63 | .00 |
| 13 | 955.59 | 351.11 | 348.34 | 144.95 | 157.64 | 116.88 | 177.37 | 57.11 | 48.00 | 36.41 | 24.82 | 50.01 | .00 |
| 14 | 960.30 | 354.77 | 352.74 | 147.55 | 162.80 | 122.87 | 181.92 | 61.20 | 49.72 | 38.70 | 22.05 | 46.98 | 13.34 |
| 15 | 985.67 | 380.33 | 382.65 | 173.68 | 194.36 | 136.11 | 217.12 | 93.43 | 38.50 | 72.83 | 46.71 | 29.85 | 42.61 |
| 16 | 978.47 | 373.97 | 373.24 | 165.46 | 183.63 | 132.54 | 204.67 | 81.59 | 40.45 | 60.51 | 37.07 | 35.76 | 29.02 |
| 17 | 984.63 | 379.10 | 379.89 | 171.30 | 190.61 | 139.47 | 211.55 | 87.98 | 45.41 | 66.41 | 41.42 | 32.82 | 36.69 |
| 18 | 1006.82 | 401.92 | 407.05 | 195.60 | 220.62 | 158.12 | 242.72 | 120.72 | 56.09 | 96.14 | 71.11 | 39.59 | 68.10 |
| 19 | 1009.79 | 404.27 | 409.26 | 200.11 | 222.06 | 157.74 | 244.57 | 121.90 | 55.05 | 99.41 | 75.23 | 42.74 | 68.76 |
| 20 | 1009.08 | 403.77 | 408.87 | 200.13 | 221.68 | 157.47 | 244.29 | 121.55 | 55.60 | 98.92 | 74.56 | 40.89 | 68.42 |
| 21 | 993.56 | 388.21 | 388.05 | 180.46 | 198.10 | 147.29 | 218.28 | 95.35 | 50.84 | 74.02 | 50.43 | 38.86 | 41.96 |
| 22 | 1007.23 | 401.43 | 405.42 | 195.98 | 217.49 | 156.70 | 239.36 | 116.52 | 54.41 | 93.75 | 68.72 | 39.48 | 62.88 |
| 23 | 1010.80 | 406.04 | 408.88 | 200.35 | 220.02 | 159.27 | 242.26 | 118.65 | 56.81 | 97.86 | 73.44 | 45.63 | 65.57 |
| 24 | 1011.74 | 406.47 | 409.96 | 200.81 | 221.91 | 162.01 | 244.26 | 121.07 | 61.07 | 98.87 | 72.72 | 44.15 | 68.21 |
| 25 | 1022.55 | 416.45 | 421.85 | 212.17 | 234.79 | 171.94 | 257.73 | 134.61 | 67.78 | 112.66 | 86.24 | 55.61 | 82.53 |
| 26 | 1007.02 | 401.49 | 403.62 | 194.75 | 214.32 | 157.78 | 235.56 | 111.85 | 56.70 | 90.52 | 65.43 | 41.45 | 58.75 |
| 27 | 1013.40 | 408.26 | 411.85 | 202.26 | 223.64 | 163.26 | 245.88 | 122.43 | 61.83 | 100.28 | 74.59 | 45.14 | 69.40 |
| 28 | 1019.60 | 413.62 | 418.72 | 208.28 | 230.92 | 169.35 | 252.88 | 129.16 | 66.26 | 106.59 | 81.54 | 49.07 | 76.44 |
| 29 | 1016.09 | 410.45 | 414.14 | 204.20 | 225.65 | 166.14 | 247.23 | 123.92 | 63.51 | 101.49 | 76.30 | 47.21 | 70.44 |
| 30 | 1013.44 | 407.85 | 411.22 | 201.63 | 222.70 | 163.76 | 244.39 | 120.90 | 62.08 | 98.67 | 73.38 | 44.84 | 67.76 |
| 31 | 1019.85 | 413.82 | 419.16 | 208.91 | 231.53 | 169.26 | 253.89 | 130.12 | 66.69 | 107.74 | 82.08 | 49.21 | 77.48 |
| 32 | 1019.45 | 412.86 | 418.66 | 209.08 | 230.98 | 169.23 | 252.75 | 129.73 | 65.89 | 106.51 | 81.67 | 47.82 | 76.08 |
| 33 | 1010.79 | 404.82 | 407.32 | 198.30 | 218.11 | 161.75 | 239.06 | 115.71 | 60.37 | 93.85 | 68.94 | 43.91 | 62.25 |
| 34 | 1011.70 | 406.35 | 408.36 | 198.43 | 219.37 | 163.29 | 240.10 | 116.92 | 61.25 | 94.55 | 70.28 | 46.09 | 63.84 |
| 35 | 1007.36 | 401.59 | 402.89 | 194.26 | 213.12 | 159.21 | 233.54 | 110.20 | 58.56 | 88.82 | 64.46 | 43.75 | 56.88 |

| | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | .00 |
| 15 | 38.83 | .00 |
| 16 | 25.61 | 16.12 | .00 |
| 17 | 30.63 | 14.00 | 10.95 | .00 |
| 18 | 64.14 | 30.53 | 40.91 | 35.41 | .00 |
| 19 | 66.35 | 30.36 | 42.02 | 38.60 | 16.37 | .00 |
| 20 | 66.26 | 30.81 | 42.08 | 38.17 | 14.80 | 7.28 | .00 |
| 21 | 37.38 | 17.35 | 15.91 | 11.87 | 32.20 | 33.15 | 32.95 | .00 |
| 22 | 59.41 | 24.66 | 35.58 | 30.79 | 13.56 | 10.39 | 10.54 | 24.84 | .00 |
| 23 | 63.66 | 29.58 | 39.03 | 35.51 | 18.95 | 11.70 | 10.49 | 28.53 | 11.00 | .00 |
| 24 | 63.95 | 30.85 | 41.11 | 34.87 | 16.25 | 18.60 | 16.28 | 28.86 | 11.75 | 15.97 | .00 |
| 25 | 78.03 | 44.23 | 55.70 | 49.76 | 26.31 | 25.57 | 25.79 | 44.08 | 24.70 | 26.25 | 19.29 | .00 |
| 26 | 54.82 | 23.26 | 31.45 | 25.59 | 19.31 | 19.00 | 18.00 | 18.28 | 10.15 | 13.27 | 13.23 | 28.83 | .00 |
| 27 | 65.52 | 31.58 | 41.94 | 35.93 | 13.82 | 15.52 | 13.34 | 29.83 | 10.25 | 12.73 | 5.74 | 20.42 | 12.65 |
| 28 | 72.23 | 37.80 | 48.75 | 42.30 | 15.39 | 16.28 | 15.87 | 36.91 | 15.91 | 16.97 | 15.20 | 19.97 | 19.60 |
| 29 | 66.44 | 33.14 | 42.97 | 36.99 | 14.83 | 15.03 | 14.46 | 30.22 | 10.49 | 12.61 | 9.49 | 20.25 | 12.77 |
| 30 | 63.47 | 30.40 | 40.17 | 33.82 | 14.83 | 16.06 | 15.07 | 27.37 | 9.59 | 13.30 | 7.75 | 22.14 | 10.15 |
| 31 | 73.12 | 38.14 | 49.79 | 43.19 | 16.52 | 16.94 | 16.31 | 37.87 | 16.22 | 18.11 | 13.45 | 17.12 | 20.40 |
| 32 | 72.32 | 38.50 | 49.27 | 43.17 | 17.15 | 15.43 | 14.04 | 37.05 | 14.97 | 16.40 | 13.86 | 17.78 | 19.72 |
| 33 | 57.85 | 26.89 | 35.11 | 28.90 | 19.31 | 18.95 | 18.65 | 21.17 | 10.34 | 14.49 | 11.79 | 26.40 | 5.10 |
| 34 | 59.20 | 29.03 | 36.21 | 30.12 | 17.06 | 20.12 | 20.30 | 23.02 | 13.08 | 16.00 | 14.59 | 27.48 | 9.59 |
| 35 | 52.43 | 24.31 | 30.08 | 24.19 | 23.30 | 23.35 | 23.19 | 15.36 | 14.59 | 17.66 | 17.23 | 31.73 | 6.00 |

| | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |
|---|---|---|---|---|---|---|---|---|---|
| 27 | .00 |
| 28 | 10.77 | .00 |
| 29 | 5.39 | 8.66 | .00 |
| 30 | 4.58 | 10.91 | 4.24 | .00 |
| 31 | 10.10 | 4.47 | 9.33 | 11.18 | .00 |
| 32 | 11.09 | 7.94 | 9.17 | 11.83 | 7.55 | .00 |
| 33 | 11.14 | 17.26 | 9.95 | 7.55 | 17.89 | 17.18 | .00 |
| 34 | 12.33 | 16.06 | 10.34 | 8.89 | 17.94 | 18.47 | 7.75 | .00 |
| 35 | 17.09 | 23.19 | 16.16 | 13.82 | 24.25 | 23.47 | 6.78 | 9.90 | .00 |

Iteration objective function

```
 1              595.4648
 2              581.4908
 3              577.1000
 4              572.3036
 5              565.7405
 6              558.7579
 7              553.5726
 8              549.4211
 9              545.3386
10              541.7881
11              539.6334
12              538.6904
13              538.3286
14              538.1903
15              538.1367
16              538.1155
17              538.1072
18              538.1038
19              538.1031
20              538.1028
```

*** Fuzzy Clustering ***

```
              1           2           3           4
1           .3930       .2392       .1905       .1772
2           .6938       .1501       .0847       .0714
3           .6793       .1664       .0841       .0702
4           .1127       .6041       .1677       .1155
5           .0777       .7629       .0936       .0658
6           .1212       .4176       .2766       .1846
7           .1209       .6621       .1257       .0914
8           .0800       .3123       .4101       .1976
9           .0453       .1162       .5478       .2908
10           .0618       .2076       .5293       .2012
11           .0376       .1108       .6678       .1838
12           .0373       .0895       .5178       .3554
13           .0315       .0899       .7055       .1731
14           .0281       .0784       .7260       .1674
15           .0244       .0592       .5304       .3861
16           .0218       .0555       .6849       .2378
17           .0232       .0572       .6109       .3088
18           .0189       .0417       .1973       .7422
19           .0182       .0400       .1837       .7581
20           .0170       .0374       .1730       .7726
21           .0245       .0588       .4900       .4266
22           .0114       .0255       .1343       .8287
23           .0159       .0353       .1690       .7798
24           .0133       .0294       .1384       .8189
25           .0291       .0614       .2286       .6809
26           .0140       .0316       .1792       .7753
27           .0098       .0214       .0992       .8696
28           .0159       .0340       .1406       .8096
29           .0096       .0209       .0947       .8748
30           .0090       .0197       .0942       .8771
31           .0163       .0348       .1423       .8066
32           .0161       .0344       .1424       .8070
33           .0128       .0285       .1496       .8091
34           .0149       .0331       .1677       .7843
35           .0184       .0418       .2410       .6987
```

Partition coefficient of Dunn = .55

Its normalized version = .40

Closest hard clustering

```
Cluster    Size    Objects
1        3        1    2    3
2        4        4    5    6    7
3       11        8    9   10   11   12   13   14   15   16   17   21
4       17       18   19   20   22   23   24   25   26   27   28   29   30
31   32   33   34   35
```

Clustering vector

```
1   1   1   2   2   2   2   3   3   3   3   3   3   3   3   3   3   4   4   4   3   4
4   4   4   4   4   4   4   4   4   4   4   4   4
```

Silhouettes

```
                     0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
1    2   .25      1|************                                     |
1    2  -.36      2|                                                 |
1    2  -.41      3|                                                 |
|                                                 |
2    3   .55      5|***************************                      |
2    3   .49      7|************************                         |
2    3   .42      4|*********************                            |
2    3   .07      6|***                                              |
|                                                 |
3    4   .45     11|**********************                           |
3    4   .44     13|**********************                           |
3    4   .44     10|*********************                            |
3    4   .44     14|*********************                            |
3    2   .31      8|***************                                  |
3    4   .16      9|*******                                          |
3    4   .15     16|*******                                          |
3    4  -.06     17|                                                 |
3    4  -.08     12|                                                 |
3    4  -.23     15|                                                 |
3    4  -.32     21|                                                 |
|                                                 |
4    3   .82     29|*****************************************        |
4    3   .81     30|****************************************         |
4    3   .81     27|****************************************         |
4    3   .78     28|***************************************          |
4    3   .78     31|***************************************          |
4    3   .78     32|***************************************          |
4    3   .77     24|**************************************           |
4    3   .77     22|**************************************           |
4    3   .75     33|*************************************            |
4    3   .74     23|*************************************            |
4    3   .74     20|*************************************            |
4    3   .74     34|*************************************            |
4    3   .73     19|************************************             |
4    3   .72     26|***********************************              |
4    3   .71     18|***********************************              |
4    3   .68     25|*********************************                |
4    3   .65     35|********************************                 |
+----+----+----+----+----+----+----+----+----+----+
0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1

Cluster 1 has average silhouette width     -.18
Cluster 2 has average silhouette width      .38
Cluster 3 has average silhouette width      .15
Cluster 4 has average silhouette width      .75

For the entire data set, the average silhouette width is .44
which indicates a strong structure was found.
```
##### INTERPRETATION
IDAMS first reports the data specifications:

- No. of objects: 35
- No. of variables: 11
- No. of clusters: 4

Note that FANNY does not use any representative objects (medoids). Instead, the algorithm attempts to minimize the objective function defined earlier, which is a kind of total dispersion. The algorithm needed 20 iterations for convergence.

Dissimilarity matrix: a matrix of Euclidean distances between the objects.

Objective function:

- 1st iteration: 595.4648
- 20th iteration: 538.1028

Membership coefficients of the objects: some objects have a very large value for one cluster and small values for the other clusters. For example, object (22) belongs to Cluster 4 with membership coefficient = 0.8287, whereas object (8) belongs to Cluster 3 with membership coefficient = 0.4101.

- Dunn's partition coefficient = 0.55
- Normalized Dunn's coefficient = 0.40

Note: a normalized Dunn's coefficient of 0 means completely fuzzy clustering, and a value of 1 means completely hard clustering.

Closest hard clustering: the size and the Idcodes of the objects are given for each cluster.

Clustering vector: the first three objects belong to Cluster 1. The next four objects belong to Cluster 2. Objects (8) to (17) and object (21), 11 objects in all, belong to Cluster 3. The remaining 17 objects, (18) to (20) and (22) to (35), belong to Cluster 4.

Silhouette plot of the closest hard clustering:

- Objects (2) and (3) of Cluster 1 and objects (15) and (21) of Cluster 3 have negative silhouette widths. These objects are poorly classified.
- Objects (17) and (12) of Cluster 3 have silhouette widths close to zero. These objects are arbitrarily assigned to their cluster and could equally have been assigned to their neighbor (viz. Cluster 4).
- The silhouette widths of the objects in Cluster 4 are larger than those of the objects in the other clusters, which implies that they are better classified than their counterparts in other clusters.
- The average silhouette width of Cluster 1 is negative (-0.18), which implies that this cluster is poorly constituted.
- The average silhouette width of Cluster 3 is small (+0.15), which implies arbitrariness of the cluster.
- The average silhouette width of Cluster 4 is quite large (+0.75), which implies that this cluster is well constructed.

Silhouette coefficient of the entire structure = 0.44, which implies that the clustering structure is rather weak.
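The two Dunn coefficients and the closest hard clustering follow directly from the membership coefficients. The sketch below is illustrative rather than the IDAMS implementation. It assumes the usual definitions: Dunn's partition coefficient F is the sum of squared memberships divided by the number of objects, the normalized version is (kF - 1) / (k - 1) for k clusters, and the closest hard clustering assigns each object to the cluster with its largest membership. The function names are hypothetical, and only three rows of the membership table are copied in.

```python
def dunn_partition_coefficient(u):
    """F = (1/n) * sum of squared membership coefficients."""
    return sum(x * x for row in u for x in row) / len(u)

def normalized_dunn(f, k):
    """Rescale F from the range [1/k, 1] to the range [0, 1]."""
    return (k * f - 1) / (k - 1)

def closest_hard_cluster(row):
    """1-based number of the cluster with the largest membership."""
    return max(range(len(row)), key=lambda v: row[v]) + 1

# Three rows copied from the fuzzy clustering table above.
memberships = {
    8:  [0.0800, 0.3123, 0.4101, 0.1976],
    21: [0.0245, 0.0588, 0.4900, 0.4266],
    22: [0.0114, 0.0255, 0.1343, 0.8287],
}

hard = {obj: closest_hard_cluster(row) for obj, row in memberships.items()}
norm = normalized_dunn(0.55, 4)

print(hard)
print(round(norm, 2))
```

With F = 0.55 and k = 4 the normalized value comes out at 0.40, as reported. Object (21) shows why a fuzzy solution can be more informative than the hard one: its memberships in Clusters 3 and 4 are nearly equal.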

### 7(3) Example of Cluster Analysis of Large Data Sets

Research Question: Classification of a sample of research units in India according to the pattern of time spent on R & D inside the unit, administration, teaching and consultancy.

Methodology: Cluster analysis using the CLARA algorithm.

Dataset: ICSOPRU2.DAT
##### SYNTAX*
```
$RUN CLUSFIND
$FILES
PRINT = CLARA.LST
DICTIN = ICSOPRU2.DIC
DATAIN = ICSOPRU2.DAT
$SETUP
INCLUDE V1=360
Cluster Analysis using CLARA
IDVAR=V2 -
VARS=(V22,V24,V25,V26) -
ANALYSIS=CLARA -
CMIN=3 -
PRINT=(DICT,GRAPH,TRACE)
```

-----------

Note: All options set at default values

##### EXTRACT FROM COMPUTER OUTPUT
Number of clusters: 3

Number of variables: 4

Number of objects: 100

Number of representative objects: 3

```
Drawing 5 samples of 46 objects.

Sample number 1

Objects selected:
  108  117  126  136  204  208  209  210  250  303
  309  310  413  419  425  430  501  504  510  604
  608  704  711  715  750  805  809  810  813  904
  908  909  910  916 1203 1206 1207 1214 1219 1224
 1227 1301 1302 1305 1311 1401

Average distance, initial build  = 7.519
Average distance for this sample = 7.515

Results for the entire data set
Total distance   = 830.364
Average distance =   8.304

Cluster  Size  Medoid  Coordinates of Medoids
   1      44     17     45.00    5.29    4.00    6.71
   2      35     70     64.00    6.67    1.67    2.67
   3      21     21     78.00    6.00     .00    2.00

Average distance to each medoid
        1         2         3
    9.865     6.268     8.424
Maximum distance to each medoid
        1         2         3
   28.034    11.356    19.315
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.429      .804     1.367

Sample number 2

Objects selected:
  101  108  117  126  136  140  204  209  210  250
  303  306  309  425  430  501  502  503  603  604
  608  609  701  704  708  712  750  801  803  805
  808  809  810  813  904  910 1203 1207 1208 1219
 1224 1225 1227 1301 1307 1311

Average distance, initial build  = 6.939
Average distance for this sample = 6.939

Results for the entire data set
Total distance   = 826.122
Average distance =   8.261

Cluster  Size  Medoid  Coordinates of Medoids
   1      45     33     46.67    8.33     .00    6.67
   2      34     70     64.00    6.67    1.67    2.67
   3      21     85     77.50     .00     .83    4.17

Average distance to each medoid
        1         2         3
    9.327     6.138     9.413
Maximum distance to each medoid
        1         2         3
   29.104     9.775    19.226
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.622      .645     1.269

Sample number 3

Objects selected:
  108  117  120  136  201  205  210  250  303  304
  307  309  310  320  501  502  503  504  505  506
  601  604  606  610  702  703  706  711  715  750
  803  809  813  904  907  908  909  916 1206 1208
 1219 1224 1227 1301 1305 1314

Average distance, initial build  = 8.357
Average distance for this sample = 7.916

Results for the entire data set
Total distance   = 820.751
Average distance =   8.208

Cluster  Size  Medoid  Coordinates of Medoids
   1      44     33     46.67    8.33     .00    6.67
   2      37     83     62.86    6.43    3.57    4.29
   3      19     14     83.14    2.14     .86    2.00

Average distance to each medoid
        1         2         3
    9.306     6.427     9.132
Maximum distance to each medoid
        1         2         3
   29.104    10.805    16.742
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.727      .641      .796

Sample number 4

Objects selected:
  101  109  204  210  250  303  306  307  309  320
  351  413  430  502  503  504  505  508  510  603
  604  605  607  701  703  706  801  803  805  907
  908  909  911  913  916 1203 1206 1207 1219 1224
 1225 1308 1311 1312 1314 1401

Average distance, initial build  = 7.831
Average distance for this sample = 7.714

Results for the entire data set
Total distance   = 867.108
Average distance =   8.671

Cluster  Size  Medoid  Coordinates of Medoids
   1      52     20     49.17    6.67    1.17    8.33
   2      41     84     67.50    3.33     .83    2.50
   3       7     98     92.00    2.00     .00     .00

Average distance to each medoid
        1         2         3
    9.882     7.621     5.820
Maximum distance to each medoid
        1         2         3
   30.031    15.065    11.051
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.538      .772      .448

Sample number 5

Objects selected:
  108  109  120  140  201  208  210  250  301  304
  320  351  401  425  501  502  504  506  508  601
  602  603  604  607  701  702  704  711  712  750
  808  810  904  909  916 1206 1214 1219 1225 1301
 1302 1305 1311 1312 1314 1318

Average distance, initial build  = 8.613
Average distance for this sample = 8.613

Results for the entire data set
Total distance   = 834.124
Average distance =   8.341

Cluster  Size  Medoid  Coordinates of Medoids
   1      25     23     55.33    5.17    2.33    6.00
   2      33      5     45.00    9.00     .00    5.00
   3      42     44     70.00    6.14    1.14    5.00

Average distance to each medoid
        1         2         3
    6.785     8.858     8.862
Maximum distance to each medoid
        1         2         3
   12.492    29.051    27.442
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.105     2.570     1.856

Final results

Sample number 3 was selected, with objects:
  108  117  120  136  201  205  210  250  303  304
  307  309  310  320  501  502  503  504  505  506
  601  604  606  610  702  703  706  711  715  750
  803  809  813  904  907  908  909  916 1206 1208
 1219 1224 1227 1301 1305 1314

Average distance for the entire dataset: 8.208

Clustering vector
1 1 2 1 1 2 3 1 1 2 1 2 1 3 1 1 1 1 2 1 3 1 2 3 1
2 2 1 1 1 1 1 1 2 1 3 1 1 1 1 1 2 2 2 2 2 2 1 2 1
2 2 1 2 1 1 1 2 2 1 1 2 3 1 3 3 1 2 1 2 1 2 2 2 1
1 3 2 1 2 2 1 2 2 3 3 3 2 3 3 2 3 2 3 2 1 3 3 3 2

Cluster  Size  Medoid  Objects
   1      44     502    101  108  117  120  140  201  205  209  250  301
                        303  304  307  310  351  419  425  427  430  501
                        502  504  506  507  508  510  601  608  610  703
                        706  708  710  715  750  803  808  810  814  908
                        909  913 1204 1311
   2      37    1206    109  126  204  208  306  320  401  413  503  602
                        603  604  605  606  607  609  701  702  704  711
                        712  801  809  813  902  904  907  911  916 1203
                       1206 1207 1224 1301 1305 1308 1401
   3      19     210    136  210  309  322  505  802  805  806  910 1208
                       1214 1219 1225 1227 1302 1307 1312 1314 1318

Average distance to each medoid
        1         2         3
    9.306     6.427     9.132
Maximum distance to each medoid
        1         2         3
   29.104    10.805    16.742
Maximum distance to a medoid divided by minimum distance to another medoid
        1         2         3
    1.727      .641      .796

Silhouettes for the selected sample
                     0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
  1    2   .54    304|**************************                       |
  1    2   .52    120|**************************                       |
  1    2   .51    310|*************************                        |
  1    2   .50    750|*************************                        |
  1    2   .49    501|************************                         |
  1    2   .49    502|************************                         |
  1    2   .48    303|************************                         |
  1    2   .45    205|**********************                           |
  1    2   .45    908|**********************                           |
  1    2   .44    108|**********************                           |
  1    2   .44    706|*********************                            |
  1    2   .43    601|*********************                            |
  1    2   .40    117|********************                             |
  1    2   .40    201|********************                             |
  1    2   .39    250|*******************                              |
  1    2   .38    307|******************                               |
  1    2   .35    504|*****************                                |
  1    2   .33    506|****************                                 |
  1    2   .28    909|**************                                   |
  1    2   .13    715|******                                           |
  1    2   .02    703|*                                                |
  1    2  -.04    610|                                                 |
  1    2  -.09    803|                                                 |
                     |                                                 |
  2    3   .71    813|***********************************              |
  2    1   .70    809|***********************************              |
  2    1   .70   1206|***********************************              |
  2    3   .68    904|**********************************               |
  2    3   .64    916|*******************************                  |
  2    1   .57    907|****************************                     |
  2    1   .56    606|****************************                     |
  2    3   .55    711|***************************                      |
  2    3   .52    503|**************************                       |
  2    3   .51   1224|*************************                        |
  2    3   .47    604|***********************                          |
  2    1   .43    702|*********************                            |
  2    1   .26   1301|************                                     |
  2    1   .22    320|***********                                      |
  2    3   .22   1305|**********                                       |
                     |                                                 |
  3    2   .61   1314|******************************                   |
  3    2   .60   1227|******************************                   |
  3    2   .55    505|***************************                      |
  3    2   .55    210|***************************                      |
  3    2   .47    136|***********************                          |
  3    2   .26   1208|*************                                    |
  3    2   .26    309|*************                                    |
  3    2  -.15   1219|                                                 |
                     +----+----+----+----+----+----+----+----+----+----+
                     0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1

Cluster 1 has average silhouette width      .36
Cluster 2 has average silhouette width      .52
Cluster 3 has average silhouette width      .39

For the selected sample, the average silhouette width is .42
which indicates a strong structure was found.
```
##### INTERPRETATION
IDAMS reports the analysis specifications:

- No. of objects: 100
- No. of variables: 4
- No. of clusters: 3
- No. of medoids: 3 (i.e. equal to the number of clusters)

The algorithm draws 5 random samples of 40 + 2k objects each, where k is the number of clusters (here 40 + 2 × 3 = 46 objects).

(a) Sample 1

Results for sample 1:

- List of objects in the sample.
- The average distance from BUILD (initial average distance) = 7.519.
- The average distance from SWAP (i.e. final average distance) = 7.515.

These values are the average distances between each object of the sample and its most similar representative object.

Results for the entire data set:

- Total distance = 830.364
- Average distance = 8.304

The following information is given for each cluster:

- Size of the cluster in the entire data set.
- Its most representative object (medoid).
- The coordinates of the medoid.
- Average distance to each medoid.
- Maximum distance to each medoid.
- Maximum distance to a medoid divided by the minimum distance of the medoid to another medoid. This value gives an idea of the isolation of the cluster.

Similar information is given for each of the remaining four samples.

Average distance for the entire data set, by sample:

- Sample 1: 8.304
- Sample 2: 8.261
- Sample 3: 7.916
- Sample 4: 8.671
- Sample 5: 8.341

Sample 3 has the lowest average distance. None of these clusters is compact. This observation is also supported by the silhouette plot and by the value of the silhouette coefficient (i.e. the average silhouette width, which is only 0.42).

Final Results

- The list of objects in the selected sample.
- Average distance for the entire data set = 8.208.
- The clustering vector is interpreted as follows: the first two objects belong to Cluster 1, the third object belongs to Cluster 2, the fourth and fifth objects belong to Cluster 1, the sixth object belongs to Cluster 2, the seventh object belongs to Cluster 3, and so on.

Clustering characteristics of the final partition of the data set. For each cluster, the following information is printed:

- Cluster number.
- Size (no. of objects in the cluster).
- List of objects.
- Average distance of objects in a cluster to their medoid.
- Maximum distance of objects in a cluster to their medoid.
- Maximum distance to a medoid divided by the minimum distance to another medoid. Cluster 1 has the highest value (1.727), while Cluster 2 has the lowest value (.641), which implies that Cluster 2 is tighter than Cluster 1 and Cluster 3. Note that all the values of this ratio are greater than 0.5.

Silhouette plot

- Cluster 1: The silhouette values of objects 703, 610 and 803 are close to zero, implying that these objects are arbitrarily assigned to Cluster 1.
- Cluster 2: None of the objects has a value close to zero.
- Cluster 3: Object 1219 has a negative silhouette value (-.15), which is close to zero. This implies that this object is arbitrarily assigned to its cluster.

The average silhouette width of Cluster 2 is larger than that of the other two clusters. The overall average silhouette width is 0.42, which implies that the clustering structure is weak and could possibly be artificial.
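The silhouette values discussed above can be recomputed outside IDAMS. The following is a minimal Python sketch, not the CLUSFIND code: the function names, the toy data and the wording of the labels are illustrative assumptions. The thresholds in `describe` follow the guideline usually attributed to Kaufman and Rousseeuw, and the cluster sizes 23, 15 and 8 are counted from the silhouette plot of the selected sample.

```python
def silhouette_widths(diss, labels):
    """Silhouette width s(i) of every object, from a full dissimilarity matrix.

    Assumes at least two clusters; objects in singleton clusters get s(i) = 0.
    """
    n = len(labels)
    widths = []
    for i in range(n):
        # Mean dissimilarity from object i to each cluster (i itself excluded).
        mean_to = {}
        for c in set(labels):
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(diss[i][j] for j in members) / len(members)
        own = labels[i]
        others = [v for c, v in mean_to.items() if c != own]
        if own not in mean_to or not others:
            widths.append(0.0)
            continue
        a, b = mean_to[own], min(others)  # own cluster vs. nearest neighbour
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths


def describe(avg_width):
    """Rule-of-thumb reading of an average silhouette width (assumed labels)."""
    if avg_width > 0.70:
        return "strong structure"
    if avg_width > 0.50:
        return "reasonable structure"
    if avg_width > 0.25:
        return "weak structure, possibly artificial"
    return "no substantial structure"


# Toy example (hypothetical data): four points on a line, two obvious clusters.
points = [0, 1, 10, 11]
toy_diss = [[abs(p - q) for q in points] for p in points]
toy_widths = silhouette_widths(toy_diss, [1, 1, 2, 2])

# Overall width of the selected sample as the size-weighted mean of the
# cluster averages printed by the program.
sizes = {1: 23, 2: 15, 3: 8}
cluster_widths = {1: 0.36, 2: 0.52, 3: 0.39}
overall = sum(sizes[c] * cluster_widths[c] for c in sizes) / sum(sizes.values())
print(round(overall, 2), describe(overall))
```

The weighted mean reproduces the reported overall width of .42, which falls in the "weak" band of this guideline.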

### 7(4) Example of Hierarchical Agglomerative Cluster Analysis

 Research Question : Classification of eleven countries according to their publication pattern in different sub fields of chemistry Methodology : Cluster Analysis using the algorithm Agnes (Agglomerative Nesting) Dataset : CHEM.DAT
##### SYNTAX
```
$RUN CLUSFIND
$FILES
PRINT = AGNES.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSFIND PROGRAM AGNES
STANDARDIZE -
IDVAR=V1 -
VARS=(V2-V10) -
ANALYSIS=AGNES -
PRINT=(DICT,DISSIM,GRAPH,TRACE,VNAM)
```
##### EXTRACT FROM COMPUTER OUTPUT

After filtering 11 cases read from the input data file

Number of variables: 9

Number of objects: 11

*** Dissimilarity matrix ***

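| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | | | | | | | | | | |
| 2 | 4.04 | .00 | | | | | | | | | |
| 3 | 8.8 | 7.86 | .00 | | | | | | | | |
| 4 | 3.2 | 4.77 | 6.78 | .00 | | | | | | | |
| 5 | 4.15 | 5.45 | 7.19 | 2.12 | .00 | | | | | | |
| 6 | 3.9 | 3.75 | 6.12 | 3.32 | 4.12 | .00 | | | | | |
| 7 | 6.35 | 5.00 | 7.58 | 4.68 | 4.22 | 5.87 | .00 | | | | |
| 8 | 1.08 | 4.00 | 8.32 | 2.92 | 3.43 | 3.65 | 5.86 | .00 | | | |
| 9 | 4.83 | 5.67 | 6.66 | 1.98 | 2.96 | 4.67 | 4.03 | 4.60 | .00 | | |
| 10 | 2.56 | 4.55 | 8.36 | 3.21 | 3.75 | 3.40 | 5.97 | 2.19 | 4.67 | .00 | |
| 11 | 6.48 | 6.54 | 10.06 | 7.40 | 6.94 | 7.52 | 7.14 | 5.80 | 8.01 | 5.85 | .00 |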

Final ordering of objects and dissimilarities between them

Objects 1 8 10 6 4 9

Dissimilarities 1.082 2.373 3.650 3.905 1.983

Objects 9 5 2 7 11 3

Dissimilarities 2.542 4.604 5.247 6.854 7.772

Dissimilarity banner

```

0    .08   .16   .24   .32   .40   .48   .56   .64   .72   .80   .88   .96 1
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
********************************************************************
8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
********************************************************
10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
*******************************************
6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6
*****************************************
4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4
***********************************************************
9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9
******************************************************
5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-
**********************************
2-  2-  2-  2-  2-  2-  2-  2-
****************************
7-  7-  7-  7-  7-  7-  7-
************
11- 11- 11-
***
3
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
0    .08   .16   .24   .32   .40   .48   .56   .64   .72   .80   .88   .96 1

The actual highest level is 7.772362

The agglomerative coefficient of this data set is .54
```
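The agglomerative coefficient can be recovered from the final ordering and the dissimilarities printed above. The sketch below is an illustration in Python rather than the CLUSFIND routine, and the function name is hypothetical. It assumes that the level at which an object is first merged is the smaller of the two dissimilarities printed next to it in the final ordering, which holds for an ordering without inversions.

```python
def banner_coefficient(levels):
    """1 minus the mean of (first merge level / highest level) over objects.

    `levels[i]` is the value printed between object i and object i + 1
    in the final ordering.
    """
    top = max(levels)
    n = len(levels) + 1
    first = []
    for i in range(n):
        neighbours = []
        if i > 0:
            neighbours.append(levels[i - 1])   # value to the left of object i
        if i < n - 1:
            neighbours.append(levels[i])       # value to the right of object i
        first.append(min(neighbours))
    return 1 - sum(first) / (n * top)


# Final ordering 1 8 10 6 4 9 5 2 7 11 3 and the printed dissimilarities.
agnes_levels = [1.082, 2.373, 3.650, 3.905, 1.983,
                2.542, 4.604, 5.247, 6.854, 7.772]
ac = banner_coefficient(agnes_levels)
print(round(ac, 2))
```

Under this assumption the result agrees with the agglomerative coefficient of .54 reported by the program.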

### 7(5) Example of Hierarchical Divisive Cluster Analysis

 Research Question : Classification of eleven countries according to their publication pattern in different sub fields of chemistry Methodology : Cluster Analysis using the algorithm Diana (Divisive Analysis) Dataset : CHEM.DAT
##### SYNTAX*
```
$RUN CLUSFIND
$FILES
PRINT = DIANA.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSTERING WITH PROGRAM DIANA
STANDARDIZE -
IDVAR=V1 -
VARS=(V2-V10) -
ANALYSIS=DIANA -
PRINT=(DICT,DISSIM,GRAPH,TRACE,VNAM)
```

---------

Note: All options set at default values.

##### EXTRACT FROM COMPUTER OUTPUT

After filtering 11 cases read from the input data file

Number of variables: 9

Number of objects: 11

*** Dissimilarity matrix ***

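| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | | | | | | | | | | |
| 2 | 4.04 | .00 | | | | | | | | | |
| 3 | 8.8 | 7.86 | .00 | | | | | | | | |
| 4 | 3.2 | 4.77 | 6.78 | .00 | | | | | | | |
| 5 | 4.15 | 5.45 | 7.19 | 2.12 | .00 | | | | | | |
| 6 | 3.9 | 3.75 | 6.12 | 3.32 | 4.12 | .00 | | | | | |
| 7 | 6.35 | 5.00 | 7.58 | 4.68 | 4.22 | 5.87 | .00 | | | | |
| 8 | 1.08 | 4.00 | 8.32 | 2.92 | 3.43 | 3.65 | 5.86 | .00 | | | |
| 9 | 4.83 | 5.67 | 6.66 | 1.98 | 2.96 | 4.67 | 4.03 | 4.60 | .00 | | |
| 10 | 2.56 | 4.55 | 8.36 | 3.21 | 3.75 | 3.40 | 5.97 | 2.19 | 4.67 | .00 | |
| 11 | 6.48 | 6.54 | 10.06 | 7.40 | 6.94 | 7.52 | 7.14 | 5.80 | 8.01 | 5.85 | .00 |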

At the first step the 11 objects are divided into groups of 10 and 1

Final ordering of objects and diameters of the clusters

```
Objects      1           8          10           6            2           4
Diameters      1.082    2.560      3.904     4.546       6.352

Objects      9           5           7           11            3
Diameters      2.963     4.679    8.014      10.060
```

Dissimilarity banner

```
1    .92   .84   .76   .68   .60   .52   .44   .36   .28   .20   .12   .04 0
--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
*********************************************************************
8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
**********************************************************
10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 1
************************************************
6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6-
********************************************
2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-
******************************
4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4
***************************************************************
9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9
*******************************************************
5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5
*******************************************
7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7
******************
11- 11- 11- 11- 1
***
3
--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
1    .92   .84   .76   .68   .60   .52   .44   .36   .28   .20   .12   .04 0

```

The actual diameter of this data set is 10.060

The divisive coefficient of this data set is .61

##### INTERPRETATION
IDAMS reports the analysis specifications:

- No. of cases read from the input data file = 11
- No. of variables = 9
- No. of objects = 11

Dissimilarity matrix

This is a matrix of normalized Euclidean distances between the objects. It can easily be seen that the dissimilarity between objects 1 and 8 is the lowest (Dissimilarity (1,8) = 1.08) and that the dissimilarity between objects 11 and 3 is the largest (Dissimilarity (11,3) = 10.06).

Final ordering of objects and diameters of the clusters

```
Objects 1 8 10 6 2 4
Diameters 1.082 2.560 3.904 4.546 6.352
Objects 9 5 7 11 3
Diameters 2.963 4.679 8.014 10.060
```

The largest diameter is 10.060, which stands between (11) and (3). This means that the whole data set is split at the level 10.060, yielding a cluster with objects (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) on the left and a singleton cluster with object (3) on the right. Thus, in the first step, we get two clusters: (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) and (3).

The second largest diameter is 8.014, which stands between (7) and (11). This means that Cluster (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) is split into two clusters: (1, 8, 10, 6, 2, 4, 9, 5, 7) and (11).

The third largest diameter is 6.352, which stands between (2) and (4), indicating that Cluster (1, 8, 10, 6, 2, 4, 9, 5, 7) is divided into two clusters: (1, 8, 10, 6, 2) and (4, 9, 5, 7).

The fourth largest diameter is 4.679, which stands between (5) and (7). This means that Cluster (4, 9, 5, 7) is divided into two clusters: (4, 9, 5) and (7).

The fifth largest diameter is 4.546, which stands between (6) and (2). This means that Cluster (1, 8, 10, 6, 2) is divided into two clusters: (1, 8, 10, 6) and (2).

The sixth largest diameter is 3.904, which stands between (10) and (6). This means that Cluster (1, 8, 10, 6) is divided into two clusters: (1, 8, 10) and (6).

This divisive process is continued until we get 11 singleton clusters.

Dissimilarity Banner

The dissimilarity banner is similar to that of Agnes (Example 7(4)), but it floats in the opposite direction. Also, the scales that surround the banner are plotted differently, since they decrease from 1 to 0. Here, 0 indicates a zero diameter and 1 stands for the diameter of the entire data set, which is equal to 10.060. Diameter = 0 corresponds to singletons. The overall width of the banner reflects the strength of the clustering. When the diameter of the entire data set is much larger than the diameters of the individual clusters, the banner is wide. The divisive coefficient DC is the average width of the banner. DC = 0.61, which indicates good clustering.
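The reading of the final ordering described above can be sketched in Python. This is an illustration, not the DIANA code, and the function names are hypothetical. One value is an assumption: the printed listing gives no diameter between objects (4) and (9), so the sketch uses 1.983, the dissimilarity between these two objects in the matrix, as the diameter of the cluster (4, 9).

```python
def split_sequence(order, diameters):
    """Return (diameter, left, right) for every split, largest diameter first.

    `diameters[i]` is the value printed between order[i] and order[i + 1].
    """
    found = []

    def recurse(objs, diams):
        if len(objs) < 2:
            return
        # The cluster is split where its largest diameter stands.
        k = max(range(len(diams)), key=diams.__getitem__)
        left, right = objs[:k + 1], objs[k + 1:]
        found.append((diams[k], left, right))
        recurse(left, diams[:k])
        recurse(right, diams[k + 1:])

    recurse(list(order), list(diameters))
    return sorted(found, key=lambda s: -s[0])


def divisive_coefficient(diameters):
    """Average banner width: 1 - mean(last-cluster diameter / total diameter)."""
    top = max(diameters)
    n = len(diameters) + 1
    last = []
    for i in range(n):
        neighbours = []
        if i > 0:
            neighbours.append(diameters[i - 1])
        if i < n - 1:
            neighbours.append(diameters[i])
        last.append(min(neighbours))
    return 1 - sum(last) / (n * top)


order = [1, 8, 10, 6, 2, 4, 9, 5, 7, 11, 3]
# 1.983 (between objects 4 and 9) is assumed; it is missing from the listing.
diameters = [1.082, 2.560, 3.904, 4.546, 6.352, 1.983,
             2.963, 4.679, 8.014, 10.060]

splits = split_sequence(order, diameters)
for diameter, left, right in splits[:6]:
    print(diameter, left, right)
dc = divisive_coefficient(diameters)
print(round(dc, 2))
```

The first six splits printed by the loop are the ones walked through in the interpretation, and with the assumed value the coefficient rounds to the reported .61.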

### 7(6) Example of Cluster Analysis of Binary Data

 Research Question : Classification of 33 major academic institutions in India according to priorities given to different scientific fields. Methodology : Cluster Analysis using the algorithm MONA (Monothetic Analysis) Dataset : MONA.DAT
##### SYNTAX*
```
$RUN CLUSFIND
$FILES
PRINT = MONA.LST
$SETUP
CLUSTER ANALYSIS USING MONA
IDVAR=V1 -
VARS=(V2-V9) -
ANALYSIS=MONA -
CMAX=5 -
PRINT=(DICT,DISS,GRAPH,TRACE,VNAM)
```

-------------------------------

Note: All options set at default values.

##### EXTRACT FROM COMPUTER OUTPUT
After filtering 33 cases read from the input data file

Number of variables: 8

Number of objects: 33

```
Step number 1
Cluster 1 3 4 6 7 8 9 10 12 13 18 23 25 26 32 2 5 11 14 15 16 17 19 20 21 22 24 27 28 29 30 31 33
  is divided into 15 and 18 objects, using variable LIF

Step number 2
Cluster 1 3 4 7 8 10 12 13 18 23 26 32 6 9 25
  is divided into 12 and 3 objects, using variable MED
Cluster 2 14 15 16 19 20 21 27 28 29 30 31 33 5 11 17 22 24
  is divided into 13 and 5 objects, using variable ESP

Step number 3
Cluster 1 3 4 7 8 10 13 18 23 12 26 32
  is divided into 9 and 3 objects, using variable MAT
Cluster 6 9 25
  is divided into 2 and 1 objects, using variable MAT
Cluster 2 16 20 21 28 29 30 31 33 14 15 19 27
  is divided into 9 and 4 objects, using variable MED
Cluster 5 17 22 24 11
  is divided into 4 and 1 objects, using variable PHY

Step number 4
Cluster 1 3 4 7 8 10 13 18 23
  is divided into 5 and 4 objects, using variable ESP
Cluster 12 32 26
  is divided into 2 and 1 objects, using variable PHY
Cluster 6 9
  Cannot be separated by the remaining variables.
Cluster 2 16 20 29 21 28 30 31 33
  is divided into 4 and 5 objects, using variable PHY
Cluster 14 15 27 19
  is divided into 3 and 1 objects, using variable PHY
Cluster 5 22 24 17
  is divided into 3 and 1 objects, using variable MAT

Step number 5
Cluster 1 3 7 8 4
  is divided into 4 and 1 objects, using variable CHE
Cluster 10 23 13 18
  is divided into 2 and 2 objects, using variable ENG
Cluster 12 32
  Cannot be separated by the remaining variables.
Cluster 2 29 16 20
  is divided into 2 and 2 objects, using variable CHE
Cluster 21 28 30 31 33
  is divided into 1 and 4 objects, using variable CHE
Cluster 14 27 15
  is divided into 2 and 1 objects, using variable CHE
Cluster 5 22 24
  is divided into 2 and 1 objects, using variable CHE

Step number 6
Cluster 1 3 7 8
  Cannot be separated by the remaining variables.
Cluster 10 23
  Cannot be separated by the remaining variables.
Cluster 13 18
  Cannot be separated by the remaining variables.
Cluster 2 29
  is divided into 1 and 1 objects, using variable ENG
Cluster 16 20
  Cannot be separated by the remaining variables.
Cluster 28 30 33 31
  is divided into 3 and 1 objects, using variable MAT
Cluster 14 27
  is divided into 1 and 1 objects, using variable AGR
Cluster 5 22
  is divided into 1 and 1 objects, using variable MED

Step number 7
Cluster 28 30 33
  is divided into 1 and 2 objects, using variable AGR

Step number 8
Cluster 30 33
  Cannot be separated by the remaining variables.
```

Final Ordering of Objects

```
      1 3 7 8 4 10 23 13 18 12 32 26 6
Step  5 4 5 3 4 2
By    CHE ESP ENG MAT PHY MED

      6 9 25 2 29 16 20 21 28 30 33 31 14
Step  3 1 6 5 4 5 7 6 3
By    MAT LIF ENG CHE PHY CHE AGR MAT MED

      14 27 15 19 5 22 24 17 11
Step  6 5 4 2 6 5 4 3
By    AGR CHE PHY ESP MED CHE MAT PHY
```

Separation Plot

```
     0         1         2         3         4         5         6         7
     1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
     3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-
     7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-
     8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
CHE  ****************************************************
     4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-
ESP  ******************************************
     10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
     23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23-
ENG  ****************************************************
     13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13-
     18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18-
MAT  ********************************
     12- 12- 12- 12- 12- 12- 12- 12- 12- 12- 1
     32- 32- 32- 32- 32- 32- 32- 32- 32- 32- 3
PHY  ******************************************
     26- 26- 26- 26- 26- 26- 26- 26- 26- 26- 2
MED  **********************
     6-  6-  6-  6-  6-  6-  6-  6-
     9-  9-  9-  9-  9-  9-  9-  9-
MAT  ********************************
     25- 25- 25- 25- 25- 25- 25- 25-
LIF  ************
     2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-
ENG  **************************************************************
     29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 2
CHE  ****************************************************
     16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16-
     20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20-
PHY  ******************************************
     21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21-
CHE  ****************************************************
     28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28-
AGR  ************************************************************************
     30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30-
     33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33-
MAT  **************************************************************
     31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 3
MED  ********************************
     14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 1
AGR  **************************************************************
     27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 2
CHE  ****************************************************
     15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15-
PHY  ******************************************
     19- 19- 19- 19- 19- 19- 19- 19- 19- 19- 1
ESP  **********************
     5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-
MED  **************************************************************
     22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 2
CHE  ****************************************************
     24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24-
MAT  ******************************************
     17- 17- 17- 17- 17- 17- 17- 17- 17- 17- 1
PHY  ********************************
     11- 11- 11- 11- 11- 11- 11- 11-
     0         1         2         3         4         5         6         7
```
##### INTERPRETATION
IDAMS reports the analysis specifications:

- No. of cases read from the input data file = 33
- No. of variables = 8
- No. of objects = 33

The whole sample is successively divided into clusters in eight steps. The results are represented in Figure 1 in the form of a hierarchical tree. Note that in monothetic analysis only one variable is used at a time for the hierarchical clustering of objects; hence the name monothetic clustering. The variable used for each split is indicated in the tree.

Final Ordering of Objects

The first row shows the sequence of objects, the second row shows the separation steps, and the third row shows the variable used at each separation. For example, step 1 appears between (25) and (2): all the objects from (2) onwards are separated at the first step from the objects up to (25). This separation is carried out using the variable LIF. Step 2 appears between (26) and (6): the objects from (6) to (25) are separated at step 2, using the variable MED, and so on.

Separation Banner

Each object of the data set corresponds to a horizontal line in the banner. The horizontal lines are ordered in the same way as the first row of the final ordering of objects. The end of a row of stars **** indicates a separation between clusters. The length of the row of stars is proportional to the step number at which the separation is carried out. If two or more lines representing objects are stuck together, it means that the objects cannot be separated. For example, objects (1, 3, 7, 8) are stuck together and hence cannot be split further. These objects were separated from object (4) at step 5, using the variable CHE. When the row of an object does not continue to the right-hand side of the banner, it means that the object becomes a singleton cluster at the corresponding step. For example, object (11) becomes a singleton at step 3, using the variable PHY.

It is important to note that the banner of MONA cannot be used to assess the quality of clustering, because the length of the row of stars is proportional to the number of the separation step, and not to the tightness of the clusters.
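The way the final ordering encodes the successive partitions can be illustrated with a short Python sketch. It is not part of IDAMS: the function name and the list encoding are assumptions made for illustration, with 0 used to mark neighbouring objects that the program could not separate.

```python
# Final ordering of objects, as printed by the program.
ORDER = [1, 3, 7, 8, 4, 10, 23, 13, 18, 12, 32, 26, 6, 9, 25, 2, 29,
         16, 20, 21, 28, 30, 33, 31, 14, 27, 15, 19, 5, 22, 24, 17, 11]

# STEPS[i] = step at which ORDER[i] and ORDER[i + 1] are separated;
# 0 (an encoding chosen here) means they cannot be separated.
STEPS = [0, 0, 0, 5, 4, 0, 5, 0, 3, 0, 4, 2, 0, 3, 1, 6,
         5, 0, 4, 5, 7, 0, 6, 3, 6, 5, 4, 2, 6, 5, 4, 3]


def clusters_at_step(order, steps, k):
    """Partition obtained after separation step k."""
    clusters = [[order[0]]]
    for obj, step in zip(order[1:], steps):
        if 0 < step <= k:
            clusters.append([obj])      # a separation made at or before step k
        else:
            clusters[-1].append(obj)    # still in the same cluster
    return clusters


for k in (1, 2, 3):
    print(k, [len(c) for c in clusters_at_step(ORDER, STEPS, k)])
```

Cutting the ordering at every separation numbered k or lower gives the partition after step k. The cluster sizes printed for steps 1 to 3 can be compared with the "is divided into" lines of the program output.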

### 7(7) Example of Construction of a Typology

 Research Question : Construct a typology of academic scientists according to the pattern of their involvement in different activities, and identify their main characteristics. Methodology : Classification using the IDAMS module TYPOL. Dataset : TYPE.DAT
##### SYNTAX*
```
$RUN TYPOL
$FILES
PRINT = typology.lst
DICTIN = anju.dic
DATAIN = anju.dat
$SETUP
EXCLUDE V1=1220,2049,2055,2074,2075,5016
AQNTVARS=(V2-V8) -
PQNTVARS=(V13) -
PQLTVARS=(V9,V10,V14,V15) -
INITIAL=RANDOM NCASES=1073 -
DTYPE=EUCLID -
INIGROUP=5 -
FINGROUP=3 -
PRINT=(CDICT,GRAP,ROWP,DIST)
```

--------------------------------------------

Note: All options set at default values

##### EXTRACT FROM COMPUTER OUTPUT
```
Number of initial groups: 5
Number of final groups: 3
Initial configuration is a random sample
Number of cases: 1073
Maximum number of iterations: 5
No standardization of active variables
Type of distance is 'Euclidean'
Regrouping is based on minimum displacement
Print the graphic of profiles
Print row % for qual. variables categories
Print table of distances and displacements for each regrouping
Print all resulting typologies

Active quantitative variables      V2 V3 V4 V5 V6 V7 V8
Passive quantitative variables     V13

After filtering 1067 cases read from the input file
4 cases contained illegal characters and were treated according to BADDATA specification
The distances and displacements are computed on non-standardized variables
```

% of explained variance from one iteration to another

```
Iteration number   Mean EV   Image
       1            .345     ***
       2            .358     ****
       3            .359     ****
```

Characteristics of distances by groups

```
Group no.      N       Mean       SD
    1         217.    58.844    81.419
    2          94.    73.033    71.358
    3         204.    39.833    39.680
    4         174.    68.739    81.807
    5         354.    41.761    43.044

Total count    Mean       SD
   1043.      52.257    63.657
```

```
Var seq   Name                   Mean    S. D.   Weight
   1      v262:teaching         42.02    18.00    1.00
   2      v263:research         22.40    12.28    1.00
   3      v264:supervsn         16.14    12.01    1.00
   4      v265:lab-dev           5.76     7.04    1.00
   5      v266:admin             7.78     9.30    1.00
   6      v267:extension         2.94     6.30    1.00
   7      v268:profess           2.96     4.23    1.00
   8      v344:#doc students     2.00     1.96     .00
```

Description of resulting typology

```
Group number                                           1      2      3      4      5
Total cases 1000
Proportion of cases                                  208     90    195    166    339

    Explained      Grand
    variance       mean
  1  767 ********  42.02  v262:teaching            25.63  23.28  68.32  31.83  46.88
                                                    8.12   9.55   9.63  10.90   6.80
  2  464 *****     22.40  v263:research            22.09  16.30  17.19  40.63  18.26
                                                    8.76   7.06   9.14  12.57   7.22
  3  519 *****     16.14  v264:supervsn            30.88  14.22   5.09  10.48  16.77
                                                   11.14   6.85   5.26   8.20   8.17
  4   83 *          5.76  v265:lab-dev              5.50  10.96   2.99   5.25   6.38
                                                    5.76  12.12   4.11   5.97   6.88
  5  564 ******     7.78  v266:admin                6.50  29.74   3.74   5.45   6.21
                                                    4.94  11.70   5.12   5.21   5.59
  6   23            2.94  v267:extension            4.50   2.12   1.87   3.51   2.54
                                                    8.28   4.00   5.13   7.32   5.16
  7   96 *          2.96  v268:profess              4.91   3.38    .80   2.86   2.94
                                                    4.24   4.22   2.08   4.09   4.60
  8  164 **         2.00  v344:#doc students        3.24   2.04    .75   1.79   2.05
                                                    1.98   1.69   1.36   2.06   1.77
  9   70 *          34.2  v204:rank CODE=0001       48.4   54.3   14.2   33.9   31.9
                   100.0                            29.4   14.3    8.1   16.4   31.6
 10    3            34.5  v204:rank CODE=0002       36.4   27.7   33.3   32.8   36.7
                   100.0                            21.9    7.2   18.8   15.8   36.1
 11   90 *          29.5  v204:rank CODE=0003       12.0   14.9   52.5   32.2   29.7
                   100.0                             8.4    4.5   34.6   18.1   34.1
 12   75 *          10.3  v217:head? CODE=0001       8.8   36.2    5.9    5.2    9.3
                   100.0                            17.8   31.7   11.2    8.4   30.8
 13   76 *          88.4  v217:head? CODE=0002      90.8   60.6   92.6   93.1   89.5
                   100.0                            21.4    6.2   20.4   17.5   34.3
 14   63 *          35.1  sv:inst type CODE=0001    21.7   22.3   53.4   25.9   40.7
                   100.0                            12.8    5.7   29.7   12.2   39.3
 15   59 *          28.3  sv:inst type CODE=0002    32.7   47.9    8.3   31.6   30.2
                   100.0                            24.1   15.2    5.7   18.6   36.2
 16  110 *           5.7  sv:inst type CODE=0003    19.4    1.1     .0    8.6     .3
                   100.0                            71.2    1.7     .0   25.3    1.7
 17    9            31.0  sv:inst type CODE=0004    26.3   28.7   38.2   33.9   28.8
                   100.0                            17.6    8.3   24.1   18.2   31.5
 18    1            59.3  :field CODE=0001          61.8   55.3   57.4   59.2   59.9
                   100.0                            21.7    8.4   18.9   16.6   34.3
 19    1            39.3  :field CODE=0002          37.8   42.6   40.2   39.7   38.7
                   100.0                            20.0    9.7   19.9   16.7   33.4
```

Variables explaining 80% of the variance

```
Var seq   Names            Expl. Var
   1      v262:teaching       767
   5      v266:admin          564
   3      v264:supervsn       519
   2      v263:research       464

Expl. Var = amount of variance explained by one variable
Total variance = overall variance explained by the active variables
Mean variance explained by active variables = 335
Mean variance explained by all variables = 239
Mean variance explained by the variables which explain 80% of the total variance = 578
Percentage of variables = 21.1
Mean variance explained by those variables which explain 80% of the total variance before regrouping = 578
```

Square roots of displacements (computed on weighted variables) and distances

```
Groups
Numbers      1        2        3        4
   2       37.7
           3.362
   3       62.0     49.9
           4.368    4.496
   4       44.6     40.0     54.6
           3.274    3.694    4.072
   5       50.3     42.7     48.3     48.4
           3.132    3.574    3.068    3.238

Regrouping number 1
Group 2 is incorporated into group 1
Displacement = 1421.829   Distance = 11.305
```

```
Group number                                           1      3      4      5
Total cases 1000
Proportion of cases                                  298    195    166    339

    Explained      Grand
    variance       mean
  1  765 ********  42.02  v262:teaching            24.92  68.32  31.83  46.88
                                                    8.64   9.63  10.90   6.80
  9   69 *          34.2  v204:rank CODE=0001       50.2   14.2   33.9   31.9
                   100.0                            43.7    8.1   16.4   31.6
 10    1            34.5  v204:rank CODE=0002       33.8   33.3   32.8   36.7
                   100.0                            29.1   18.8   15.8   36.1
 11   90 *          29.5  v204:rank CODE=0003       12.9   52.5   32.2   29.7
                   100.0                            13.0   34.6   18.1   34.1
```

Group 1

```
Var    EV    Mean   -2.5 -2.0 -1.5 -1.0 -0.5  0  0.5 1.0 1.5 2.0 2.5
seq
 1   767  25.631   X-----------------I v262:teaching
 2   464  22.092                    XI v263:research
 3   519  30.876                     I------------------------X v264:supervsn
 4    83   5.498                    XI v265:lab-dev
 5   564   6.498                  X--I v266:admin
 6    23   4.498                     I----X v267:extension
 7    96   4.908                     I--------X v268:profess
 8   164   3.240                     I------------X v344:#doc students
 9    70  48.387                     I-----X v204:rank CODE=0001
10     3  36.406                     IX v204:rank CODE=0002
11    90  11.982             X-------I v204:rank CODE=0003
12    75   8.756                    XI v217:head? CODE=0001
13    76  90.783                     IX v217:head? CODE=0002
14    63  21.659               X-----I sv:inst type CODE=0001
15    59  32.719                     I-X sv:inst type CODE=0002
16   110  19.355                     I-----------X sv:inst type CODE=0003
17     9  26.267                   X-I sv:inst type CODE=0004
18     1  61.751                     IX :field CODE=0001
19     1  37.788                    XI :field CODE=0002
```

```......... : : : : ..... : : : : T 1 T 2 T 4 T 0 208 90 166 ```
##### INTERPRETATION
IDAMS reports the analysis specifications:

```
No. of cases read = 1067
No. of cases analysed = 1063
Active quantitative variables = V2-V8
Passive quantitative variables = V13
Passive qualitative variables = V9, V10, V14, V15
(These variables are defined in the file TYPE.DIC)
No. of initial groups = 5
No. of final groups = 3
Distance metric used = Euclidean
Regrouping based on minimum displacement.
```

History of the % of variation explained, from the first iteration to the final one (i.e. the third iteration)

Characteristics of distances by groups

- N = The number of cases in each group of the initial typology
- Mean = Mean of the distances from the group profile over all cases in the group
- SD = Standard deviation of the distances for each group
- Total count = Total number of cases participating in the building of the initial typology
- SD (total) = Overall standard deviation of the distances

Mean, S.D. and Weight of quantitative variables

- For active quantitative variables, weight = 1
- For passive quantitative variables, weight = 0

(Note: Passive variables do not participate in the construction of the typology.)

Description of typology

For each variable, the following information is given:

- Serial number
- Variance explained (in per mille, i.e. parts per 1000)
- Sequence of stars ***; the number of stars is proportional to the variance explained. This provides a visualization of the importance of a variable in explaining the differences between typology groups.
- Grand mean = Mean value of the variable over all cases.

For each typology group, the proportion of cases (per thousand) is given.

Quantitative variables

- Row 1: Mean value of the variable
- Row 2: Standard deviation of the variable

Qualitative variables

For each qualitative variable:

- Variance explained by each category of the variable
- Percentage of cases in each category of the variable

For each group:

- Row 1: Percentage frequency of a given category in the group. This value summed over all categories = 100
- Row 2: Distribution of a given category over all the groups. This value summed over all groups = 100

For example: Quantitative Variable V262: Teaching

```
         Gr1     Gr2     Gr3     Gr4     Gr5
Row 1:  25.63   23.28   68.32   31.83   46.88
Row 2:   8.12    9.55    9.63   10.90    6.80
```

The first row shows the average time spent on teaching by the members of each typology group. The second row shows the standard deviation of this variable for each typology group.

For example: Qualitative Variable V204: Rank

This variable has 3 categories:

```
Col.1   Col.2   Col.3   Col.4   Col.5   Col.6   Col.7
        Code    Gr1     Gr2     Gr3     Gr4     Gr5
 34.2     1     48.4    54.3    14.2    33.9    31.9
100.0           29.4    14.3     8.1    16.4    31.6
 34.5     2     36.4    27.7    33.3    32.8    36.7
100.0           21.9     7.2    18.8    15.8    36.1
 29.5     3     12.0    14.9    52.5    32.2    29.7
100.0            8.4     4.5    34.6    18.1    34.1
```

Col. 1 shows the distribution of the different categories in the entire sample:

- 34.2% Category 1 (Professor)
- 34.5% Category 2 (Reader)
- 29.5% Category 3 (Lecturer)

Let us consider Group 1. The composition of this group is:

- Category 1 (Professor): 48.4%
- Category 2 (Reader): 36.4%
- Category 3 (Lecturer): 12.0%

Category 1 is over-represented in Group 1 and Category 3 is under-represented, while Category 2 has about the same frequency as in the entire sample. The other groups are interpreted in the same way.

Variables explaining 80% of the variance

This is a list of the most discriminant variables, which taken together account for 80% of the explained variance. These variables are ranked according to their explanatory power.

- Mean variance explained by active variables = Mean amount of variance explained by the active variables.
- Mean variance explained by all variables = Mean amount of variance explained by all the variables taken together.
- Mean variance explained by the variables which explain 80% of the total variance = Mean amount of variance explained by the most discriminant variables.
- Percentage of variables = 100 × No. of most discriminant variables / No. of all variables.

Matrix of (square roots of) inter-group displacements and distances

- First row = Square root of the displacement
- Second row = Square root of the distance

It can easily be seen that the displacement between Group 1 and Group 2 is the minimum. Hence, these two groups are merged.
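Two of the summary figures above can be checked with a few lines of Python. This is a sketch for illustration, not the TYPOL algorithm: the function and variable names are hypothetical. It assumes that the 80% threshold is taken on the total explained variance of the seven active variables and that the percentage of variables is based on all 19 rows of the typology table. The pairs of the matrix are keyed by group numbers, using the first value of each pair (the square root of the displacement).

```python
import math

# Explained variance (per mille) of the active variables, from the output.
active_ev = {
    "v262:teaching": 767, "v263:research": 464, "v264:supervsn": 519,
    "v265:lab-dev": 83, "v266:admin": 564, "v267:extension": 23,
    "v268:profess": 96,
}
N_ALL_VARIABLES = 19  # rows in the typology table (assumed denominator)


def most_discriminant(ev, share=0.80):
    """Top-ranked variables that together reach `share` of the total."""
    target = share * sum(ev.values())
    chosen, running = [], 0
    for name, value in sorted(ev.items(), key=lambda kv: kv[1], reverse=True):
        if running >= target:
            break
        chosen.append(name)
        running += value
    return chosen


chosen = most_discriminant(active_ev)
mean_ev = sum(active_ev[v] for v in chosen) / len(chosen)
pct_variables = 100 * len(chosen) / N_ALL_VARIABLES
print(chosen, mean_ev, round(pct_variables, 1))

# Square roots of the displacements between the five initial groups.
sqrt_displacement = {
    (1, 2): 37.7, (1, 3): 62.0, (2, 3): 49.9, (1, 4): 44.6, (2, 4): 40.0,
    (3, 4): 54.6, (1, 5): 50.3, (2, 5): 42.7, (3, 5): 48.3, (4, 5): 48.4,
}
merged = min(sqrt_displacement, key=sqrt_displacement.get)
print(merged, round(math.sqrt(1421.829), 1), round(math.sqrt(11.305), 3))
```

Under these assumptions the four listed variables are selected and the percentage of variables comes out at 21.1, as printed. The square roots of the reported displacement (1421.829) and distance (11.305) match the first pair of the matrix, which confirms which row holds which quantity.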
Description of the resulting 3-group typology

The interpretation is similar to that of the 5-group typology.

Graphical representation of the profiles of the different groups

- Var seq = Sequence number of the variable, as in the description of the typology
- EV = Explained variance
- Mean = Mean value of the variable in the group

The vertical line corresponds to the grand mean of each variable. Horizontal bars show the deviation of the group mean of a variable from its grand mean. A bar to the right of the vertical line indicates a value greater than the grand mean; a bar to the left indicates a value less than the grand mean. In both cases, the length of the bar is proportional to the deviation from the grand mean (calibrated in terms of standard deviation).

Dendrogram showing the mergers of groups

The dendrogram can help in deciding the number of typology groups to be retained for interpretation. Another factor in deciding the number of typology groups is the interpretability of the typology from a theoretical point of view.
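The calibration of the profile bars can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the IDAMS plotting code: the function names are hypothetical, the standard deviation is taken to be the overall S.D. of the variable, and the scale of about 20 characters per standard deviation is inferred from the printed axis (-2.5 to 2.5), not documented.

```python
CHARS_PER_SD = 20  # assumed plot resolution, inferred from the printed scale


def profile_deviation(group_mean, grand_mean, sd):
    """Deviation of a group mean from the grand mean, in standard deviations."""
    return (group_mean - grand_mean) / sd


def profile_bar(group_mean, grand_mean, sd):
    """Text bar: 'I' marks the grand mean, 'X' the group mean."""
    dev = profile_deviation(group_mean, grand_mean, sd)
    length = round(abs(dev) * CHARS_PER_SD)
    if length == 0:
        return "I"
    dashes = "-" * (length - 1)
    # Bars below the grand mean run to the left of the axis.
    return "X" + dashes + "I" if dev < 0 else "I" + dashes + "X"


# Group 1 means with the grand mean and S.D. of each variable (from the output).
teaching = profile_bar(25.631, 42.02, 18.00)
supervision = profile_bar(30.876, 16.14, 12.01)
print(round(profile_deviation(25.631, 42.02, 18.00), 2), teaching)
print(round(profile_deviation(30.876, 16.14, 12.01), 2), supervision)
```

For Group 1, teaching lies about 0.91 standard deviations below the grand mean and supervision about 1.23 above it. Under the assumed scale, the resulting bars have the same lengths as those in the printed profile.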