7(1) Example of Cluster Analysis by Partitioning Around Medoids

Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.

Methodology: Cluster Analysis using the algorithm PAM (Partitioning Around Medoids).

Dataset: COOP.DAT

SYNTAX*
$RUN CLUSFIND
$FILES
PRINT = PAM.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING PAM
BADDATA=MD1 -
 IDVAR=V1 -
 VARS=(V2-V12) -
 ANALYSIS=PAM -
 CMIN=4 -
 PRINT=(DICT,DISS,GRAPH,TRACE)
 

-----------
Note: All options set at default values

EXTRACT FROM COMPUTER OUTPUT

After filtering       35 cases read from the input data file
MIN clusters= 4
  Number of variables: 11
   Number of objects: 35

 

*** Dissimilarity matrix ***

  1 2 3 4 5 6 7 8 9 10 11 12 13
1 .00
2 24.03 .00
3 25.63 4.58 .00
4 26.31 8.56 6.14 .00
5 29.48 8.44 5.50 5.86 .00
6 28.95 7.60 4.56 5.77 3.01 .00
7 31.17 9.86 6.81 6.38 2.90 4.10 .00
8 32.56 10.94 7.97 7.58 3.65 4.48 2.49 .00
9 31.58 9.58 6.89 6.87 4.01 3.98 3.49 3.11 .00
10 32.56 10.89 8.02 7.31 4.16 4.87 2.21 1.92 2.80 .00
11 31.79 10.68 7.80 6.14 3.83 4.52 2.45 2.27 2.76 1.73 .00
12 32.65 10.82 8.16 7.66 4.16 4.68 2.94 1.88 2.82 1.26 2.05 .00
13 33.08 11.32 8.57 7.94 4.03 5.06 2.52 1.82 3.26 1.48 2.29 1.55 .00
14 33.19 11.39 8.59 7.90 4.30 5.11 2.74 1.71 3.05 1.15 2.03 1.20 .81
15 32.87 11.24 8.40 7.54 4.14 4.64 2.90 1.60 2.88 1.51 1.80 1.18 1.26
16 33.10 11.51 8.63 7.75 4.28 5.02 2.82 1.69 3.06 1.26 1.90 1.21 .96
17 33.30 11.60 8.77 7.91 4.51 5.20 3.01 1.63 3.13 1.28 1.98 1.17 1.14
18 33.00 11.44 8.62 7.52 4.62 5.15 3.00 2.21 2.92 1.10 1.75 1.34 1.49
19 33.06 11.60 8.77 7.81 4.42 5.04 3.28 2.26 3.19 1.79 2.19 1.36 1.47
20 33.41 11.69 8.90 8.13 4.67 5.22 3.19 2.03 3.15 1.47 2.25 1.19 1.11
21 33.69 11.96 9.13 8.32 4.79 5.56 3.25 2.05 3.37 1.55 2.35 1.47 1.07
22 33.30 11.67 8.86 7.92 4.60 5.21 3.18 2.13 3.07 1.48 2.03 1.25 1.18
23 33.49 11.76 8.99 8.23 4.55 5.33 3.20 1.98 3.17 1.67 2.37 1.46 .88
24 33.47 11.69 8.92 8.13 4.73 5.40 3.36 2.23 2.94 1.58 2.17 1.38 1.33
25 33.16 11.17 8.57 8.07 4.88 5.44 3.99 3.09 2.05 2.52 2.73 2.36 2.56
26 33.66 11.96 9.16 8.27 4.80 5.53 3.31 2.01 3.33 1.60 2.27 1.42 1.13
27 33.49 11.79 9.00 8.14 4.70 5.40 3.28 2.09 3.19 1.52 2.21 1.32 1.11
28 33.55 11.83 9.06 8.19 4.79 5.44 3.39 2.03 3.25 1.55 2.32 1.26 1.28
29 33.47 11.85 9.07 8.08 4.69 5.47 3.31 2.19 3.25 1.58 2.17 1.38 1.16
30 33.53 11.92 9.11 8.18 4.75 5.49 3.38 2.14 3.35 1.62 2.26 1.34 1.23
31 33.51 11.74 8.98 8.17 4.74 5.35 3.40 2.04 3.13 1.58 2.26 1.22 1.29
32 33.48 11.77 9.05 8.17 4.75 5.49 3.42 2.27 3.08 1.61 2.24 1.27 1.31
33 33.65 12.02 9.22 8.27 4.82 5.60 3.41 2.19 3.42 1.71 2.32 1.46 1.22
34 33.69 12.01 9.19 8.28 4.90 5.64 3.39 2.20 3.39 1.55 2.39 1.51 1.27
35 33.81 12.09 9.30 8.41 4.92 5.70 3.41 2.16 3.44 1.69 2.42 1.56 1.18

  14 15 16 17 18 19 20 21 22 23 24 25 26
14 .00
15 .85 .00
16 .53 .60 .00
17 .54 .73 .49 .00
18 1.09 1.16 .94 .92 .00
19 1.22 .97 .85 1.19 1.37 .00
20 .86 .93 .67 .75 .96 .93 .00
21 .65 1.07 .64 .63 1.14 1.11 .63 .00
22 .78 .79 .53 .73 .91 .70 .49 .60 .00
23 .87 1.08 .76 .87 1.20 1.09 .55 .63 .71 .00
24 .91 1.15 .92 .87 1.02 1.23 .77 .70 .65 .80 .00
25 2.23 2.33 2.32 2.31 2.23 2.40 2.27 2.26 2.10 2.19 1.69 .00
26 .75 1.01 .66 .57 1.08 1.10 .53 .28 .56 .55 .65 2.25 .00
27 .75 .99 .67 .62 .90 1.08 .50 .45 .50 .54 .47 2.10 .35
28 .83 .98 .70 .54 .93 1.07 .58 .57 .64 .70 .77 2.26 .47
29 .81 1.01 .64 .73 .96 .87 .61 .49 .40 .60 .60 2.13 .43
30 .82 1.00 .63 .67 1.05 .86 .60 .44 .47 .68 .67 2.24 .39
31 .80 .90 .71 .56 .95 1.04 .57 .57 .56 .69 .60 2.08 .46
32 .95 1.16 .86 .90 1.08 .94 .67 .67 .54 .71 .51 1.89 .60
33 .84 1.07 .71 .75 1.19 .91 .67 .37 .50 .69 .71 2.25 .36
34 .83 1.18 .75 .69 .96 1.16 .73 .44 .70 .76 .79 2.30 .49
35 .80 1.16 .78 .72 1.19 1.16 .68 .20 .65 .64 .71 2.26 .24

  27 28 29 30 31 32 33 34 35
27 .00
28 .43 .00
29 .34 .50 .00
30 .38 .43 .24 .00
31 .36 .25 .48 .43 .00
32 .53 .65 .41 .50 .57 .00
33 .48 .58 .31 .23 .55 .51 .00
34 .47 .39 .47 .44 .51 .71 .52 .00
35 .46 .55 .46 .42 .56 .63 .31 .43 .00

Number of representative objects: 4

Average distance, initial build : 36.916
Final average distance : 33.190

 Cluster  Medoid  Size  Objects
     1        1      1    1
     2        3      2    2    3
     3        5      5    4    5    6    7    8
     4       22     27    9   10   11   12   13   14   15   16   17   18
                         19   20   21   22   23   24   25   26   27   28
                         29   30   31   32   33   34   35

Coordinates of medoids

 1   97.00 832.00 290.00 120.00 132.00  58.00 305.00 313.00 188.00  49.00  78.00
 3   11.00 387.00  93.00  49.00  39.00  23.00  79.00  94.00  52.00   4.00  16.00
 5    4.00 224.00  47.00  15.00  26.00   7.00  21.00  40.00  13.00   6.00  20.00
22    2.00  17.00   7.00   5.00    .00   1.00   7.00   4.00   2.00   1.00    .00
 

Clustering vector

1   2   2   3   3   3   3   3   4   4   4   4   4   4   4   4   4   4   4   4   4
4   4   4   4   4   4   4   4   4   4   4   4   4   4
 

Clustering characteristics

Cluster 1 is an isolated singleton object 1 , separation 618.50

Cluster 2 an isolated L*-Cluster with diameter 119.02 and separation 188.28

Number of isolated clusters: 2

Cluster                                  1         2         3         4
Diameter of each cluster               .00    119.02    144.86    112.66
Separation of each cluster          618.50    188.28     39.08     39.08
Average distance to each medoid        .00     59.51     67.14     26.00
Maximum distance to each medoid        .00    119.02    106.94     93.75
 
                                   Silhouettes
                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
  CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
    1    2   .00      1|                                                 |
                       |                                                 |
    2    3   .53      2|**************************                       |
    2    3   .49      3|************************                         |
                       |                                                 |
    3    4   .59      5|*****************************                    |
    3    4   .50      4|*************************                        |
    3    2   .49      7|************************                         |
    3    4   .27      6|*************                                    |
    3    4  -.04      8|                                                 |
                       |                                                 |
    4    3   .85     22|******************************************       |
    4    3   .85     30|******************************************       |
    4    3   .85     26|******************************************       |
    4    3   .85     33|******************************************       |
    4    3   .85     27|******************************************       |
    4    3   .85     29|******************************************       |
    4    3   .85     34|******************************************       |
    4    3   .85     24|******************************************       |
    4    3   .84     35|******************************************       |
    4    3   .84     23|******************************************       |
    4    3   .84     20|*****************************************        |
    4    3   .83     28|*****************************************        |
    4    3   .83     32|*****************************************        |
    4    3   .83     19|*****************************************        |
    4    3   .83     18|*****************************************        |
    4    3   .83     31|*****************************************        |
    4    3   .81     21|****************************************         |
    4    3   .80     15|****************************************         |
    4    3   .80     25|****************************************         |
    4    3   .78     17|***************************************          |
    4    3   .76     16|*************************************            |
    4    3   .72     12|************************************             |
    4    3   .62      9|******************************                   |
    4    3   .60     14|******************************                   |
    4    3   .56     13|****************************                     |
    4    3   .50     11|*************************                        |
    4    3   .20     10|**********                                       |
                       +----+----+----+----+----+----+----+----+----+----+
                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1


        Cluster 1 has average silhouette width     1.00
        Cluster 2 has average silhouette width      .51
        Cluster 3 has average silhouette width      .36
        Cluster 4 has average silhouette width      .76
 
 For the entire data set, the average silhouette width is .70
 which indicates a strong structure was found.
INTERPRETATION

IDAMS reports the following analysis specifications:

No. of Objects = 35
No. of Variables = 11
No. of Clusters = 4

  Dissimilarity matrix is a matrix of unstandardized Euclidean distances among the objects.
No. of medoids (representative objects) = Number of clusters = 4
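One way to confirm that the dissimilarities are unstandardized Euclidean distances on the raw values is to recompute one of them. The Python sketch below uses the medoid coordinates printed in the PAM output for objects (1) and (3); the variable and function names are illustrative and not part of IDAMS.

```python
import math

# Coordinates of objects (1) and (3), copied from "Coordinates of medoids"
# in the PAM output (11 variables, raw, unstandardized values).
OBJ_1 = [97, 832, 290, 120, 132, 58, 305, 313, 188, 49, 78]
OBJ_3 = [11, 387, 93, 49, 39, 23, 79, 94, 52, 4, 16]


def euclidean(x, y):
    """Unstandardized Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


d13 = euclidean(OBJ_1, OBJ_3)
print(f"d(1,3) = {d13:.2f}")
```

The value obtained agrees with d(1,3) = 618.50 in the dissimilarity matrix printed for the FANNY example and with the separation reported for Cluster 1.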
 

Average dissimilarity for the solution found in the first part of the algorithm (BUILD) = 36.916
Average dissimilarity for the solution found in the second part of the algorithm (SWAP) = 33.190
Both values are the average dissimilarity between each object and the medoid closest to it, i.e. the medoid of the cluster to which the object is assigned.
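The two phases can be illustrated with a compact Python sketch. It follows the usual description of PAM (a greedy BUILD phase, then SWAP steps that exchange a medoid with a non-medoid while the average distance decreases), but it is a brute-force toy on six one-dimensional points, not the IDAMS CLUSFIND code; all function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Unstandardized Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_cost(d, medoids):
    """Average dissimilarity between each object and its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d))) / len(d)


def pam(points, k):
    """Toy PAM: greedy BUILD, then best-improvement SWAP until no gain."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]

    # BUILD: add one medoid at a time, always the object that gives the
    # lowest average distance together with the medoids already chosen.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: average_cost(d, medoids + [c])))
    build_cost = average_cost(d, medoids)

    # SWAP: evaluate every (medoid, non-medoid) exchange and apply the best
    # one, as long as it lowers the average distance.
    cost = build_cost
    while True:
        best_cost, best_set = cost, None
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = average_cost(d, trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_set = trial_cost, trial
        if best_set is None:
            break
        medoids, cost = best_set, best_cost

    # Each object is labelled by the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, build_cost, cost, labels


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
medoids, build_cost, swap_cost, labels = pam(points, k=2)
print("medoids:", medoids)
print(f"average distance after BUILD: {build_cost:.3f}, after SWAP: {swap_cost:.3f}")
print("labels:", labels)
```

On this toy data the SWAP phase lowers the average distance from 0.833 to 0.667, the same kind of improvement as the drop from 36.916 to 33.190 reported for COOP.DAT.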

The following information is given for each cluster:

    • Its medoid (i.e. the most representative object)
    • Cluster size (i.e. the number of objects in the cluster)
    • Members of the cluster

Cluster 1: only one member, the object with Idcode (1), which is necessarily its medoid.
Cluster 2: two objects, (2) and (3). Object (3) is the medoid of this cluster.
Cluster 3: five objects, with Idcodes (4), (5), (6), (7) and (8). Object (5) is the medoid of this cluster.
Cluster 4: 27 objects, with Idcodes (9) to (35). Object (22) is the medoid of this cluster.

 

Clustering Vector: the jth element of this vector is the number of the cluster to which object j belongs. Clusters are numbered in order of first appearance when the vector is read from left to right: the cluster containing object (1) is Cluster 1, the next cluster encountered is Cluster 2, and so on.

Clustering Vector is

1 22 33333 444444444444444444444444444

Thus, object (1) belongs to Cluster 1.
Objects (2) and (3) belong to Cluster 2.
Objects (4), (5), (6), (7) and (8) belong to Cluster 3.
The remaining objects, (9) to (35), belong to Cluster 4.
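The numbering rule can be written in a few lines of Python. The sketch assumes that each object is labelled by the Idcode of its medoid, as in the PAM output, and renumbers the clusters in order of first appearance; the function name is illustrative.

```python
def clustering_vector(medoid_of):
    """Renumber clusters 1, 2, ... in order of first appearance, left to right."""
    numbers = {}
    vector = []
    for m in medoid_of:
        if m not in numbers:
            numbers[m] = len(numbers) + 1  # first time this cluster is seen
        vector.append(numbers[m])
    return vector


# Medoid of each object (1)..(35) in the PAM solution: 1, 3, 5 or 22.
medoid_of = [1] + [3] * 2 + [5] * 5 + [22] * 27
print(" ".join(str(c) for c in clustering_vector(medoid_of)))
```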

 

Clustering Characteristics

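Cluster 1 is an isolated singleton object (object 1), with separation = 618.50.

Cluster 2 is an isolated L*-cluster with diameter = 119.02 and separation = 188.28. (An L*-cluster is one whose diameter, the largest dissimilarity between two of its members, is smaller than its separation, the smallest dissimilarity between one of its members and an object outside the cluster.)

The diameters and separations of all four clusters are also listed.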
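Diameter and separation can be computed directly from the dissimilarities. The Python sketch below uses the dissimilarities among objects (1) to (8) as printed in the FANNY example of this chapter (the same Euclidean distances on COOP.DAT). Every other object is more than 300 units away from objects (2) and (3), so this subset is enough to reproduce the values for Clusters 1 and 2; the helper names are illustrative.

```python
import itertools

# Lower triangle of the dissimilarities among objects 1..8 (COOP.DAT).
LOWER = {
    2: [622.58],
    3: [618.50, 119.02],
    4: [820.37, 231.17, 222.52],
    5: [808.24, 223.47, 195.28, 73.98],
    6: [857.14, 263.25, 265.64, 89.49, 106.94],
    7: [802.80, 224.00, 188.28, 96.90, 47.83, 144.86],
    8: [910.02, 311.13, 299.44, 106.02, 106.93, 94.87, 127.87],
}


def dist(i, j):
    """Symmetric lookup in the lower-triangular table (objects numbered from 1)."""
    if i == j:
        return 0.0
    i, j = max(i, j), min(i, j)
    return LOWER[i][j - 1]


def diameter(cluster):
    """Largest dissimilarity between two members (0 for a singleton)."""
    return max((dist(i, j) for i, j in itertools.combinations(cluster, 2)), default=0.0)


def separation(cluster, objects):
    """Smallest dissimilarity between a member and an object outside the cluster."""
    return min(dist(i, j) for i in cluster for j in objects if j not in cluster)


objects = range(1, 9)
c2 = [2, 3]
print("Cluster 2:", diameter(c2), separation(c2, objects))
print("L*-cluster:", diameter(c2) < separation(c2, objects))
```

The same helper gives 144.86 as the diameter of Cluster 3 (objects 4 to 8). Its separation, 39.08, cannot be reproduced from this subset, because the nearest outside object is (10).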

 

Silhouette Plot

The following information is given for each object.

  • CLU: the number of the cluster to which the object belongs.
  • NEIG: the number of the neighboring cluster, i.e. the second-best cluster for the object. For example, the second-best choice for object (3) in Cluster 2 is Cluster 3, the second-best choice for object (7) in Cluster 3 is Cluster 2, and the second-best choice for object (5) in Cluster 3 is Cluster 4.
  • S(I): the silhouette width of the object. A value close to 1 implies that the object is well classified; a value close to 0 implies that the object lies between two clusters, so its assignment is essentially arbitrary; a value close to –1 indicates that the object is poorly classified.
  • I: the Idcode of the object.
  • A bar whose length is proportional to S(I) when S(I) is positive.

The proportion of the plot that is filled by bars ("blackness") indicates the overall quality of the clustering. The average silhouette width of each cluster is printed below the plot: the higher the value, the tighter the cluster.
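The silhouette width is usually defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the average dissimilarity of object i to the other members of its own cluster and b(i) is its average dissimilarity to the members of the neighboring cluster (NEIG), the other cluster for which that average is smallest. The minimal Python sketch below applies this definition to toy data; it assumes a full dissimilarity matrix, at least two clusters, and the convention s(i) = 0 for an object in a singleton cluster. The function name is illustrative.

```python
def silhouettes(d, labels):
    """Silhouette width s(i) and neighbouring cluster for every object.

    d is a full (square, symmetric) dissimilarity matrix; labels gives the
    cluster of each object. At least two clusters are assumed.
    """
    n = len(d)
    clusters = sorted(set(labels))
    s, neighbour = [], []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        # Average dissimilarity from object i to each other cluster.
        avg_to = {}
        for c in clusters:
            if c != labels[i]:
                members = [j for j in range(n) if labels[j] == c]
                avg_to[c] = sum(d[i][j] for j in members) / len(members)
        nb = min(avg_to, key=avg_to.get)  # NEIG: the closest other cluster
        neighbour.append(nb)
        if not own:
            s.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(d[i][j] for j in own) / len(own)  # a(i): within own cluster
        b = avg_to[nb]                            # b(i): to neighbouring cluster
        s.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return s, neighbour


x = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in x] for p in x]
s, neig = silhouettes(d, [1, 1, 1, 2, 2, 2])
print("S(I):", [round(v, 2) for v in s])
print("NEIG:", neig)
print("average silhouette width:", round(sum(s) / len(s), 2))
```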

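It can be seen from the silhouette plot that object (8) has a silhouette width close to zero (–0.04), which implies that it has been assigned more or less arbitrarily to Cluster 3 rather than to its neighbor, Cluster 4. Object (1) also shows a width of zero, but only because it forms a singleton cluster, for which the silhouette width is set to zero by convention.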

Leaving aside the isolated clusters (1 and 2), Cluster 4 (average silhouette width 0.76) is tighter than Cluster 3 (0.36).

Silhouette coefficient (i.e., the average silhouette width over all objects) = 0.70, which indicates a reasonable structure.
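The verbal reading of the silhouette coefficient follows a rule of thumb commonly attributed to Kaufman and Rousseeuw: 0.71 to 1.00 strong structure, 0.51 to 0.70 reasonable, 0.26 to 0.50 weak, 0.25 or less no substantial structure. The cut-offs in this Python sketch are that rule of thumb, not values taken from the IDAMS output, and the function name is illustrative.

```python
def describe_structure(sc):
    """Map a silhouette coefficient (average silhouette width) to a verbal label."""
    if sc > 0.70:
        return "strong structure"
    if sc > 0.50:
        return "reasonable structure"
    if sc > 0.25:
        return "weak structure"
    return "no substantial structure"


for sc in (0.70, 0.44):
    print(sc, describe_structure(sc))
```

Applied to the two examples in this chapter, it gives "reasonable" for the PAM solution (0.70) and "weak" for the FANNY solution (0.44), in line with the interpretations given in the text.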

7(2) Example of Fuzzy Cluster Analysis

Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.

Methodology: Cluster Analysis using the algorithm FANNY (Fuzzy Analysis).

Dataset: COOP.DAT

SYNTAX
$RUN CLUSFIND
$FILES
PRINT = FANNY.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING FANNY
BADDATA=MD1 -
 IDVAR=V1 -
 VARS=(V2-V12) -
 ANALYSIS=FANNY -
 CMIN=4 -
 PRINT=(DICT,DISS,GRAPH,TRACE)

After filtering      35 cases read from the input data file

Number of variables: 11

Number of objects: 35

Number of clusters: 4

 

*** Dissimilarity matrix ***

  1 2 3 4 5 6 7 8 9 10 11 12 13
1 .00                        
2 622.58 .00                      
3 618.50 119.02 .00                    
4 820.37 231.17 222.52 .00                  
5 808.24 223.47 195.28 73.98 .00                
6 857.14 263.25 265.64 89.49 106.94 .00              
7 802.80 224.00 188.28 96.90 47.83 144.86 .00            
8 910.02 311.13 299.44 106.02 106.93 94.87 127.87 .00          
9 959.66 355.54 360.37 153.35 177.16 109.10 202.75 85.49 .00        
10 924.78 320.30 317.46 114.90 130.26 101.26 147.72 39.08 66.00 .00      
11 945.90 342.23 341.09 131.88 152.95 109.59 174.29 55.20 47.46 33.08 .00    
12 974.73 366.41 376.05 169.22 192.46 129.64 215.55 97.19 39.89 70.55 49.63 .00  
13 955.59 351.11 348.34 144.95 157.64 116.88 177.37 57.11 48.00 36.41 24.82 50.01 .00
14 960.30 354.77 352.74 147.55 162.80 122.87 181.92 61.20 49.72 38.70 22.05 46.98 13.34
15 985.67 380.33 382.65 173.68 194.36 136.11 217.12 93.43 38.50 72.83 46.71 29.85 42.61
16 978.47 373.97 373.24 165.46 183.63 132.54 204.67 81.59 40.45 60.51 37.07 35.76 29.02
17 984.63 379.10 379.89 171.30 190.61 139.47 211.55 87.98 45.41 66.41 41.42 32.82 36.69
18 1006.82 401.92 407.05 195.60 220.62 158.12 242.72 120.72 56.09 96.14 71.11 39.59 68.10
19 1009.79 404.27 409.26 200.11 222.06 157.74 244.57 121.90 55.05 99.41 75.23 42.74 68.76
20 1009.08 403.77 408.87 200.13 221.68 157.47 244.29 121.55 55.60 98.92 74.56 40.89 68.42
21 993.56 388.21 388.05 180.46 198.10 147.29 218.28 95.35 50.84 74.02 50.43 38.86 41.96
22 1007.23 401.43 405.42 195.98 217.49 156.70 239.36 116.52 54.41 93.75 68.72 39.48 62.88
23 1010.80 406.04 408.88 200.35 220.02 159.27 242.26 118.65 56.81 97.86 73.44 45.63 65.57
24 1011.74 406.47 409.96 200.81 221.91 162.01 244.26 121.07 61.07 98.87 72.72 44.15 68.21
25 1022.55 416.45 421.85 212.17 234.79 171.94 257.73 134.61 67.78 112.66 86.24 55.61 82.53
26 1007.02 401.49 403.62 194.75 214.32 157.78 235.56 111.85 56.70 90.52 65.43 41.45 58.75
27 1013.40 408.26 411.85 202.26 223.64 163.26 245.88 122.43 61.83 100.28 74.59 45.14 69.40
28 1019.60 413.62 418.72 208.28 230.92 169.35 252.88 129.16 66.26 106.59 81.54 49.07 76.44
29 1016.09 410.45 414.14 204.20 225.65 166.14 247.23 123.92 63.51 101.49 76.30 47.21 70.44
30 1013.44 407.85 411.22 201.63 222.70 163.76 244.39 120.90 62.08 98.67 73.38 44.84 67.76
31 1019.85 413.82 419.16 208.91 231.53 169.26 253.89 130.12 66.69 107.74 82.08 49.21 77.48
32 1019.45 412.86 418.66 209.08 230.98 169.23 252.75 129.73 65.89 106.51 81.67 47.82 76.08
33 1010.79 404.82 407.32 198.30 218.11 161.75 239.06 115.71 60.37 93.85 68.94 43.91 62.25
34 1011.70 406.35 408.36 198.43 219.37 163.29 240.10 116.92 61.25 94.55 70.28 46.09 63.84
35 1007.36 401.59 402.89 194.26 213.12 159.21 233.54 110.20 58.56 88.82 64.46 43.75 56.88
                           
  14 15 16 17 18 19 20 21 22 23 24 25 26
14 .00                        
15 38.83 .00                      
16 25.61 16.12 .00                    
17 30.63 14.00 10.95 .00                  
18 64.14 30.53 40.91 35.41 .00                
19 66.35 30.36 42.02 38.60 16.37 .00              
20 66.26 30.81 42.08 38.17 14.80 7.28 .00            
21 37.38 17.35 15.91 11.87 32.20 33.15 32.95 .00          
22 59.41 24.66 35.58 30.79 13.56 10.39 10.54 24.84 .00        
23 63.66 29.58 39.03 35.51 18.95 11.70 10.49 28.53 11.00 .00      
24 63.95 30.85 41.11 34.87 16.25 18.60 16.28 28.86 11.75 15.97 .00    
25 78.03 44.23 55.70 49.76 26.31 25.57 25.79 44.08 24.70 26.25 19.29 .00  
26 54.82 23.26 31.45 25.59 19.31 19.00 18.00 18.28 10.15 13.27 13.23 28.83 .00
27 65.52 31.58 41.94 35.93 13.82 15.52 13.34 29.83 10.25 12.73 5.74 20.42 12.65
28 72.23 37.80 48.75 42.30 15.39 16.28 15.87 36.91 15.91 16.97 15.20 19.97 19.60
29 66.44 33.14 42.97 36.99 14.83 15.03 14.46 30.22 10.49 12.61 9.49 20.25 12.77
30 63.47 30.40 40.17 33.82 14.83 16.06 15.07 27.37 9.59 13.30 7.75 22.14 10.15
31 73.12 38.14 49.79 43.19 16.52 16.94 16.31 37.87 16.22 18.11 13.45 17.12 20.40
32 72.32 38.50 49.27 43.17 17.15 15.43 14.04 37.05 14.97 16.40 13.86 17.78 19.72
33 57.85 26.89 35.11 28.90 19.31 18.95 18.65 21.17 10.34 14.49 11.79 26.40 5.10
34 59.20 29.03 36.21 30.12 17.06 20.12 20.30 23.02 13.08 16.00 14.59 27.48 9.59
35 52.43 24.31 30.08 24.19 23.30 23.35 23.19 15.36 14.59 17.66 17.23 31.73 6.00
                           
  27 28 29 30 31 32 33 34 35        
27 .00                        
28 10.77 .00                      
29 5.39 8.66 .00                    
30 4.58 10.91 4.24 .00                  
31 10.10 4.47 9.33 11.18 .00                
32 11.09 7.94 9.17 11.83 7.55 .00              
33 11.14 17.26 9.95 7.55 17.89 17.18 .00            
34 12.33 16.06 10.34 8.89 17.94 18.47 7.75 .00          
35 17.09 23.19 16.16 13.82 24.25 23.47 6.78 9.90 .00        
 

Iteration objective function

     1              595.4648
     2              581.4908
     3              577.1000
     4              572.3036
     5              565.7405
     6              558.7579
     7              553.5726
     8              549.4211
     9              545.3386
    10              541.7881
    11              539.6334
    12              538.6904
    13              538.3286
    14              538.1903
    15              538.1367
    16              538.1155
    17              538.1072
    18              538.1038
    19              538.1031
    20              538.1028
 

*** Fuzzy Clustering ***

                     1           2           3           4
      1           .3930       .2392       .1905       .1772
      2           .6938       .1501       .0847       .0714
      3           .6793       .1664       .0841       .0702
      4           .1127       .6041       .1677       .1155
      5           .0777       .7629       .0936       .0658
      6           .1212       .4176       .2766       .1846
      7           .1209       .6621       .1257       .0914
      8           .0800       .3123       .4101       .1976
      9           .0453       .1162       .5478       .2908
     10           .0618       .2076       .5293       .2012
     11           .0376       .1108       .6678       .1838
     12           .0373       .0895       .5178       .3554
     13           .0315       .0899       .7055       .1731
     14           .0281       .0784       .7260       .1674
     15           .0244       .0592       .5304       .3861
     16           .0218       .0555       .6849       .2378
     17           .0232       .0572       .6109       .3088
     18           .0189       .0417       .1973       .7422
     19           .0182       .0400       .1837       .7581
     20           .0170       .0374       .1730       .7726
     21           .0245       .0588       .4900       .4266
     22           .0114       .0255       .1343       .8287
     23           .0159       .0353       .1690       .7798
     24           .0133       .0294       .1384       .8189
     25           .0291       .0614       .2286       .6809
     26           .0140       .0316       .1792       .7753
     27           .0098       .0214       .0992       .8696
     28           .0159       .0340       .1406       .8096
     29           .0096       .0209       .0947       .8748
     30           .0090       .0197       .0942       .8771
     31           .0163       .0348       .1423       .8066
     32           .0161       .0344       .1424       .8070
     33           .0128       .0285       .1496       .8091
     34           .0149       .0331       .1677       .7843
     35           .0184       .0418       .2410       .6987
 

Partition coefficient of Dunn = .55

Its normalized version = .40

 

Closest hard clustering

 Cluster    Size    Objects
     1        3        1    2    3
     2        4        4    5    6    7
     3       11        8    9   10   11   12   13   14   15   16   17   21
     4       17       18   19   20   22   23   24   25   26   27   28   29   30
                      31   32   33   34   35
 

Clustering vector

1   1   1   2   2   2   2   3   3   3   3   3   3   3   3   3   3   4   4   4   3   4
4   4   4   4   4   4   4   4   4   4   4   4   4   
 

Silhouettes

                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
  CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
    1    2   .25      1|************                                     |
    1    2  -.36      2|                                                 |
    1    2  -.41      3|                                                 |
                       |                                                 |
    2    3   .55      5|***************************                      |
    2    3   .49      7|************************                         |
    2    3   .42      4|*********************                            |
    2    3   .07      6|***                                              |
                       |                                                 |
    3    4   .45     11|**********************                           |
    3    4   .44     13|**********************                           |
    3    4   .44     10|*********************                            |
    3    4   .44     14|*********************                            |
    3    2   .31      8|***************                                  |
    3    4   .16      9|*******                                          |
    3    4   .15     16|*******                                          |
    3    4  -.06     17|                                                 |
    3    4  -.08     12|                                                 |
    3    4  -.23     15|                                                 |
    3    4  -.32     21|                                                 |
                       |                                                 |
    4    3   .82     29|*****************************************        |
    4    3   .81     30|****************************************         |
    4    3   .81     27|****************************************         |
    4    3   .78     28|***************************************          |
    4    3   .78     31|***************************************          |
    4    3   .78     32|***************************************          |
    4    3   .77     24|**************************************           |
    4    3   .77     22|**************************************           |
    4    3   .75     33|*************************************            |
    4    3   .74     23|*************************************            |
    4    3   .74     20|*************************************            |
    4    3   .74     34|*************************************            |
    4    3   .73     19|************************************             |
    4    3   .72     26|***********************************              |
    4    3   .71     18|***********************************              |
    4    3   .68     25|*********************************                |
    4    3   .65     35|********************************                 |
                       +----+----+----+----+----+----+----+----+----+----+
                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1


        Cluster 1 has average silhouette width     -.18
        Cluster 2 has average silhouette width      .38
        Cluster 3 has average silhouette width      .15
        Cluster 4 has average silhouette width      .75
 
 For the entire data set, the average silhouette width is .44
 which indicates a strong structure was found.
INTERPRETATION

IDAMS reports the following data specifications:

No. of Objects : 35

No. of Variables : 11

Number of Clusters: 4

Note that FANNY does not use representative objects (medoids). Instead, the algorithm minimizes the objective function defined earlier, which is essentially a measure of total dispersion. The algorithm needed 20 iterations to converge.

  Dissimilarity matrix: Matrix of Euclidean distances between the objects
 

Objective function
  1st iteration          : 595.4648
  20th (final) iteration : 538.1028
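In the form usually given for FANNY (Kaufman and Rousseeuw), which is assumed here to match the definition referred to above, the objective function is C = sum over clusters v of [sum_i sum_j u_iv^2 u_jv^2 d(i,j)] / [2 sum_j u_jv^2], where u_iv is the membership coefficient of object i in cluster v. The Python sketch below only evaluates C for a given membership matrix on toy data; it does not perform the iterative minimization that produced the values listed. The function name is illustrative.

```python
def fanny_objective(d, u):
    """Evaluate C = sum_v [sum_i sum_j u_iv^2 u_jv^2 d(i,j)] / [2 sum_j u_jv^2].

    d: n x n dissimilarity matrix; u: n x k membership matrix (rows sum to 1).
    """
    n, k = len(u), len(u[0])
    total = 0.0
    for v in range(k):
        w = [u[i][v] ** 2 for i in range(n)]  # squared memberships in cluster v
        num = sum(w[i] * w[j] * d[i][j] for i in range(n) for j in range(n))
        total += num / (2 * sum(w))
    return total


x = [0, 1, 10, 11]
d = [[abs(p - q) for q in x] for p in x]
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
soft = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print("hard memberships:", fanny_objective(d, hard))
print("soft memberships:", round(fanny_objective(d, soft), 4))
```

On this toy data the fuzzier membership matrix gives a larger value of C, because distant objects then receive some weight in each other's cluster.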

 

Membership coefficients of the objects
Some objects have a very large membership coefficient for one cluster and small values for the other clusters, while others are shared more evenly among clusters. For example, object (22) belongs to Cluster 4 with membership coefficient 0.8287, whereas the largest membership coefficient of object (8) is only 0.4101, for Cluster 3.

 

Dunn’s partition coefficient     = 0.55
Normalized Dunn’s coefficient    = 0.40

Note: Normalized Dunn’s coefficient value:

0 ⇒ completely fuzzy clustering
1 ⇒ completely hard clustering
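Dunn's partition coefficient is F = (1/n) times the sum of all squared membership coefficients u_iv^2. It ranges from 1/k (every membership equal to 1/k) to 1 (a hard partition), and the normalized version (kF - 1)/(k - 1) rescales it to run from 0 to 1. A minimal Python sketch, with illustrative function names:

```python
def dunn_coefficient(u):
    """F = (1/n) * sum of squared memberships; u is an n x k membership matrix."""
    return sum(x * x for row in u for x in row) / len(u)


def normalized_dunn(f, k):
    """Rescale F from [1/k, 1] to [0, 1]."""
    return (k * f - 1) / (k - 1)


k = 4
print(round(normalized_dunn(0.55, k), 2))  # rescaling the reported F = .55
print(dunn_coefficient([[0.25] * 4] * 3))  # completely fuzzy: F = 1/k
print(dunn_coefficient([[1, 0, 0, 0], [0, 1, 0, 0]]))  # hard partition: F = 1
```

Rescaling the reported coefficient 0.55 with k = 4 reproduces the normalized value 0.40.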

 

Closest hard clustering:
Size and Idcodes of the objects in each cluster
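The closest hard clustering assigns each object to the cluster in which its membership coefficient is largest. A minimal Python sketch, using the membership rows printed in the FANNY output for objects (8), (21) and (22); the function name is illustrative.

```python
def closest_hard_clustering(u):
    """Return 1-based cluster numbers: the column with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in u]


# Membership coefficients copied from the "Fuzzy Clustering" table.
memberships = {
    8: [0.0800, 0.3123, 0.4101, 0.1976],
    21: [0.0245, 0.0588, 0.4900, 0.4266],
    22: [0.0114, 0.0255, 0.1343, 0.8287],
}
hard = closest_hard_clustering(list(memberships.values()))
print(dict(zip(memberships, hard)))
```

Object (21) goes to Cluster 3 even though its coefficient for Cluster 4 is almost as large, which is why it receives a negative silhouette width in the plot of the hard clustering.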

 

Clustering vector

Objects (1) to (3) belong to Cluster 1.
Objects (4) to (7) belong to Cluster 2.
Objects (8) to (17) and object (21), 11 objects in all, belong to Cluster 3.
Objects (18) to (20) and (22) to (35), 17 objects in all, belong to Cluster 4.

 

Silhouette plot of the closest hard clustering.

Objects (2) and (3) of Cluster 1 and objects (15) and (21) of Cluster 3 have negative silhouette widths, so they are poorly classified. Objects (17) and (12) of Cluster 3 have silhouette widths close to zero (–0.06 and –0.08), so their assignment is essentially arbitrary; they could equally well have been assigned to their neighbor, Cluster 4. The silhouette widths of the objects in Cluster 4 are larger than those of the objects in the other clusters, which implies that they are better classified than their counterparts in other clusters.

The average silhouette width of Cluster 1 is negative (–0.18), which implies that this cluster is poorly constituted.

The average silhouette width of Cluster 3 is small (+0.15), which implies that this cluster is largely arbitrary.

The average silhouette width of Cluster 4 is quite large (+0.75), which implies that this cluster is well constructed.

The silhouette coefficient of the entire structure is 0.44, which implies that the clustering structure is rather weak.

7(3) Example of Cluster Analysis of Large Data Sets

Research Question: Classification of a sample of research units in India according to the pattern of time spent on R & D inside the unit, administration, teaching and consultancy.

Methodology: Cluster Analysis using the algorithm CLARA.

Dataset: ICSOPRU2.DAT

SYNTAX*
$RUN CLUSFIND
$FILES
PRINT = CLARA.LST
DICTIN = ICSOPRU2.DIC
DATAIN = ICSOPRU2.DAT
$SETUP
INCLUDE V1=360
Cluster Analysis using CLARA
BADDATA=MD1 -
 IDVAR=V2 -
 VARS=(V22,V24,V25,V26) -
 ANALYSIS=CLARA -
 CMIN=3 -
 PRINT=(DICT,GRAPH,TRACE)
-------------------------

Note: All options set at default values

EXTRACT FROM COMPUTER OUTPUT

Number of clusters: 3

Number of variables: 4

Number of objects: 100

Number of representative objects: 3

 
Drawing 5 samples of 46 objects.


Sample number    1
 
Objects selected:
108  117  126  136  204  208  209  210  250  303  309  310  413  419
425  430  501  504  510  604  608  704  711  715  750  805  809  810
813  904  908  909  910  916 1203 1206 1207 1214 1219 1224 1227 1301
1302 1305 1311 1401

 Average distance, initial build =    7.519
 Average distance for this sample =   7.515


Results for the entire data set
Total distance   =        830.364
Average distance =          8.304
Cluster   Size  Medoid    Coordinates of Medoids
 
     1      44     17        45.00       5.29       4.00       6.71
     2      35     70        64.00       6.67       1.67       2.67
     3      21     21        78.00       6.00        .00       2.00
 
Average distance to each medoid
           1           2           3
        9.865       6.268       8.424
 
Maximum distance to each medoid
           1           2           3
       28.034      11.356      19.315
 
Maximum distance to a medoid divided by minimum distance to another medoid
 
           1           2           3
        1.429        .804       1.367
 
Sample number    2
 
Objects selected:
101  108  117  126  136  140  204  209  210  250  303  306  309  425
430  501  502  503  603  604  608  609  701  704  708  712  750  801
803  805  808  809  810  813  904  910 1203 1207 1208 1219 1224 1225
1227 1301 1307 1311
 
Average distance, initial build =        6.939
Average distance for this sample =       6.939
 
Results for the entire data set
Total distance   =        826.122
Average distance =          8.261
 
Cluster Size  Medoid    Coordinates of Medoids
     1      45     33        46.67       8.33        .00       6.67
     2      34     70        64.00       6.67       1.67       2.67
     3      21     85        77.50        .00        .83       4.17
 
Average distance to each medoid
           1           2           3
        9.327       6.138       9.413
 
Maximum distance to each medoid
           1           2           3
       29.104       9.775      19.226

Maximum distance to a medoid divided by minimum distance to another medoid
           1           2           3
        1.622        .645       1.269
 
Sample number    3
  
Objects selected:
108  117  120  136  201  205  210  250  303  304  307  309  310  320
501  502  503  504  505  506  601  604  606  610  702  703  706  711
715  750  803  809  813  904  907  908  909  916 1206 1208 1219 1224
1227 1301 1305 1314
 
Average distance, initial build =        8.357
Average distance for this sample =       7.916

Results for the entire data set
 
Total distance   =        820.751
Average distance =          8.208
 
Cluster  Size  Medoid    Coordinates of Medoids

     1      44     33        46.67       8.33        .00       6.67
     2      37     83        62.86       6.43       3.57       4.29
     3      19     14        83.14       2.14        .86       2.00
 
Average distance to each medoid
          1           2           3
       9.306       6.427       9.132
 
Maximum distance to each medoid
          1           2           3
      29.104      10.805      16.742
 
Maximum distance to a medoid divided by minimum distance to another medoid
           1           2           3
        1.727        .641        .796
 
Sample number    4
  
Objects selected:
101  109  204  210  250  303  306  307  309  320  351  413  430  502
503  504  505  508  510  603  604  605  607  701  703  706  801  803
805  907  908  909  911  913  916 1203 1206 1207 1219 1224 1225 1308
1311 1312 1314 1401
 
Average distance, initial build =        7.831
Average distance for this sample =       7.714
 
 
Results for the entire data set
 
Total distance   =        867.108
Average distance =          8.671
 
Cluster  Size Medoid    Coordinates of Medoids
 
     1      52     20        49.17       6.67       1.17       8.33
     2      41     84        67.50       3.33        .83       2.50
     3       7     98        92.00       2.00        .00        .00
 
Average distance to each medoid
            1           2           3
         9.882       7.621       5.820
 
Maximum distance to each medoid
            1           2           3
       30.031      15.065      11.051
 
Maximum distance to a medoid divided by minimum distance to another medoid
           1           2           3
        1.538        .772        .448
 
Sample number    5
Objects selected:
108  109  120  140  201  208  210  250  301  304  320  351  401  425
501  502  504  506  508  601  602  603  604  607  701  702  704  711
712  750  808  810   904  909  916 1206 1214 1219 1225 1301 1302 1305
1311 1312 1314 1318

Average distance, initial build =        8.613
Average distance for this sample =       8.613
 
Results for the entire data set
Total distance   =        834.124
Average distance =          8.341
 
Cluster   Size  Medoid    Coordinates of Medoids
 
     1      25     23        55.33       5.17       2.33       6.00
     2      33      5        45.00       9.00        .00       5.00
     3      42     44        70.00       6.14       1.14       5.00
 
Average distance to each medoid
           1           2           3
        6.785       8.858       8.862
 
Maximum distance to each medoid
           1           2           3
       12.492      29.051      27.442
 
Maximum distance to a medoid divided by minimum distance to another medoid
           1           2           3
        1.105       2.570       1.856
 
 
Final results

Sample number 3 was selected, with objects:
108  117  120  136  201  205  210  250  303  304  307  309  310  320
501  502  503  504  505  506  601  604  606  610  702  703  706  711
715  750  803  809  813  904  907  908  909  916 1206 1208 1219 1224
1227 1301 1305 1314
 
Average distance for the entire dataset:       8.208

Clustering vector
  1 1 2 1 1 2 3 1 1 2 1 2 1 3 1 1 1 1 2 1 3 1 2 3 1 2 2 1 1 1 1 1 1
  2 1 3 1 1 1 1 1 2 2 2 2 2 2 1 2 1 2 2 1 2 1 1 1 2 2 1 1 2 3 1 3 3
  1 2 1 2 1 2 2 2 1 1 3 2 1 2 2 1 2 2 3 3 3 2 3 3 2 3 2 3 2 1 3 3 3
  2
 
Cluster   Size  Medoid    Objects
 
     1     44    502
                      101  108  117  120  140  201  205  209  250  301  303
                      304  307  310  351  419  425  427  430  501  502  504
                      506  507  508  510  601  608  610  703  706  708  710
                      715  750  803  808  810  814  908  909  913 1204 1311
 
     2     37   1206
                      109  126  204  208  306  320  401  413  503  602  603
                      604  605  606  607  609  701  702  704  711  712  801
                      809  813  902  904  907  911  916 1203 1206 1207 1224
                     1301 1305 1308 1401

     3     19    210
                      136  210  309  322  505  802  805  806  910 1208 1214
                      1219 1225 1227 1302 1307 1312 1314 1318

Average distance to each medoid
           1           2           3
        9.306       6.427       9.132
 
Maximum distance to each medoid
           1           2           3
       29.104      10.805      16.742
 
Maximum distance to a medoid divided by minimum distance to another medoid
           1           2           3
        1.727      .641        .796


Silhouettes for the selected sample
 
                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1
  CLU  NEIG  S(I)   I  +----+----+----+----+----+----+----+----+----+----+
    1    2   .54    304|**************************                       |
    1    2   .52    120|**************************                       |
    1    2   .51    310|*************************                        |
    1    2   .50    750|*************************                        |
    1    2   .49    501|************************                         |
    1    2   .49    502|************************                         |
    1    2   .48    303|************************                         |
    1    2   .45    205|**********************                           |
    1    2   .45    908|**********************                           |
    1    2   .44    108|**********************                           |
    1    2   .44    706|*********************                            |
    1    2   .43    601|*********************                            |
    1    2   .40    117|********************                             |
    1    2   .40    201|********************                             |
    1    2   .39    250|*******************                              |
    1    2   .38    307|******************                               |
    1    2   .35    504|*****************                                |
    1    2   .33    506|****************                                 |
    1    2   .28    909|**************                                   |
    1    2   .13    715|******                                           |
    1    2   .02    703|*                                                |
    1    2  -.04    610|                                                 |
    1    2  -.09    803|                                                 |
                       |                                                 |
    2    3   .71    813|***********************************              |
    2    1   .70    809|***********************************              |
    2    1   .70   1206|***********************************              |
    2    3   .68    904|**********************************               |
    2    3   .64    916|*******************************                  |
    2    1   .57    907|****************************                     |
    2    1   .56    606|****************************                     |
    2    3   .55    711|***************************                      |
    2    3   .52    503|**************************                       |
    2    3   .51   1224|*************************                        |
    2    3   .47    604|***********************                          |
    2    1   .43    702|*********************                            |
    2    1   .26   1301|************                                     |
    2    1   .22    320|***********                                      |
    2    3   .22   1305|**********                                       |
                       |                                                 |
    3    2   .61   1314|******************************                   |
    3    2   .60   1227|******************************                   |
    3    2   .55    505|***************************                      |
    3    2   .55    210|***************************                      |
    3    2   .47    136|***********************                          |
    3    2   .26   1208|*************                                    |
    3    2   .26    309|*************                                    |
    3    2  -.15   1219|                                                 |
                       +----+----+----+----+----+----+----+----+----+----+
                       0   .1   .2   .3   .4   .5   .6   .7   .8   .9    1


        Cluster 1 has average silhouette width      .36
        Cluster 2 has average silhouette width      .52
        Cluster 3 has average silhouette width      .39
 
 For the selected sample, the average silhouette width is .42
 which indicates a strong structure was found.
INTERPRETATION

IDAMS reports analysis specifications:

No. of objects: 100

No. of variables: 4

No. of clusters: 3

No. of medoids: 3 (equal to the number of clusters)

 

The algorithm draws 5 random samples of 40 + 2k objects each, where k is the number of clusters (here 40 + 2 × 3 = 46 objects).

 

Sample 1: The output lists the objects in the sample. The average distance after BUILD (initial average distance) = 7.519; the average distance after SWAP (final average distance) = 7.515.

These values are the average distances between each object of the sample and its most similar representative object.

Results for the entire data set

Total distance = 830.364

Average distance = 8.304

The following information is then given for each cluster:

  • Size of the cluster in the entire data set.
  • Its most representative object (medoid)
  • The coordinates of the medoid.
  • Average distance to each medoid.
  • Maximum distance to each medoid.
  • Maximum distance to a medoid divided by the minimum distance from the medoid to another medoid. This value gives an idea of the isolation of the cluster.
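The statistics listed above can be illustrated with a small Python sketch. This is not the CLUSFIND code. It assumes the medoids are already known. The function name `cluster_summary`, the toy one-dimensional data and the absolute-difference distance are hypothetical choices made only for illustration.

```python
from statistics import mean


def cluster_summary(points, medoids, dist):
    """Per-medoid statistics in the style of the CLUSFIND output (sketch).

    Returns {medoid: (size, average distance, maximum distance, ratio)}, where
    ratio = maximum distance to the medoid divided by the minimum distance
    from this medoid to another medoid.
    """
    clusters = {m: [] for m in medoids}
    for p in points:
        # Each object joins the cluster of its nearest medoid.
        nearest = min(medoids, key=lambda m: dist(p, m))
        clusters[nearest].append(p)

    summary = {}
    for m, members in clusters.items():
        d = [dist(p, m) for p in members]
        # Distance from this medoid to the closest other medoid.
        separation = min(dist(m, other) for other in medoids if other != m)
        summary[m] = (len(members), mean(d), max(d), max(d) / separation)
    return summary


points = [0, 1, 2, 10, 11, 13]
stats = cluster_summary(points, medoids=[1, 11], dist=lambda a, b: abs(a - b))
for medoid, (size, avg, mx, ratio) in stats.items():
    print(f"medoid {medoid}: size={size} avg={avg:.3f} max={mx} ratio={ratio:.2f}")
```

Here the first cluster has ratio 1/10 = 0.1 and the second has ratio 2/10 = 0.2. The smaller the ratio, the closer the members stay to their own medoid compared with the distance to the nearest other medoid.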

Similar information is printed for each of the remaining four samples.

Average distance for the entire data set, for each sample:

Sample    Average distance
1         8.304
2         8.261
3         8.208
4         8.671
5         8.341

Sample 3 gives the lowest average distance for the entire data set, so it is selected for the final results. None of the clusters is compact. The silhouette plot confirms this, as does the silhouette coefficient (the average silhouette width), which is only 0.42.
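CLARA's way of choosing among the samples can be sketched as follows. This is a simplified, hypothetical version for one-dimensional data, with absolute differences as distances. An exhaustive medoid search replaces PAM's BUILD and SWAP phases, which is practical only for tiny samples. The default sample size of 40 + 2k objects, with k the number of clusters, is the one usually quoted for CLARA. The key point is that each sample's medoids are scored on the entire data set.

```python
import random
from itertools import combinations


def avg_distance(data, medoids):
    """Mean distance from each object to its nearest medoid."""
    return sum(min(abs(x - m) for m in medoids) for x in data) / len(data)


def clara(data, k, n_samples=5, sample_size=None, seed=0):
    """Simplified CLARA outer loop for one-dimensional data (sketch)."""
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(data), 40 + 2 * k)
    best = None
    for number in range(1, n_samples + 1):
        sample = rng.sample(data, sample_size)
        # Exhaustive search stands in for PAM's BUILD and SWAP (tiny samples only).
        medoids = min(combinations(sample, k),
                      key=lambda ms: avg_distance(sample, ms))
        # The medoids are judged on the ENTIRE data set, not just on the sample.
        score = avg_distance(data, medoids)
        if best is None or score < best[1]:
            best = (number, score, sorted(medoids))
    return best


data = list(range(10)) + list(range(100, 110))
number, score, medoids = clara(data, k=2, sample_size=8)
print(f"sample {number} selected: average distance {score:.3f}, medoids {medoids}")

# The same minimum rule applied to the entire-data-set averages printed by IDAMS.
reported = {1: 8.304, 2: 8.261, 3: 8.208, 4: 8.671, 5: 8.341}
print(min(reported, key=reported.get))  # 3
```

The value used for sample 3 (8.208) is the average distance for the entire data set given in the sample 3 results, not the 7.916 computed on the sample alone.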

 

Final Results

The output lists the objects in the selected sample (sample 3). Average distance for the entire data set = 8.208.

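The clustering vector is interpreted as follows. Each entry gives the cluster of one object, in input order:

  • The first two objects belong to Cluster 1.
  • The third object belongs to Cluster 2.
  • The fourth and fifth objects belong to Cluster 1.
  • The sixth object belongs to Cluster 2.
  • The seventh object belongs to Cluster 3.

The remaining entries are read in the same way.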
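Decoding the clustering vector is a simple grouping step. The Python sketch below reads the vector printed above and lists, for each cluster, the positions of its objects in input order. The function name is hypothetical. The resulting sizes agree with the printed cluster sizes of 44, 37 and 19. If the input file is sorted by identifier, position 3 corresponds to object 109, which does appear in the Cluster 2 list.

```python
VECTOR = """
1 1 2 1 1 2 3 1 1 2 1 2 1 3 1 1 1 1 2 1 3 1 2 3 1 2 2 1 1 1 1 1 1
2 1 3 1 1 1 1 1 2 2 2 2 2 2 1 2 1 2 2 1 2 1 1 1 2 2 1 1 2 3 1 3 3
1 2 1 2 1 2 2 2 1 1 3 2 1 2 2 1 2 2 3 3 3 2 3 3 2 3 2 3 2 1 3 3 3
2
"""


def members_by_cluster(vector_text):
    """Map each cluster number to the 1-based input positions of its objects."""
    groups = {}
    labels = [int(tok) for tok in vector_text.split()]
    for position, label in enumerate(labels, start=1):
        groups.setdefault(label, []).append(position)
    return groups


groups = members_by_cluster(VECTOR)
for label in sorted(groups):
    print(label, len(groups[label]), groups[label][:5])
```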

 

Clustering characteristics of the final partition of the data set.

For each cluster, the following information is printed:

  • Cluster #
  • Size (no. of objects in the cluster)
  • List of objects.
  • Average distance of objects in a cluster to their medoid.
  • Maximum distance of objects in a cluster to their medoid.
  • Maximum distance to a medoid divided by the minimum distance to another medoid. Cluster 1 has the highest value (1.727) and Cluster 2 the lowest (.641), which implies that Cluster 2 is tighter than Cluster 1 and Cluster 3. Note that all the values of this ratio are greater than 0.5. Hence, the
 

Silhouette plot

Cluster 1

The silhouette values of objects 703, 610 and 803 (.02, -.04 and -.09) are close to zero. This implies that these objects lie between Clusters 1 and 2 and are assigned to Cluster 1 almost arbitrarily.

Cluster 2

None of the objects have values close to zero.

Cluster 3

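Object (1219) has a negative silhouette value (-.15), which is close to zero. This implies that the object is assigned to its cluster almost arbitrarily. Because the value is negative, the object is on average slightly closer to its neighbouring cluster (Cluster 2).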
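The silhouette width behind these remarks is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the average dissimilarity of object i to the other members of its own cluster. b(i) is its average dissimilarity to the members of the nearest other cluster (the NEIG column of the plot). The Python sketch below uses hypothetical toy data. It shows that an object lying exactly between two groups gets s(i) = 0, while an object placed in the farther group gets a negative value. It is an illustration, not the CLUSFIND implementation.

```python
def silhouette(i, labels, dist):
    """Silhouette width s(i) = (b - a) / max(a, b) of object i (sketch)."""
    n = len(labels)
    own = labels[i]
    same = [dist(i, j) for j in range(n) if j != i and labels[j] == own]
    if not same:
        return 0.0  # convention for an object alone in its cluster
    a = sum(same) / len(same)  # average dissimilarity to its own cluster
    # Average dissimilarity to the nearest other cluster (the "neighbour").
    b = min(
        sum(dist(i, j) for j in range(n) if labels[j] == c) / labels.count(c)
        for c in set(labels) if c != own
    )
    return (b - a) / max(a, b)


labels = [1, 1, 1, 1, 2, 2, 2]
x = [0, 1, 2, 5.5, 9, 10, 11]  # 5.5 sits exactly between the two groups
print(round(silhouette(3, labels, lambda i, j: abs(x[i] - x[j])), 2))  # 0.0
y = [0, 1, 2, 7, 9, 10, 11]    # 7 is closer to the second group
print(round(silhouette(3, labels, lambda i, j: abs(y[i] - y[j])), 2))  # -0.5
```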

The average silhouette width of Cluster 2 is larger than that of the other two clusters.

The overall average silhouette width of 0.42 implies that the clustering structure is weak and could possibly be artificial.
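The cluster and overall widths are plain averages of the printed s(i) values. The Python sketch below recomputes them from the silhouette plot. The thresholds in `structure` follow the rule of thumb usually attributed to Kaufman and Rousseeuw. Treat them as a guide; they are not part of the IDAMS output.

```python
# s(i) values read from the silhouette plot, by cluster.
S = {
    1: [.54, .52, .51, .50, .49, .49, .48, .45, .45, .44, .44, .43, .40, .40,
        .39, .38, .35, .33, .28, .13, .02, -.04, -.09],
    2: [.71, .70, .70, .68, .64, .57, .56, .55, .52, .51, .47, .43, .26, .22, .22],
    3: [.61, .60, .55, .55, .47, .26, .26, -.15],
}


def structure(width):
    """Rule-of-thumb reading of an average silhouette width."""
    if width > 0.70:
        return "strong structure"
    if width > 0.50:
        return "reasonable structure"
    if width > 0.25:
        return "weak structure, could be artificial"
    return "no substantial structure"


for c, values in S.items():
    print(f"cluster {c}: average silhouette width {sum(values) / len(values):.2f}")
all_values = [s for values in S.values() for s in values]
overall = sum(all_values) / len(all_values)
print(f"overall: {overall:.2f} -> {structure(overall)}")
```

The sketch reproduces the printed widths .36, .52, .39 and .42, and 0.42 falls in the weak-structure band.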

7(4) Example of Hierarchical Agglomerative Cluster Analysis

Research Question

:

Classification of eleven countries according to their publication pattern in different sub fields of chemistry

Methodology

:

Cluster Analysis using the algorithm Agnes (Agglomerative Nesting)

Dataset

:

CHEM.DAT
SYNTAX
$RUN CLUSFIND
$FILES
PRINT = AGNES.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSFIND PROGRAM AGNES
BADDATA=MD1 -
 STANDARDIZE -
 IDVAR=V1 -
 VARS=(V2-V10) -
 ANALYSIS=AGNES -
 PRINT=(DICT,DISSIM, GRAPH, TRACE,VNAM) 
EXTRACT FROM COMPUTER OUTPUT

After filtering 11 cases read from the input data file

Number of variables: 9

Number of objects: 11

 

*** Dissimilarity matrix ***

x 1 2 3 4 5 6 7 8 9 10 11
1 .00                    
2 4.04 .00                  
3 8.80 7.86 .00                
4 3.20 4.77 6.78 .00              
5 4.15 5.45 7.19 2.12 .00            
6 3.90 3.75 6.12 3.32 4.12 .00          
7 6.35 5.00 7.58 4.68 4.22 5.87 .00        
8 1.08 4.00 8.32 2.92 3.43 3.65 5.86 .00      
9 4.83 5.67 6.66 1.98 2.96 4.67 4.03 4.60 .00    
10 2.56 4.55 8.36 3.21 3.75 3.40 5.97 2.19 4.67 .00  
11 6.48 6.54 10.06 7.40 6.94 7.52 7.14 5.80 8.01 5.85 .00
 

Final ordering of objects and dissimilarities between them

 Objects               1       8      10       6       4       9
 Dissimilarities         1.082   2.373   3.650   3.905   1.983

 Objects               9       5       2       7      11       3
 Dissimilarities         2.542   4.604   5.247   6.854   7.772


 
Dissimilarity banner  

 
 0    .08   .16   .24   .32   .40   .48   .56   .64   .72   .80   .88   .96 1
 +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
             1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
            ********************************************************************
             8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
                       ********************************************************
                        10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
                                    *******************************************
                                      6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6
                                      *****************************************
                      4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4
                    ***********************************************************
                      9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9
                         ******************************************************
                           5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-
                                             **********************************
                                               2-  2-  2-  2-  2-  2-  2-  2-
                                                   ****************************
                                                     7-  7-  7-  7-  7-  7-  7-
                                                                   ************
                                                                    11- 11- 11-
                                                                            ***
                                                                              3
 +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
 0    .08   .16   .24   .32   .40   .48   .56   .64   .72   .80   .88   .96 1
 
 
 The actual highest level is 7.772362
 
 The agglomerative coefficient of this data set is .54
INTERPRETATION

IDAMS reports analysis specifications.

No. of Variables = 9

No. of Objects = 11

 

Dissimilarity matrix

This is the matrix of normalized Euclidean distances among the objects.

It can easily be seen that Dissimilarity (1, 8) = 1.08 is the lowest value in the matrix and Dissimilarity (11, 3) = 10.06 is the highest.
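Finding the most and least similar pairs is a scan over the lower triangle of the matrix. The Python sketch below types in the printed (rounded) matrix. The names `LOWER` and `d` are hypothetical helpers.

```python
from itertools import combinations

# Lower triangle of the printed dissimilarity matrix: row i holds d(i, 1), ..., d(i, i-1).
LOWER = [
    [],
    [4.04],
    [8.80, 7.86],
    [3.20, 4.77, 6.78],
    [4.15, 5.45, 7.19, 2.12],
    [3.90, 3.75, 6.12, 3.32, 4.12],
    [6.35, 5.00, 7.58, 4.68, 4.22, 5.87],
    [1.08, 4.00, 8.32, 2.92, 3.43, 3.65, 5.86],
    [4.83, 5.67, 6.66, 1.98, 2.96, 4.67, 4.03, 4.60],
    [2.56, 4.55, 8.36, 3.21, 3.75, 3.40, 5.97, 2.19, 4.67],
    [6.48, 6.54, 10.06, 7.40, 6.94, 7.52, 7.14, 5.80, 8.01, 5.85],
]


def d(i, j):
    """Dissimilarity between objects i and j, numbered from 1."""
    if i == j:
        return 0.0
    return LOWER[max(i, j) - 1][min(i, j) - 1]


pairs = list(combinations(range(1, len(LOWER) + 1), 2))
closest = min(pairs, key=lambda p: d(*p))
farthest = max(pairs, key=lambda p: d(*p))
print("lowest ", closest, d(*closest))    # lowest  (1, 8) 1.08
print("highest", farthest, d(*farthest))  # highest (3, 11) 10.06
```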

 

Final ordering of objects and dissimilarity

The first row shows the object identifiers in their final order. The second row shows the dissimilarity (merger level) between each pair of adjacent objects.

Row I: 1 8 10 6 4 9 5 2 7 11 3

Row II: 1.082 2.373 3.650 3.905 1.983 2.542 4.604 5.247 6.854 7.772

The numbers in the first row give the order in which the objects are listed. In the second row, the smallest value is 1.082, which stands between (1) and (8). This means that (1) and (8) are joined first, at level 1.082.

The second smallest value is 1.983, between (4) and (9). This means that (4) and (9) are joined next.

The third smallest value is 2.373, between (8) and (10). Recall that (8) has already joined (1), so this is the merger of (10) with (1, 8).

The fourth smallest value is 2.542, between (9) and (5). Recall that (9) has already merged with (4), so this is the merger of (5) with (4, 9).

The fifth smallest value is 3.650, between (10) and (6). Since (10) already belongs to the cluster (1, 8, 10), this is the merger of (6) with (1, 8, 10).

The sixth smallest value is 3.905, between (6) and (4). This is the merger of the clusters (1, 8, 10, 6) and (4, 9, 5). The remaining objects then join one at a time: (2) at 4.604, (7) at 5.247 and (11) at 6.854.

The highest value is 7.772, between (11) and (3). This means that object (3) merges with all the other objects in the last step.

The entire hierarchy of objects can be described by two sequences of length 11 and 10 i.e. n and (n-1), where n is the number of objects.
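The merge sequence can be reproduced with a short Python sketch of agglomerative nesting. It assumes group-average linkage, where the distance between two clusters is the mean of all pairwise dissimilarities between them. The printed levels are consistent with this: joining (10) to (1, 8) costs (2.56 + 2.19) / 2 = 2.375, close to the printed 2.373. Because the matrix is typed in with two decimals, the levels agree with the output only to about 0.01. The helper names are hypothetical, and this is not the CLUSFIND code.

```python
from itertools import combinations

# Lower triangle of the printed dissimilarity matrix: row i holds d(i, 1), ..., d(i, i-1).
LOWER = [
    [],
    [4.04],
    [8.80, 7.86],
    [3.20, 4.77, 6.78],
    [4.15, 5.45, 7.19, 2.12],
    [3.90, 3.75, 6.12, 3.32, 4.12],
    [6.35, 5.00, 7.58, 4.68, 4.22, 5.87],
    [1.08, 4.00, 8.32, 2.92, 3.43, 3.65, 5.86],
    [4.83, 5.67, 6.66, 1.98, 2.96, 4.67, 4.03, 4.60],
    [2.56, 4.55, 8.36, 3.21, 3.75, 3.40, 5.97, 2.19, 4.67],
    [6.48, 6.54, 10.06, 7.40, 6.94, 7.52, 7.14, 5.80, 8.01, 5.85],
]


def d(i, j):
    """Dissimilarity between objects i and j, numbered from 1."""
    if i == j:
        return 0.0
    return LOWER[max(i, j) - 1][min(i, j) - 1]


def agnes_average(objects):
    """Agglomerative nesting with group-average linkage (sketch)."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        level, a, b = min(
            ((sum(d(i, j) for i in x for j in y) / (len(x) * len(y)), x, y)
             for x, y in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((level, sorted(a), sorted(b)))
    return merges


merges = agnes_average(range(1, 12))
for level, a, b in merges:
    print(f"{level:6.3f}  {a} + {b}")
```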

 

Dissimilarity Banner

The banner shows successive merges from left to right, and the objects are listed from top to bottom. The banner consists of stars and stripes: the stars indicate the linking of objects, and the stripes are repetitions of the object labels.

Fixed scales above and below the banner run from 0 to 1 in steps of .08. Here, 0 indicates a dissimilarity of 0 and 1 stands for the largest dissimilarity encountered, i.e. the dissimilarity of the merger in the last step, which is 7.772. The approximate level of a merger can easily be estimated from the banner. For example, object (10) merges with (1, 8) at about 0.31 on the scale. This corresponds to a dissimilarity of approximately 0.31 × 7.772 ≈ 2.4 (the exact level is 2.373).

The white space in the left part of the banner represents the initial stage, where all objects are separate entities. Then, at a level of about .16, we observe the beginning of the stripe 1-1-1-… (object 1) and of the stripe 8-8-8-… (object 8), and so on.

The overall width of the banner is very important, because it gives an idea of the amount of structure that has been found by the clustering algorithm.

The agglomerative coefficient computed by the program is 0.54. This coefficient is equal to the sum of the lengths of the lines in the banner divided by the number of objects (each object is represented by one line). In other words, the agglomerative coefficient is simply the average width of the banner, or the fraction of the banner that is filled.

An agglomerative coefficient of AC = 0.54 indicates that a reasonable configuration has been found.
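The agglomerative coefficient can be computed directly from the final ordering. Following Kaufman and Rousseeuw's definition as we understand it, each object i contributes a banner line of width 1 - m(i)/m_max. Here m(i) is the level at which i first joins another cluster and m_max is the level of the last merger. In this ordering, m(i) is the smaller of the values next to object i. The sketch also converts the merger level of (10) to the banner's 0 to 1 scale. Function names are hypothetical.

```python
# Final ordering of objects and the dissimilarities between adjacent objects (AGNES output).
ORDER = [1, 8, 10, 6, 4, 9, 5, 2, 7, 11, 3]
LEVELS = [1.082, 2.373, 3.650, 3.905, 1.983, 2.542, 4.604, 5.247, 6.854, 7.772]


def first_merge_levels(order, levels):
    """Level at which each object first joins another cluster: the smaller of
    the (one or two) values adjacent to it in the final ordering."""
    return {obj: min(levels[max(pos - 1, 0):pos + 1]) for pos, obj in enumerate(order)}


def agglomerative_coefficient(order, levels):
    top = max(levels)  # level of the final merger (7.772 here)
    m = first_merge_levels(order, levels)
    # Each object's banner line has width 1 - m(i)/top; AC is the average width.
    return sum(1 - v / top for v in m.values()) / len(order)


print(f"AC = {agglomerative_coefficient(ORDER, LEVELS):.2f}")  # AC = 0.54
# Position of the merger of (10) with (1, 8) on the 0-1 banner scale.
print(round(2.373 / max(LEVELS), 2))  # 0.31
```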

7(5) Example of Hierarchical Divisive Cluster Analysis

Research Question

:

Classification of eleven countries according to their publication pattern in different sub fields of chemistry

Methodology

:

Cluster Analysis using the algorithm Diana (Divisive Analysis)

Dataset

:

CHEM.DAT
SYNTAX*
$RUN CLUSFIND
$FILES
PRINT = DIANA.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSTERING WITH PROGRAM DIANA
BADDATA=MD1 -
 STANDARDIZE -
 IDVAR=V1 -
 VARS=(V2-V10) -
 ANALYSIS=DIANA -
 PRINT=(DICT,DISSIM, GRAPH, TRACE,VNAM)
---------

Note: All options set at default values.

EXTRACT FROM COMPUTER OUTPUT

After filtering 11 cases read from the input data file

Number of variables: 9

Number of objects: 11

 

*** Dissimilarity matrix ***

x 1 2 3 4 5 6 7 8 9 10 11
1 .00                    
2 4.04 .00                  
3 8.80 7.86 .00                
4 3.20 4.77 6.78 .00              
5 4.15 5.45 7.19 2.12 .00            
6 3.90 3.75 6.12 3.32 4.12 .00          
7 6.35 5.00 7.58 4.68 4.22 5.87 .00        
8 1.08 4.00 8.32 2.92 3.43 3.65 5.86 .00      
9 4.83 5.67 6.66 1.98 2.96 4.67 4.03 4.60 .00    
10 2.56 4.55 8.36 3.21 3.75 3.40 5.97 2.19 4.67 .00  
11 6.48 6.54 10.06 7.40 6.94 7.52 7.14 5.80 8.01 5.85 .00

At the first step the 11 objects are divided into groups of 10 and 1

 

Final ordering of objects and diameters of the clusters

 Objects               1       8      10       6       2       4
 Diameters               1.082   2.560   3.904   4.546   6.352

 Objects               4       9       5       7      11       3
 Diameters               1.983   2.963   4.679   8.014  10.060
 

Dissimilarity banner

   1    .92   .84   .76   .68   .60   .52   .44   .36   .28   .20   .12   .04 0
 --+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
   1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
 *********************************************************************
   8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
 **********************************************************
  10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 1
 ************************************************
   6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6-  6-
 ********************************************
   2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-
 ******************************
   4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4
 ***************************************************************
   9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9-  9
 *******************************************************
   5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5
 *******************************************
   7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7
 ******************
  11- 11- 11- 11- 1
 ***
   3
 --+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
   1    .92   .84   .76   .68   .60   .52   .44   .36   .28   .20   .12   .04 0
 

The actual diameter of this data set is 10.060

The divisive coefficient of this data set is .61

INTERPRETATION

IDAMS reports analysis specifications.

No of cases read from the input data file =11
No. of variables = 9     No. of Objects = 11

 

Dissimilarity matrix

This is a matrix of normalized Euclidean distances between the objects.

It can easily be seen that Dissimilarity (1, 8) = 1.08 is the lowest value in the matrix and Dissimilarity (11, 3) = 10.06 is the largest.

 

Final ordering of objects and diameters of the clusters

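Row I: 1 8 10 6 2 4 9 5 7 11 3

Row II: 1.082 2.560 3.904 4.546 6.352 1.983 2.963 4.679 8.014 10.060

Each value in Row II is the diameter of the smallest cluster that contains the two adjacent objects. For example, the value 1.983 between (4) and (9) is the diameter of the cluster (4, 9), which is simply their dissimilarity.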

The largest diameter is 10.060, which stands between (11) and (3). This means that the whole data set would be split at the level 10.060. The result is a singleton cluster with object (3) on the right and a cluster with objects (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) on the left. Thus, in the first step, we get two clusters:

(3)
(1, 8, 10, 6, 2, 4, 9, 5, 7, 11)

The second largest diameter is 8.014, which stands between (7) and (11). This means that cluster (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) would be split into two clusters:

(11)
(1, 8, 10, 6, 2, 4, 9, 5, 7).

The third largest diameter is 6.352, which stands between (2) and (4). This indicates that cluster (1, 8, 10, 6, 2, 4, 9, 5, 7) would be divided into two clusters:

(1, 8, 10, 6, 2)
(4, 9, 5, 7).

The fourth largest diameter is 4.679, which stands between (5) and (7). This means that cluster (4, 9, 5, 7) would be divided into two clusters:

(4, 9, 5)
(7)

The fifth largest diameter is 4.546 which stands between (6) and (2). This means that Cluster (1, 8, 10, 6, 2) would be divided into two clusters:

(1, 8, 10, 6)
(2).

The sixth largest diameter is 3.904, which stands between (10) and (6). This means that Cluster (1, 8, 10, 6) should be divided into two clusters:

(1, 8, 10)
(6).

This divisive process continues until we get 11 singleton clusters.
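The splitting steps can be reproduced with a Python sketch of DIANA's splinter-group procedure. The object with the largest average dissimilarity to the rest starts a splinter group. Objects that are on average closer to the splinter group than to the remaining ones are then moved over one at a time. The sketch splits every cluster down to singletons and then sorts the splits by diameter. This gives the same hierarchy as always splitting the largest-diameter cluster first. It uses the printed (rounded) dissimilarity matrix. Names are hypothetical, and this is not the CLUSFIND code.

```python
from itertools import combinations

# Lower triangle of the printed dissimilarity matrix: row i holds d(i, 1), ..., d(i, i-1).
LOWER = [
    [],
    [4.04],
    [8.80, 7.86],
    [3.20, 4.77, 6.78],
    [4.15, 5.45, 7.19, 2.12],
    [3.90, 3.75, 6.12, 3.32, 4.12],
    [6.35, 5.00, 7.58, 4.68, 4.22, 5.87],
    [1.08, 4.00, 8.32, 2.92, 3.43, 3.65, 5.86],
    [4.83, 5.67, 6.66, 1.98, 2.96, 4.67, 4.03, 4.60],
    [2.56, 4.55, 8.36, 3.21, 3.75, 3.40, 5.97, 2.19, 4.67],
    [6.48, 6.54, 10.06, 7.40, 6.94, 7.52, 7.14, 5.80, 8.01, 5.85],
]


def d(i, j):
    """Dissimilarity between objects i and j, numbered from 1."""
    if i == j:
        return 0.0
    return LOWER[max(i, j) - 1][min(i, j) - 1]


def avg(i, group):
    """Average dissimilarity between object i and the objects in group."""
    return sum(d(i, j) for j in group) / len(group)


def diameter(cluster):
    """Largest dissimilarity between two members of the cluster."""
    return max((d(i, j) for i, j in combinations(sorted(cluster), 2)), default=0.0)


def splinter_split(cluster):
    """One DIANA split of cluster into (remaining group, splinter group)."""
    main = set(cluster)
    # Seed the splinter group with the object farthest, on average, from the rest.
    seed = max(sorted(main), key=lambda i: avg(i, main - {i}))
    main.remove(seed)
    splinter = {seed}
    while len(main) > 1:
        # Positive gain: the object is on average closer to the splinter group.
        gain = {i: avg(i, main - {i}) - avg(i, splinter) for i in sorted(main)}
        best = max(gain, key=gain.get)
        if gain[best] <= 0:
            break
        main.remove(best)
        splinter.add(best)
    return main, splinter


def diana(objects):
    """Split until only singletons remain; splits sorted by diameter, largest first."""
    events, todo = [], [frozenset(objects)]
    while todo:
        cluster = todo.pop()
        if len(cluster) < 2:
            continue
        main, splinter = splinter_split(cluster)
        events.append((diameter(cluster), sorted(main), sorted(splinter)))
        todo += [frozenset(main), frozenset(splinter)]
    return sorted(events, reverse=True)


events = diana(range(1, 12))
for diam, main, splinter in events:
    print(f"{diam:6.2f}  {main} | {splinter}")
```

The first three lines of output correspond to the splits described above: (3) at 10.06, (11) at 8.01 and (4, 5, 7, 9) at 6.35.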

 

Dissimilarity Banner

The dissimilarity banner is similar to that of Agnes (Example 7(4)), but it points in the opposite direction. The scales that surround the banner are also plotted differently, since they decrease from 1 to 0. Here, 0 indicates a zero diameter and 1 stands for the diameter of the entire data set, which is equal to 10.060. A diameter of 0 corresponds to singletons.

The overall width of the banner reflects the strength of the clustering. When the diameter of the entire data set is much larger than the diameters of the individual clusters, the banner is wide. The divisive coefficient DC is the average width of the banner.

DC = 0.61

which indicates good clustering.
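The divisive coefficient can be computed from the final ordering in the same spirit. For each object, take the diameter of the last cluster it belongs to before it becomes a singleton (the smaller of the two adjacent values). Divide it by the diameter of the whole data set, and average 1 minus this ratio over all objects. In this sketch, the value 1.983 between (4) and (9) is assumed to equal d(4, 9) from the dissimilarity matrix, since (4, 9) is a two-object cluster. The function name is hypothetical.

```python
# Final ordering of objects and diameters between adjacent objects (DIANA output).
ORDER = [1, 8, 10, 6, 2, 4, 9, 5, 7, 11, 3]
# 1.983 (between 4 and 9) is assumed to be d(4, 9), the diameter of the cluster (4, 9).
DIAMETERS = [1.082, 2.560, 3.904, 4.546, 6.352, 1.983, 2.963, 4.679, 8.014, 10.060]


def divisive_coefficient(order, diameters):
    top = max(diameters)  # diameter of the whole data set (10.060 here)
    total = 0.0
    for pos in range(len(order)):
        # Diameter of the last cluster the object belongs to before it becomes a singleton.
        last = min(diameters[max(pos - 1, 0):pos + 1])
        total += 1 - last / top
    return total / len(order)


print(f"DC = {divisive_coefficient(ORDER, DIAMETERS):.2f}")  # DC = 0.61
```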

7(6) Example of Cluster Analysis of Binary Data

Research Question

:

Classification of 33 major academic institutions in India according to priorities given to different scientific fields.

Methodology

:

Cluster Analysis using the algorithm MONA (Monothetic Analysis)

Dataset

:

MONA.DAT
SYNTAX*
$RUN CLUSFIND
$FILES
PRINT = MONA.LST
DICTIN = ACADEMIC.DIC
DATAIN = ACADEMIC.DAT
$SETUP
CLUSTER ANALYSIS USING MONA
BADDATA=MD1 -
IDVAR=V1 -
VARS=(V2-V9) -
ANALYSIS=MONA -
CMAX=5 -
PRINT=(DICT,DISS,GRAPH,TRACE,VNAM)
-------------------------------

Note: All options set at default values.

EXTRACT FROM COMPUTER OUTPUT

After filtering 33 cases read from the input data file

Number of variables: 8

Number of objects: 33

 

Step number 1

Cluster 1 3 4 6 7 8 9 10 12 13 18 23 25 26 32 2 5 11
         14 15 16 17 19 20 21 22 24 27 28 29 30 31 33

is divided into 15 and 18 objects, using variable LIF

Step number 2

Cluster 1 3 4 7 8 10 12 13 18 23 26 32 6 9 25
is divided into 12 and 3 objects, using variable MED
Cluster 2 14 15 16 19 20 21 27 28 29 30 31 33 5 11 17 22 24
is divided into 13 and 5 objects, using variable ESP

Step number 3

Cluster 1 3 4 7 8 10 13 18 23 12 26 32
is divided into 9 and 3 objects, using variable MAT
Cluster 6 9 25
is divided into 2 and 1 objects, using variable MAT
Cluster 2 16 20 21 28 29 30 31 33 14 15 19 27
is divided into 9 and 4 objects, using variable MED
Cluster 5 17 22 24 11
is divided into 4 and 1 objects, using variable PHY

Step number 4

Cluster 1 3 4 7 8 10 13 18 23
is divided into 5 and 4 objects, using variable ESP
Cluster 12 32 26
is divided into 2 and 1 objects, using variable PHY
Cluster 6 9
Cannot be separated by the remaining variables.
Cluster 2 16 20 29 21 28 30 31 33
is divided into 4 and 5 objects, using variable PHY
Cluster 14 15 27 19
is divided into 3 and 1 objects, using variable PHY
Cluster 5 22 24 17
is divided into 3 and 1 objects, using variable MAT

Step number 5

Cluster 1 3 7 8 4
is divided into 4 and 1 objects, using variable CHE
Cluster 10 23 13 18
is divided into 2 and 2 objects, using variable ENG
Cluster 12 32
Cannot be separated by the remaining variables.
Cluster 2 29 16 20
is divided into 2 and 2 objects, using variable CHE
Cluster 21 28 30 31 33
is divided into 1 and 4 objects, using variable CHE
Cluster 14 27 15
is divided into 2 and 1 objects, using variable CHE
Cluster 5 22 24
is divided into 2 and 1 objects, using variable CHE

Step number 6

Cluster 1 3 7 8
Cannot be separated by the remaining variables.
Cluster 10 23
Cannot be separated by the remaining variables.
Cluster 13 18
Cannot be separated by the remaining variables.
Cluster 2 29
is divided into 1 and 1 objects, using variable ENG
Cluster 16 20
Cannot be separated by the remaining variables.
Cluster 28 30 33 31
is divided into 3 and 1 objects, using variable MAT
Cluster 14 27
is divided into 1 and 1 objects, using variable AGR
Cluster 5 22
is divided into 1 and 1 objects, using variable MED

Step number 7

Cluster 28 30 33
is divided into 1 and 2 objects, using variable AGR

Step number 8

Cluster 30 33
Cannot be separated by the remaining variables. 
 

Final Ordering of Objects

           1     3     7     8     4    10    23    13    18    12    32    26     6
  Step                          5     4           5           3           4     2
    By                         CHE   ESP         ENG         MAT         PHY   MED

           6     9    25     2    29    16    20    21    28    30    33    31    14
  Step              3     1     6     5           4     5     7           6     3
    By             MAT   LIF   ENG   CHE         PHY   CHE   AGR         MAT   MED

          14    27    15    19     5    22    24    17    11
  Step        6     5     4     2     6     5     4     3
    By       AGR   CHE   PHY   ESP   MED   CHE   MAT   PHY
 

Separation Plot

        0        1         2         3         4         5         6         7
        1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-  1-
        3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-  3-
        7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-  7-
        8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-  8-
  CHE ****************************************************
        4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-  4-
  ESP ******************************************
       10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
       23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23-
  ENG ****************************************************
       13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13-
       18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18-
  MAT ********************************
       12- 12- 12- 12- 12- 12- 12- 12- 12- 12- 1
       32- 32- 32- 32- 32- 32- 32- 32- 32- 32- 3
  PHY ******************************************
       26- 26- 26- 26- 26- 26- 26- 26- 26- 26- 2
  MED **********************
        6-  6-  6-  6-  6-  6-  6-  6-
        9-  9-  9-  9-  9-  9-  9-  9-
  MAT ********************************
       25- 25- 25- 25- 25- 25- 25- 25-
  LIF ************
        2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-  2-
  ENG **************************************************************
       29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 2
  CHE ****************************************************
       16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16-
       20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20-
  PHY ******************************************
       21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21-
  CHE ****************************************************
       28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28-
  AGR ************************************************************************
       30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30-
       33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33-
  MAT **************************************************************
       31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 3
  MED ********************************
       14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 1
  AGR **************************************************************
       27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 2
  CHE ****************************************************
       15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15-
  PHY ******************************************
       19- 19- 19- 19- 19- 19- 19- 19- 19- 19- 1
  ESP **********************
        5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-  5-
  MED **************************************************************
       22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 2
  CHE ****************************************************
       24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24-
  MAT ******************************************
       17- 17- 17- 17- 17- 17- 17- 17- 17- 17- 1
  PHY ********************************
       11- 11- 11- 11- 11- 11- 11- 11-
        0        1         2         3         4         5         6         7
INTERPRETATION

IDAMS reports analysis specifications

No of cases read from the input data file = 33
No. of Variables = 8
No. of Objects = 33

 

The whole sample is successively divided into clusters in eight steps. The results are represented in Figure 1 in the form of a hierarchical tree. Note that in monothetic analysis only one variable is used at each split of the hierarchy; hence the name monothetic clustering.

In the tree, the variable used for each split is indicated.
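How MONA chooses a splitting variable can be sketched as follows. As we understand Kaufman and Rousseeuw's description, the association between binary variables f and g is |ad - bc|, computed from their 2 x 2 table. The variable with the largest total association with all the other variables is used to split the cluster. The Python sketch below assumes complete 0/1 data; IDAMS's handling of missing values is not modelled. It uses hypothetical toy data, and the function names are illustrative only.

```python
def association(x, y):
    """|ad - bc| for two binary variables: a = both 1, d = both 0, b and c = mismatches."""
    a = sum(1 for u, v in zip(x, y) if (u, v) == (1, 1))
    b = sum(1 for u, v in zip(x, y) if (u, v) == (1, 0))
    c = sum(1 for u, v in zip(x, y) if (u, v) == (0, 1))
    d = sum(1 for u, v in zip(x, y) if (u, v) == (0, 0))
    return abs(a * d - b * c)


def mona_split(rows):
    """Pick one splitting variable and split the rows on it (sketch).

    Returns (variable index, rows with 1, rows with 0), or None when no
    variable varies inside the cluster ("cannot be separated").
    """
    columns = list(zip(*rows))
    candidates = [f for f, col in enumerate(columns) if len(set(col)) == 2]
    if not candidates:
        return None
    # Total association of each candidate with every other variable.
    total = {f: sum(association(columns[f], columns[g])
                    for g in range(len(columns)) if g != f)
             for f in candidates}
    best = max(candidates, key=total.get)
    ones = [i for i, row in enumerate(rows) if row[best] == 1]
    zeros = [i for i, row in enumerate(rows) if row[best] == 0]
    return best, ones, zeros


# Hypothetical toy data: 6 objects x 3 binary variables.
ROWS = [[1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
print(mona_split(ROWS))  # (1, [0, 1, 2, 5], [3, 4])
```

Here the totals are 6, 8 and 2 for the three variables, so the second variable (index 1) is chosen. Applying `mona_split` again to each part would continue the hierarchy.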

 

Final Ordering of Objects

The first row shows the sequence of objects. The second row shows the step at which two adjacent objects are separated, and the third row shows the variable used for that separation. For example, step 1 appears between (25) and (2). All the objects from (2) onwards (18 objects) are separated at the first step from the 15 objects that precede them. This separation is carried out using the variable LIF.

The second step appears between (26) and (6). The objects from (6) to (25) are separated at step 2, using the variable MED, and so on.
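The final ordering encodes the whole hierarchy. Cutting it wherever the separation step is at most s gives the clusters present after step s. The Python sketch below types in the ordering and the steps from the output, with 0 marking adjacent objects that are never separated. It reproduces the 15 and 18 objects of step 1 and the 12, 3, 13 and 5 objects of step 2. The function name is hypothetical.

```python
ORDER = [1, 3, 7, 8, 4, 10, 23, 13, 18, 12, 32, 26, 6, 9, 25, 2, 29, 16,
         20, 21, 28, 30, 33, 31, 14, 27, 15, 19, 5, 22, 24, 17, 11]
# Step at which each pair of adjacent objects is separated (0 = never separated).
STEPS = [0, 0, 0, 5, 4, 0, 5, 0, 3, 0, 4, 2, 0, 3, 1, 6, 5, 0, 4, 5, 7,
         0, 6, 3, 6, 5, 4, 2, 6, 5, 4, 3]


def clusters_after(step, order=ORDER, steps=STEPS):
    """Clusters present once all separations up to `step` have been made."""
    groups, current = [], [order[0]]
    for obj, s in zip(order[1:], steps):
        if 0 < s <= step:
            groups.append(current)
            current = []
        current.append(obj)
    groups.append(current)
    return groups


for step in (1, 2, 3):
    print(step, [len(g) for g in clusters_after(step)])
```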

 

Separation Banner

Each object of the data set corresponds to a horizontal line in the banner, and the lines are ordered in the same way as the first row of the final ordering of objects. The end of a row of stars **** indicates a separation between clusters. If two or more lines representing objects are stuck together, the objects cannot be separated. For example, objects (1, 3, 7, 8) are stuck together and hence cannot be split further. The length of a row of stars is proportional to the step number at which the separation is carried out: these objects were separated from object (4) at step 5, using the variable CHE.

When the row of an object does not continue to the right-hand side of the banner, the object becomes a singleton cluster at the corresponding step. For example, object (11) becomes a singleton at step 3, using the variable PHY.

It is important to note that the banner of MONA cannot be used to assess the quality of clustering, because the length of a row of stars is proportional to the number of the separation step, not to the tightness of the clusters.

7(7) Example of Construction of a Typology

Research Question

:

Construct a typology of academic scientists according to the pattern of their involvement in different activities, and identify their main characteristics.

Methodology

:

Classification using the IDAMS module TYPOL.

Dataset

:

TYPE.DAT
SYNTAX*
$RUN TYPOL
$FILES
PRINT = typology.lst
DICTIN = anju.dic
DATAIN = anju.dat
$SETUP
EXCLUDE V1=1220,2049,2055,2074,2075,5016
Activity profile of academic scientists
BADDATA=MD1 -
 AQNTVARS=(V2-V8) -
 PQNTVARS=(V13) -
 PQLTVARS=(V9,V10,V14,V15) -
 INITIAL=RANDOM NCASES=1073 -
 DTYPE=EUCLID -
 INIGROUP=5 -
 FINGROUP=3 -
 PRINT=(CDICT,GRAP,ROWP,DIST)
--------------------------------------------

Note: All other options are set at their default values.

EXTRACT FROM COMPUTER OUTPUT

Number of initial groups: 5
Number of final groups: 3

Initial configuration is a random sample
Number of cases: 1073
Maximum number of iterations: 5

No standardization of active variables

Type of distance is 'Euclidean'

Regrouping is based on minimum displacement

Print the graphic of profiles

Print row % for qual. variables categories

Print table of distances and displacements for each regrouping
Print all resulting typologies

Active quantitative variables
     V2 V3 V4 V5 V6 V7 V8

Passive quantitative variables
     V13

After filtering 1067 cases read from the input file
4 cases contained illegal characters and were treated according to BADDATA specification

The distances and displacements are computed on non-standardized variables

 

% of explained variance from one iteration to another

Iteration number    Mean EV   Image
       1              .345    ***
       2              .358    ****
       3              .359    ****
 

Characteristics of distances by groups

     Group no.      N        Mean        SD
         1        217.     58.844     81.419
         2         94.     73.033     71.358
         3        204.     39.833     39.680
         4        174.     68.739     81.807
         5        354.     41.761     43.044

     Total count    Mean        SD
        1043.      52.257     63.657
 
Var seq          Name                Mean           S. D.          Weight
     1    v262:teaching             42.02           18.00           1.00
     2    v263:research             22.40           12.28           1.00
     3    v264:supervsn             16.14           12.01           1.00
     4    v265:lab-dev               5.76            7.04           1.00
     5    v266:admin                 7.78            9.30           1.00
     6    v267:extension             2.94            6.30           1.00
     7    v268:profess               2.96            4.23           1.00
     8    v344:#doc students         2.00            1.96            .00
 

Description of resulting typology

                            Group number                 1     2      3      4      5
   Total cases       1000   Proportion of cases        208     90    195    166    339
       Explained     Grand
        variance      mean


    1 767 ********   42.02 v262:teaching              25.63  23.28  68.32  31.83  46.88
                                                       8.12   9.55   9.63  10.90   6.80


    2 464 *****      22.40 v263:research              22.09  16.30  17.19  40.63  18.26
                                                       8.76   7.06   9.14  12.57   7.22


    3 519 *****      16.14 v264:supervsn              30.88  14.22   5.09  10.48  16.77
                                                      11.14   6.85   5.26   8.20   8.17


    4  83 *           5.76 v265:lab-dev                5.50  10.96   2.99   5.25   6.38
                                                       5.76  12.12   4.11   5.97   6.88


    5 564 ******      7.78 v266:admin                  6.50  29.74   3.74   5.45   6.21
                                                       4.94  11.70   5.12   5.21   5.59


    6  23             2.94 v267:extension              4.50   2.12   1.87   3.51   2.54
                                                       8.28   4.00   5.13   7.32   5.16


    7  96 *           2.96 v268:profess                4.91   3.38    .80   2.86   2.94
                                                       4.24   4.22   2.08   4.09   4.60


    8 164 **          2.00 v344:#doc students          3.24   2.04    .75   1.79   2.05
                                                       1.98   1.69   1.36   2.06   1.77


    9  70 *           34.2 v204:rank      CODE=0001   48.4   54.3   14.2   33.9   31.9
                     100.0                            29.4   14.3    8.1   16.4   31.6


   10   3             34.5 v204:rank      CODE=0002   36.4   27.7   33.3   32.8   36.7
                     100.0                            21.9    7.2   18.8   15.8   36.1


   11  90 *           29.5 v204:rank      CODE=0003   12.0   14.9   52.5   32.2   29.7
                     100.0                             8.4    4.5   34.6   18.1   34.1

   12  75 *           10.3 v217:head?     CODE=0001    8.8   36.2    5.9    5.2    9.3
                     100.0                            17.8   31.7   11.2    8.4   30.8

   13  76 *           88.4 v217:head?     CODE=0002   90.8   60.6   92.6   93.1   89.5
                     100.0                            21.4    6.2   20.4   17.5   34.3

   14  63 *           35.1 sv:inst type   CODE=0001   21.7   22.3   53.4   25.9   40.7
                     100.0                            12.8    5.7   29.7   12.2   39.3

   15  59 *           28.3 sv:inst type   CODE=0002   32.7   47.9    8.3   31.6   30.2
                     100.                             24.1   15.2    5.7   18.6   36.2

   16 110 *            5.7 sv:inst type   CODE=0003   19.4    1.1     .0    8.6     .3
                     100.0                            71.2    1.7     .0   25.3    1.7

   17   9             31.0 sv:inst type   CODE=0004   26.3   28.7   38.2   33.9   28.8
                     100.0                            17.6    8.3   24.1   18.2   31.5

   18   1             59.3 :field         CODE=0001   61.8   55.3   57.4   59.2   59.9
                     100.0                            21.7    8.4   18.9   16.6   34.3

   19   1             39.3 :field         CODE=0002   37.8   42.6   40.2   39.7   38.7
                     100.0                            20.0    9.7   19.9   16.7   33.4
 

Variables explaining 80% of the variance

    Var seq          Names            	   Expl. Var
         1          v262:teaching              767
         5          v266:admin                 564
         3          v264:supervsn              519
         2          v263:research              464

Expl. Var = amount of variance explained by one variable
Total variance = overall variance explained by the active variables
Mean variance explained by active variables = 335
Mean variance explained by all variables = 239
Mean variance explained by the variables which explain 80% of the total variance = 578
Percentage of variables = 21.1
Mean variance explained by those variables which explain 80% of the total variance before regrouping = 578

 

Square roots of displacements (computed on weighted variables) and distances

   Groups
   Numbers       1       2       3       4
      2        37.7
                3.362
      3        62.0    49.9
                4.368   4.496
      4        44.6    40.0    54.6
                3.274   3.694   4.072
      5        50.3    42.7    48.3    48.4
                3.132   3.574   3.068   3.238
          Regrouping number   1
              Group      2 is incorporated into group   1
              Displacement =      1421.829
              Distance     =        11.305
 
                            Group number                1      3      4      5
   Total cases       1000   Proportion of cases        298    195    166    339
       Explained     Grand
        variance      mean


    1 765 ********    42.02 v262:teaching              24.92  68.32  31.83  46.88
                                                        8.64   9.63  10.90   6.80
    9  69 *           34.2 v204:rank CODE=0001         50.2   14.2   33.9   31.9
                     100.0                             43.7    8.1   16.4   31.6

   10   1             34.5 v204:rank      CODE=0002    33.8   33.3   32.8   36.7
                     100.0                             29.1   18.8   15.8   36.1

   11  90 *           29.5 v204:rank      CODE=0003    12.9   52.5   32.2   29.7
                     100.0                             13.0   34.6   18.1   34.1
 

Group 1

 Var    EV    Mean      -2.5      -2.0      -1.5      -1.0      -0.5        0        0.5       1.0       1.5       2.0       2.5
 seq                                                                        I
    1  767   25.631                                       X-----------------I
 v262:teaching                                                              I
    2  464   22.092                                                        XI 
 v263:research                                                              I
    3  519   30.876                                                         I------------------------X
 v264:supervsn                                                              I
    4   83    5.498                                                        XI 
 v265:lab-dev                                                               I
    5  564    6.498                                                      X--I
 v266:admin                                                                 I
    6   23    4.498                                                         I----X
 v267:extension                                                             I
    7   96    4.908                                                         I--------X
 v268:profess                                                               I
    8  164    3.240                                                         I------------X
 v344:#doc students                                                         I
    9   70   48.387                                                         I-----X
 v204:rank      CODE=0001                                                   I
   10    3   36.406                                                         IX
 v204:rank      CODE=0002                                                   I
   11   90   11.982                                                 X-------I
 v204:rank      CODE=0003                                                   I
   12   75    8.756                                                        XI 
 v217:head?     CODE=0001                                                   I
   13   76   90.783                                                         IX
 v217:head?     CODE=0002                                                   I
   14   63   21.659                                                   X-----I
 sv:inst type   CODE=0001                                                   I
   15   59   32.719                                                         I-X
 sv:inst type   CODE=0002                                                   I
   16  110   19.355                                                         I-----------X
 sv:inst type   CODE=0003                                                   I
   17    9   26.267                                                       X-I
 sv:inst type   CODE=0004                                                   I
   18    1   61.751                                                         IX
 :field         CODE=0001                                                   I
   19    1   37.788                                                        XI 
 :field         CODE=0002                                                   I
 
.........   
 :       :   
 :       :   
 .....   :   
 :   :   :   
 T 1 T 2 T 4 T
0 208  90 166
INTERPRETATION

IDAMS reports analysis specifications

   No. of cases read = 1067
   No. of cases analysed = 1063
   Active quantitative variables = V2-V8
   Passive quantitative variables = V13
   Passive qualitative variables = V9, V10, V14, V15
   (These variables are defined in the file TYPE.DIC)
   No. of initial groups = 5
   No. of final groups = 3
   Distance metric used = Euclidean
   Regrouping based on minimum displacement

 

History of the percentage of variance explained, from the first iteration to the final (third) iteration
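The iterations can be pictured with a minimal k-means-style sketch in Python: each case is reassigned to the nearest group centre, the centres are recomputed, and the share of variance explained (between-group sum of squares divided by total sum of squares) is recorded at each iteration until no case changes group. Treating TYPOL's reallocation as plain k-means is an assumption made for illustration, and the IDAMS "Mean EV" figure may be computed differently (for example, averaged over variables). The function names are hypothetical; `max_iter=5` simply mirrors "Maximum number of iterations: 5" in the output.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def explained_share(points, labels, k):
    """Between-group sum of squares divided by total sum of squares."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        if members:
            c = centroid(members)
            within += sum(sq_dist(p, c) for p in members)
    return 1.0 - within / total


def kmeans_history(points, k, max_iter=5, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # random initial configuration
    labels, history = None, []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda g: sq_dist(p, centres[g]))
                      for p in points]
        if new_labels == labels:  # no case changed group: stop
            break
        labels = new_labels
        history.append(explained_share(points, labels, k))
        # recompute each centre; keep the old one if a group became empty
        centres = [centroid([p for p, lab in zip(points, labels) if lab == g])
                   if g in labels else centres[g]
                   for g in range(k)]
    return labels, history


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]  # toy data
labels, history = kmeans_history(points, k=2)
print(labels, [round(h, 3) for h in history])
```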

 

Characteristics of distances by groups

N = Number of cases in each group of the initial typology
Mean = Mean distance from the group profile over all cases in the group
SD = Standard deviation of these distances within each group

Total count = Total number of cases used to build the initial typology
Mean = Overall mean distance over all these cases
SD = Overall standard deviation of the distances
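The "Total count" line can be recovered from the group lines. The Python sketch below pools the five groups, assuming the printed SDs use divisor N (population SDs); under that assumption it reproduces the overall mean 52.257 and SD 63.657 to within rounding. The function name is hypothetical.

```python
import math

# (N, mean distance, SD) for each group, copied from
# "Characteristics of distances by groups"
groups = [(217, 58.844, 81.419), (94, 73.033, 71.358), (204, 39.833, 39.680),
          (174, 68.739, 81.807), (354, 41.761, 43.044)]


def pooled_mean_sd(groups):
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    # mean of squared distances in a group is sd^2 + mean^2 when sd uses divisor N
    second_moment = sum(n * (s ** 2 + m ** 2) for n, m, s in groups) / total
    return total, mean, math.sqrt(second_moment - mean ** 2)


count, mean, sd = pooled_mean_sd(groups)
print(count, round(mean, 3), round(sd, 2))  # 1043 52.257 63.66
```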

 

Mean, S.D. and weight of quantitative variables
For active quantitative variables, weight = 1.
For passive quantitative variables, weight = 0.
(Note: Passive variables do not participate in the construction of the typology.)
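The role of the weights can be illustrated with a small Python sketch of a weighted Euclidean distance between a case and a group profile, computed on non-standardized variables as in this run. A variable with weight 0 contributes nothing, so a passive variable such as V13 cannot move a case towards or away from a group. The function name and the case values are hypothetical; the profile is Group 1's means from the description of the typology.

```python
import math


def weighted_euclidean(x, y, weights):
    """Euclidean distance in which each squared difference is multiplied by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Order: teaching, research, supervsn, lab-dev, admin, extension, profess (active),
# #doc students (passive)
weights = [1, 1, 1, 1, 1, 1, 1, 0]
group1 = [25.63, 22.09, 30.88, 5.50, 6.50, 4.50, 4.91, 3.24]
case_a = [40, 30, 15, 5, 5, 3, 2, 0]  # hypothetical case
case_b = [40, 30, 15, 5, 5, 3, 2, 9]  # same activities, more doctoral students

d_a = weighted_euclidean(case_a, group1, weights)
d_b = weighted_euclidean(case_b, group1, weights)
print(d_a == d_b)  # True: the passive variable does not change the distance
```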

 

Description of typology

For each variable, the following information is given:
-Serial number
-Variance explained, in per mil (i.e. thousandths)
-A sequence of stars (***) whose length is proportional to the variance explained. This shows at a glance how important a variable is in explaining the differences between typology groups.
-Grand mean = Mean value of the variable over all cases.

For each typology group:
Proportion of cases (per thousand)

Quantitative variables
Row 1: Mean value of the variable
Row 2: Standard deviation of the variable

Qualitative variables
For each qualitative variable

  • Variance explained by each category of the variable
  • Percentage of cases in each category of the variable

For each group:

Row 1: Percentage frequency of a given category in the group. This value summed over all categories = 100
Row 2: Distribution of a given category over all the groups. This value summed over all groups = 100

For example: Quantitative variable V262: Teaching

           Gr1    Gr2    Gr3    Gr4    Gr5
Row 1:    25.63  23.28  68.32  31.83  46.88
Row 2:     8.12   9.55   9.63  10.90   6.80

The first row shows the mean time spent on teaching by the members of each typology group (the grand means of the seven activity variables sum to 100, so these are percentages of working time). The second row shows the standard deviation of this variable within each typology group.
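The variance explained by a variable can be read as the share of its total variance that lies between the group means. The Python sketch below applies that definition (between-group variance divided by total variance, in per mil) to v262:teaching, using the group proportions and means from the 5-group description and the grand mean and SD from the table of quantitative variables. It gives about 766 against the printed 767; the gap comes from the rounded inputs. That this is exactly TYPOL's formula is an assumption supported only by this close agreement; the function name is hypothetical.

```python
def explained_variance_permil(props, means, grand_mean, grand_sd):
    """1000 x (between-group variance) / (total variance)."""
    total = sum(props)
    between = sum(p / total * (m - grand_mean) ** 2 for p, m in zip(props, means))
    return 1000 * between / grand_sd ** 2


props = [208, 90, 195, 166, 339]             # proportion of cases, per mil
means = [25.63, 23.28, 68.32, 31.83, 46.88]  # group means of v262:teaching
ev = explained_variance_permil(props, means, grand_mean=42.02, grand_sd=18.00)
print(round(ev))  # 766
```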

For example: Qualitative variable V204: Rank

This variable has 3 categories:

Col.1  Col.2   Col.3  Col.4  Col.5  Col.6  Col.7
       Code    Gr1    Gr2    Gr3    Gr4    Gr5

 34.2    1     48.4   54.3   14.2   33.9   31.9
100.0          29.4   14.3    8.1   16.4   31.6

 34.5    2     36.4   27.7   33.3   32.8   36.7
100.0          21.9    7.2   18.8   15.8   36.1

 29.5    3     12.0   14.9   52.5   32.2   29.7
100.0           8.4    4.5   34.6   18.1   34.1

Col. 1 shows the distribution of the categories in the entire sample:

34.2% Category 1 (Professor)
34.5% Category 2 (Reader)
29.5% Category 3 (Lecturer)

Let us consider Group 1. Its composition by rank is:

Category 1 (Professor): 48.4%
Category 2 (Reader): 36.4%
Category 3 (Lecturer): 12.0%

Category 1 is over-represented in Group 1 (48.4% against 34.2% in the entire sample).
Category 3 is under-represented in Group 1 (12.0% against 29.5%).
Category 2 has about the same frequency as in the entire sample (36.4% against 34.5%).

The other categories and groups are interpreted in the same way.
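The two rows printed for each category are linked through the group sizes: the share of a category's cases that falls in group g (row 2) equals the group proportion times the group's percentage in that category (row 1), divided by the category's overall percentage (Col. 1). The Python sketch below checks this for v204:rank and computes a simple over/under-representation index (row 1 divided by Col. 1). The index and the function names are illustrative, not IDAMS output; groups are indexed from 0 in the code, so index 0 is Group 1.

```python
# v204:rank in the 5-group typology, copied from the output
overall_pct = {1: 34.2, 2: 34.5, 3: 29.5}      # Col. 1, whole sample
row1 = {1: [48.4, 54.3, 14.2, 33.9, 31.9],     # % of each group in the category
        2: [36.4, 27.7, 33.3, 32.8, 36.7],
        3: [12.0, 14.9, 52.5, 32.2, 29.7]}
props = [208, 90, 195, 166, 339]               # group sizes, per mil


def representation_index(cat, g):
    """Above 1: category over-represented in group g; below 1: under-represented."""
    return row1[cat][g] / overall_pct[cat]


def row2(cat, g):
    """% of all cases of the category that fall in group g."""
    return 100 * (props[g] / 1000) * row1[cat][g] / overall_pct[cat]


for cat in (1, 2, 3):
    print(cat, round(representation_index(cat, 0), 2), round(row2(cat, 0), 1))
```

The recomputed row 2 values agree with the printed ones to within rounding of the inputs.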

 

Variables explaining 80% of the variance. This is a list of the most discriminant variables, which taken together account for 80% of the explained variance. These variables are ranked according to their explanatory power.

  • Mean variance explained by active variables = Mean amount of variance explained by the active variables.
  • Mean variance explained by all variables = Mean amount of variance explained by all the variables taken together.
  • Mean variance explained by the most discriminant variables, i.e. those which together explain 80% of the total variance.
  • Percentage of variables = 100 × (No. of most discriminant variables) / (No. of all variables).
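The selection of the most discriminant variables can be sketched in Python: rank the variables by explained variance and keep them until their cumulative sum reaches 80% of the total. The sketch assumes the threshold is applied to the active variables (including the passive v344, with 164, selects the same four) and that "all variables" means the 19 rows of the typology description. Under these assumptions it reproduces the output's list and its 21.1%. The function name is hypothetical.

```python
# Explained variance (per mil) of the active variables, from the output
ev = {"v262:teaching": 767, "v263:research": 464, "v264:supervsn": 519,
      "v265:lab-dev": 83, "v266:admin": 564, "v267:extension": 23,
      "v268:profess": 96}


def most_discriminant(ev, share=0.80):
    """Rank variables by explained variance and keep them until their
    cumulative sum reaches `share` of the total."""
    ranked = sorted(ev, key=ev.get, reverse=True)
    target = share * sum(ev.values())
    chosen, running = [], 0
    for name in ranked:
        chosen.append(name)
        running += ev[name]
        if running >= target:
            break
    return chosen


top = most_discriminant(ev)
print(top)  # ['v262:teaching', 'v266:admin', 'v264:supervsn', 'v263:research']
print(round(100 * len(top) / 19, 1))  # 21.1
```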
 

Matrix of (square roots of) inter-group displacements and distances
First row = Square root of displacement
Second row = Square root of distance

The displacement between Group 1 and Group 2 (37.7) is the smallest in the matrix.
Hence, these two groups are merged: Group 2 is incorporated into Group 1.
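Which row holds which quantity can be checked against the regrouping message: squaring the first-row entry for Groups 1 and 2 (37.7² ≈ 1421.3) matches "Displacement = 1421.829", and squaring the second-row entry (3.362² ≈ 11.30) matches "Distance = 11.305". The Python sketch below performs this check and shows that the pair with the smallest displacement is (1, 2), the pair actually merged, whereas the smallest distance would have picked (3, 5). This agrees with "Regrouping is based on minimum displacement" in the output. The dictionary names are illustrative.

```python
# Square roots printed in the matrix, keyed by (group i, group j)
first_row = {(1, 2): 37.7, (1, 3): 62.0, (2, 3): 49.9, (1, 4): 44.6, (2, 4): 40.0,
             (3, 4): 54.6, (1, 5): 50.3, (2, 5): 42.7, (3, 5): 48.3, (4, 5): 48.4}
second_row = {(1, 2): 3.362, (1, 3): 4.368, (2, 3): 4.496, (1, 4): 3.274, (2, 4): 3.694,
              (3, 4): 4.072, (1, 5): 3.132, (2, 5): 3.574, (3, 5): 3.068, (4, 5): 3.238}

print(round(first_row[(1, 2)] ** 2, 1), round(second_row[(1, 2)] ** 2, 2))  # 1421.3 11.3
print(min(first_row, key=first_row.get))    # (1, 2): merged by minimum displacement
print(min(second_row, key=second_row.get))  # (3, 5): what minimum distance would pick
```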

 

Description of the resulting 3-group typology
This is interpreted in the same way as the 5-group typology.

 

Graphical representation of profiles of different groups
Var seq = Sequence number of the variable, as in the description of the typology
EV = Explained variance
Mean = Mean value of the variable in the group

The vertical line corresponds to the grand mean of each variable.
Horizontal bars show the deviation of the group mean of a variable from its grand mean.

Horizontal bar to the right of the vertical line ⇒ value greater than the grand mean. The length of the bar is proportional to the deviation from the grand mean, measured in standard deviations.

Horizontal bar to the left of the vertical line ⇒ value less than the grand mean. The length of the bar is proportional to the deviation from the grand mean, measured in standard deviations.
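For a quantitative variable, the bar can be reproduced from the group mean, the grand mean and the grand SD. The Python sketch below computes the deviation in SD units and draws a text bar. The scale of 20 characters per SD is read off the axis (ticks every 0.5 SD, 10 characters apart) and is an assumption, so an individual bar may differ from the printed one by a character. The function names are hypothetical.

```python
def deviation_in_sd(group_mean, grand_mean, grand_sd):
    """Deviation of a group mean from the grand mean, in grand-SD units."""
    return (group_mean - grand_mean) / grand_sd


def text_bar(z, chars_per_sd=20):
    """'I' marks the grand mean; 'X' ends the bar (assumed 20 characters per SD)."""
    n = round(abs(z) * chars_per_sd)
    if n == 0:
        return "I"
    dashes = "-" * (n - 1)
    return "X" + dashes + "I" if z < 0 else "I" + dashes + "X"


# Group 1 mean, grand mean and grand SD, copied from the output
for name, m, gm, sd in [("v262:teaching", 25.631, 42.02, 18.00),
                        ("v264:supervsn", 30.876, 16.14, 12.01),
                        ("v266:admin", 6.498, 7.78, 9.30)]:
    z = deviation_in_sd(m, gm, sd)
    print(f"{name:15s} {z:+.2f} {text_bar(z)}")
```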

Dendrogram showing the mergers of groups. The dendrogram can help in deciding how many typology groups to retain for interpretation. Another factor in this decision is whether the resulting typology can be interpreted from a theoretical point of view.