Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.
Methodology: Cluster analysis using the PAM (Partitioning Around Medoids) algorithm.
Dataset: COOP.DAT
$RUN CLUSFIND
$FILES
PRINT = PAM.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING PAM
BADDATA=MD1 -
IDVAR=V1 -
VARS=(V2-V12) -
ANALYSIS=PAM -
CMIN=4 -
PRINT=(DICT,DISS,GRAPH,TRACE)
-----------
Note: All options set at default values

After filtering 35 cases read from the input data file

*** Dissimilarity matrix ***
Number of representative objects: 4
Average distance, initial build : 36.916
Cluster Medoid Size Objects
Coordinates of medoids
 1   97.00  832.00  290.00  120.00  132.00   58.00  305.00  313.00  188.00   49.00   78.00
 3   11.00  387.00   93.00   49.00   39.00   23.00   79.00   94.00   52.00    4.00   16.00
 5    4.00  224.00   47.00   15.00   26.00    7.00   21.00   40.00   13.00    6.00   20.00
22    2.00   17.00    7.00    5.00     .00    1.00    7.00    4.00    2.00    1.00     .00

Clustering vector
1 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Clustering characteristics
Cluster 1 is an isolated singleton object 1, separation 618.50
Cluster 2 is an isolated L*-cluster with diameter 119.02 and separation 188.28
Number of isolated clusters: 2

Silhouettes
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
CLU NEIG S(I) I +----+----+----+----+----+----+----+----+----+----+
1 2 .00 1| |
| |
2 3 .53 2|************************** |
2 3 .49 3|************************ |
| |
3 4 .59 5|***************************** |
3 4 .50 4|************************* |
3 2 .49 7|************************ |
3 4 .27 6|************* |
3 4 -.04 8| |
| |
4 3 .85 22|****************************************** |
4 3 .85 30|****************************************** |
4 3 .85 26|****************************************** |
4 3 .85 33|****************************************** |
4 3 .85 27|****************************************** |
4 3 .85 29|****************************************** |
4 3 .85 34|****************************************** |
4 3 .85 24|****************************************** |
4 3 .84 35|****************************************** |
4 3 .84 23|****************************************** |
4 3 .84 20|***************************************** |
4 3 .83 28|***************************************** |
4 3 .83 32|***************************************** |
4 3 .83 19|***************************************** |
4 3 .83 18|***************************************** |
4 3 .83 31|***************************************** |
4 3 .81 21|**************************************** |
4 3 .80 15|**************************************** |
4 3 .80 25|**************************************** |
4 3 .78 17|*************************************** |
4 3 .76 16|************************************* |
4 3 .72 12|************************************ |
4 3 .62 9|****************************** |
4 3 .60 14|****************************** |
4 3 .56 13|**************************** |
4 3 .50 11|************************* |
4 3 .20 10|********** |
+----+----+----+----+----+----+----+----+----+----+
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
Cluster 1 has average silhouette width 1.00
Cluster 2 has average silhouette width .51
Cluster 3 has average silhouette width .36
Cluster 4 has average silhouette width .76
For the entire data set, the average silhouette width is .70
which indicates a strong structure was found.

IDAMS reports the analysis specification: No. of objects = 35.

The dissimilarity matrix is a matrix of unstandardized Euclidean distances among the objects.

No. of medoids (representative objects) = number of clusters = 4.

Average dissimilarity for the solution found in the first part of the algorithm (BUILD) = 36.916.

The following information is given for each cluster:
Cluster 1: only one member, the object with Idcode (1), which is naturally its medoid.
Cluster 2: medoid (3); members are the objects with Idcodes (2) and (3).
Cluster 3: medoid (5); members are the objects with Idcodes (4) to (8).
Cluster 4: medoid (22); members are the objects with Idcodes (9) to (35).
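
Clustering vector: the jth element of this vector is the number of the cluster to which object j belongs. Clusters are numbered in the order in which they are first encountered when the vector is read from left to right: Cluster 1 is encountered first, followed by Cluster 2, and so on.
The clustering vector is
1 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Thus, object (1) belongs to Cluster 1, objects (2) and (3) to Cluster 2, objects (4) to (8) to Cluster 3, and objects (9) to (35) to Cluster 4.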
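
Reading a clustering vector can be sketched in a few lines of Python. This is only an illustration, not part of IDAMS: the function name `clusters_from_vector` is hypothetical, and the objects are assumed to be numbered 1 to 35 in the order in which they appear in the vector.

```python
from collections import defaultdict

def clusters_from_vector(vector, ids=None):
    """Group object identifiers by the cluster number found in a clustering vector."""
    if ids is None:
        # Assumption: objects are simply numbered 1..n in input order.
        ids = range(1, len(vector) + 1)
    groups = defaultdict(list)
    for obj_id, cluster in zip(ids, vector):
        groups[cluster].append(obj_id)
    return dict(groups)

# Clustering vector printed by PAM for the 35 countries.
pam_vector = [1, 2, 2, 3, 3, 3, 3, 3] + [4] * 27
pam_clusters = clusters_from_vector(pam_vector)

for cluster, members in sorted(pam_clusters.items()):
    print(f"Cluster {cluster}: size {len(members)}, objects {members}")
```

The resulting sizes (1, 2, 5 and 27) are those implied by the vector itself.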

Clustering characteristics: Cluster 1 is an isolated singleton (object 1) with separation = 618.50. Cluster 2 is an isolated L*-cluster with diameter = 119.02 and separation = 188.28. The diameter and separation are reported for each isolated cluster.
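
For readers who want to check such figures by hand, the sketch below computes a diameter and a separation from a dissimilarity matrix. It assumes the usual definitions: the diameter is the largest dissimilarity between two objects of the cluster, the separation is the smallest dissimilarity between an object of the cluster and an object outside it, and a cluster is taken to be an L*-cluster when its diameter is smaller than its separation. The function names and the five toy points are hypothetical; this is not the IDAMS code.

```python
def diameter_and_separation(dist, labels, cluster):
    """Return (diameter, separation) of one cluster from a full dissimilarity matrix."""
    inside = [i for i, lab in enumerate(labels) if lab == cluster]
    outside = [i for i, lab in enumerate(labels) if lab != cluster]
    # Largest within-cluster dissimilarity (0 for a singleton).
    diameter = max((dist[i][j] for i in inside for j in inside), default=0.0)
    # Smallest dissimilarity to any object outside the cluster.
    separation = min(dist[i][h] for i in inside for h in outside)
    return diameter, separation

def is_l_star_cluster(dist, labels, cluster):
    """Assumed criterion: diameter strictly smaller than separation."""
    diameter, separation = diameter_and_separation(dist, labels, cluster)
    return diameter < separation

# Toy data: five points on a line, Euclidean (absolute) distances.
points = [0.0, 1.0, 10.0, 11.0, 30.0]
dist = [[abs(a - b) for b in points] for a in points]
labels = [1, 1, 2, 2, 3]

for c in (1, 2, 3):
    print(c, diameter_and_separation(dist, labels, c), is_l_star_cluster(dist, labels, c))
```

A singleton passes the test trivially, since its diameter is zero.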

Silhouette plot: the following information is given for each object.
CLU: the cluster to which the object has been assigned.
NEIG: the neighbor of the object, i.e. the second-best cluster for that object. For example, the second-best choice for object (3) in Cluster 2 is Cluster 3. The second-best choice for object (7) in Cluster 3 is Cluster 2. The second-best choice for object (5) in Cluster 3 is Cluster 4.
S(I): the silhouette width of the object. A value close to 1 implies that the object is well classified. A value close to 0 implies that the object is arbitrarily assigned to its cluster. A value close to -1 indicates that the object is poorly classified.
I: the identifier of the object.
The silhouette plot shows that objects (1) and (8) have silhouette widths close to zero. Object (8) is therefore arbitrarily assigned to its cluster; object (1) is the only member of Cluster 1, and the silhouette width of a singleton is set to zero by convention. Leaving aside the isolated singleton, Cluster 4 is tighter than the other clusters. Silhouette coefficient (i.e. average silhouette width for the entire data set) = 0.70, which indicates a reasonable structure.
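
The silhouette width and the neighbor of an object can be sketched as follows. The sketch assumes the standard definition s(i) = (b - a) / max(a, b), where a is the average dissimilarity of object i to the other members of its own cluster and b is the smallest average dissimilarity of i to any other cluster; a singleton receives s(i) = 0. The function name and the toy data are hypothetical, and the degenerate case a = b = 0 is not handled.

```python
def silhouette_width(dist, labels, i):
    """Return (s_i, neighbor) for object i, given a full dissimilarity matrix."""
    own = labels[i]
    averages = {}
    for cluster in set(labels):
        members = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
        if members:
            averages[cluster] = sum(dist[i][j] for j in members) / len(members)
    # The neighbor is the other cluster with the smallest average dissimilarity.
    others = {c: avg for c, avg in averages.items() if c != own}
    neighbor = min(others, key=others.get)
    b = others[neighbor]
    if own not in averages:
        # Object i is alone in its cluster: silhouette width set to 0.
        return 0.0, neighbor
    a = averages[own]
    return (b - a) / max(a, b), neighbor

# Toy data: five points on a line, Euclidean (absolute) distances.
points = [0.0, 1.0, 10.0, 11.0, 30.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = [1, 1, 2, 2, 3]

for i in range(len(points)):
    s, neig = silhouette_width(dist, labels, i)
    print(f"object {i + 1}: cluster {labels[i]}, neighbor {neig}, s(i) = {s:.2f}")
```

Averaging the widths over a cluster, or over all objects, gives the average silhouette widths printed below the plot.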

Research Question: Classification of 35 major countries according to the pattern of their collaboration with India in different fields of science.
Methodology: Cluster analysis using the FANNY (Fuzzy Analysis) algorithm.
Dataset: COOP.DAT
$RUN CLUSFIND
$FILES
PRINT = FANNY.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
CLUSTER ANALYSIS USING FANNY
BADDATA=MD1 -
IDVAR=V1 -
VARS=(V2-V12) -
ANALYSIS=FANNY -
CMIN=4 -
PRINT=(DICT,DISS,GRAPH,TRACE)

After filtering 35 cases read from the input data file

*** Dissimilarity matrix ***

Iteration objective function
1 595.4648
2 581.4908
3 577.1000
4 572.3036
5 565.7405
6 558.7579
7 553.5726
8 549.4211
9 545.3386
10 541.7881
11 539.6334
12 538.6904
13 538.3286
14 538.1903
15 538.1367
16 538.1155
17 538.1072
18 538.1038
19 538.1031
20 538.1028

*** Fuzzy Clustering ***
        1      2      3      4
1 .3930 .2392 .1905 .1772
2 .6938 .1501 .0847 .0714
3 .6793 .1664 .0841 .0702
4 .1127 .6041 .1677 .1155
5 .0777 .7629 .0936 .0658
6 .1212 .4176 .2766 .1846
7 .1209 .6621 .1257 .0914
8 .0800 .3123 .4101 .1976
9 .0453 .1162 .5478 .2908
10 .0618 .2076 .5293 .2012
11 .0376 .1108 .6678 .1838
12 .0373 .0895 .5178 .3554
13 .0315 .0899 .7055 .1731
14 .0281 .0784 .7260 .1674
15 .0244 .0592 .5304 .3861
16 .0218 .0555 .6849 .2378
17 .0232 .0572 .6109 .3088
18 .0189 .0417 .1973 .7422
19 .0182 .0400 .1837 .7581
20 .0170 .0374 .1730 .7726
21 .0245 .0588 .4900 .4266
22 .0114 .0255 .1343 .8287
23 .0159 .0353 .1690 .7798
24 .0133 .0294 .1384 .8189
25 .0291 .0614 .2286 .6809
26 .0140 .0316 .1792 .7753
27 .0098 .0214 .0992 .8696
28 .0159 .0340 .1406 .8096
29 .0096 .0209 .0947 .8748
30 .0090 .0197 .0942 .8771
31 .0163 .0348 .1423 .8066
32 .0161 .0344 .1424 .8070
33 .0128 .0285 .1496 .8091
34 .0149 .0331 .1677 .7843
35 .0184 .0418 .2410 .6987

Partition coefficient of Dunn = .55
Its normalized version = .40

Closest hard clustering Cluster Size Objects
1 3 1 2 3
2 4 4 5 6 7
3 11 8 9 10 11 12 13 14 15 16 17 21
4 17 18 19 20 22 23 24 25 26 27 28 29 30
31 32 33 34 35

Clustering vector
1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Silhouettes
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
CLU NEIG S(I) I +----+----+----+----+----+----+----+----+----+----+
1 2 .25 1|************ |
1 2 -.36 2| |
1 2 -.41 3| |
| |
2 3 .55 5|*************************** |
2 3 .49 7|************************ |
2 3 .42 4|********************* |
2 3 .07 6|*** |
| |
3 4 .45 11|********************** |
3 4 .44 13|********************** |
3 4 .44 10|********************* |
3 4 .44 14|********************* |
3 2 .31 8|*************** |
3 4 .16 9|******* |
3 4 .15 16|******* |
3 4 -.06 17| |
3 4 -.08 12| |
3 4 -.23 15| |
3 4 -.32 21| |
| |
4 3 .82 29|***************************************** |
4 3 .81 30|**************************************** |
4 3 .81 27|**************************************** |
4 3 .78 28|*************************************** |
4 3 .78 31|*************************************** |
4 3 .78 32|*************************************** |
4 3 .77 24|************************************** |
4 3 .77 22|************************************** |
4 3 .75 33|************************************* |
4 3 .74 23|************************************* |
4 3 .74 20|************************************* |
4 3 .74 34|************************************* |
4 3 .73 19|************************************ |
4 3 .72 26|*********************************** |
4 3 .71 18|*********************************** |
4 3 .68 25|********************************* |
4 3 .65 35|******************************** |
+----+----+----+----+----+----+----+----+----+----+
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
Cluster 1 has average silhouette width -.18
Cluster 2 has average silhouette width .38
Cluster 3 has average silhouette width .15
Cluster 4 has average silhouette width .75
For the entire data set, the average silhouette width is .44
which indicates a strong structure was found.

IDAMS reports the data specifications:
No. of objects: 35
No. of variables: 11
Number of clusters: 4
Note that FANNY does not use any representative objects (medoids). Instead, the algorithm attempts to minimize the objective function defined earlier, which is a kind of total dispersion. The algorithm needed 20 iterations to converge.

Dissimilarity matrix: matrix of Euclidean distances between the objects.

Objective function: its value at each iteration.

Membership coefficients of the different objects.

Dunn's partition coefficient = 0.55; its normalized version = 0.40. Note: a normalized Dunn's coefficient of 0 ⇒ completely fuzzy clustering, while a value of 1 ⇒ hard (crisp) clustering.
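
The relation between the two coefficients can be sketched as below. The sketch assumes the usual definitions: Dunn's partition coefficient is the sum of all squared membership coefficients divided by the number of objects, and the normalized version rescales it to the interval from 0 to 1 for k clusters. The function names are hypothetical and the small membership matrices are made up for illustration.

```python
def dunn_partition_coefficient(memberships):
    """Sum of squared membership coefficients, divided by the number of objects."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n

def normalized_dunn(f, k):
    """Rescale Dunn's coefficient from [1/k, 1] to [0, 1] for k clusters."""
    return (k * f - 1.0) / (k - 1.0)

# Completely fuzzy: every object belongs equally to each of 4 clusters.
fuzzy = [[0.25, 0.25, 0.25, 0.25] for _ in range(3)]
# Hard (crisp): every object belongs to exactly one cluster.
crisp = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

print(normalized_dunn(dunn_partition_coefficient(fuzzy), 4))
print(normalized_dunn(dunn_partition_coefficient(crisp), 4))
# Values reported above: coefficient .55 with k = 4 clusters.
print(round(normalized_dunn(0.55, 4), 2))
```

With the reported coefficient of 0.55 and four clusters, the rescaling gives (4 × 0.55 - 1) / 3 = 0.40, which agrees with the printed normalized version.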

Closest hard clustering: each object is assigned to the cluster in which its membership coefficient is largest.

Clustering vector: the first three objects belong to Cluster 1.
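
How the closest hard clustering follows from the membership coefficients can be sketched as follows: each object goes to the cluster in which its membership coefficient is largest. The function name is hypothetical; the four rows are copied from the fuzzy clustering table in the listing.

```python
def closest_hard_cluster(row):
    """Return the 1-based index of the largest membership coefficient."""
    return max(range(len(row)), key=row.__getitem__) + 1

# Membership coefficients of objects 1, 8, 18 and 21, taken from the listing.
memberships = {
    1: [0.3930, 0.2392, 0.1905, 0.1772],
    8: [0.0800, 0.3123, 0.4101, 0.1976],
    18: [0.0189, 0.0417, 0.1973, 0.7422],
    21: [0.0245, 0.0588, 0.4900, 0.4266],
}

hard = {obj: closest_hard_cluster(row) for obj, row in memberships.items()}
print(hard)
```

Object 21 illustrates a borderline case: its two largest coefficients (.4900 and .4266) are close, so the hard assignment to Cluster 3 hides a good deal of fuzziness.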

Silhouette plot of the closest hard clustering.
Objects (2) and (3) of Cluster 1 and objects (15) and (21) of Cluster 3 have negative silhouette widths. These objects are poorly classified.
Objects (17) and (12) of Cluster 3 have silhouette widths close to zero (-0.06 and -0.08). These objects are arbitrarily assigned to their cluster; they could equally have been assigned to their neighbor (viz. Cluster 4).
The silhouette widths of the objects in Cluster 4 are larger than those of the objects in the other clusters, which implies that they are better classified than their counterparts in the other clusters.
The average silhouette width of Cluster 1 is negative (-0.18), which implies that this cluster is poorly constituted.
The average silhouette width of Cluster 3 is small (+0.15), which implies arbitrariness of the cluster.
The average silhouette width of Cluster 4 is quite large (+0.75), which implies that this cluster is well constructed.
Silhouette coefficient of the entire structure = 0.44, which implies that the clustering structure is rather weak.

Research Question: Classification of a sample of research units in India according to the pattern of time spent on R&D inside the unit, administration, teaching and consultancy.
Methodology: Cluster analysis using the CLARA algorithm.
Dataset: ICSOPRU2.DAT
$RUN CLUSFIND
$FILES
PRINT = CLARA.LST
DICTIN = ICSOPRU2.DIC
DATAIN = ICSOPRU2.DAT
$SETUP
INCLUDE V1=360
Cluster Analysis using CLARA
BADDATA=MD1 -
IDVAR=V2 -
VARS=(V22,V24,V25,V26) -
ANALYSIS=CLARA -
CMIN=3 -
PRINT=(DICT,GRAPH,TRACE)
-------------------------
Note: All options set at default values

Number of clusters: 3
Number of variables: 4
Number of objects: 100
Number of representative objects: 3

Drawing 5 samples of 46 objects.
Sample number 1
Objects selected:
108 117 126 136 204 208 209 210 250 303 309 310 413 419
425 430 501 504 510 604 608 704 711 715 750 805 809 810
813 904 908 909 910 916 1203 1206 1207 1214 1219 1224 1227 1301
1302 1305 1311 1401
Average distance, initial build = 7.519
Average distance for this sample = 7.515
Results for the entire data set
Total distance = 830.364
Average distance = 8.304
Cluster Size Medoid Coordinates of Medoids
1 44 17 45.00 5.29 4.00 6.71
2 35 70 64.00 6.67 1.67 2.67
3 21 21 78.00 6.00 .00 2.00
Average distance to each medoid
1 2 3
9.865 6.268 8.424
Maximum distance to each medoid
1 2 3
28.034 11.356 19.315
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.429 .804 1.367
Sample number 2
Objects selected:
101 108 117 126 136 140 204 209 210 250 303 306 309 425
430 501 502 503 603 604 608 609 701 704 708 712 750 801
803 805 808 809 810 813 904 910 1203 1207 1208 1219 1224 1225
1227 1301 1307 1311
Average distance, initial build = 6.939
Average distance for this sample = 6.939
Results for the entire data set
Total distance = 826.122
Average distance = 8.261
Cluster Size Medoid Coordinates of Medoids
1 45 33 46.67 8.33 .00 6.67
2 34 70 64.00 6.67 1.67 2.67
3 21 85 77.50 .00 .83 4.17
Average distance to each medoid
1 2 3
9.327 6.138 9.413
Maximum distance to each medoid
1 2 3
29.104 9.775 19.226
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.622 .645 1.269
Sample number 3
Objects selected:
108 117 120 136 201 205 210 250 303 304 307 309 310 320
501 502 503 504 505 506 601 604 606 610 702 703 706 711
715 750 803 809 813 904 907 908 909 916 1206 1208 1219 1224
1227 1301 1305 1314
Average distance, initial build = 8.357
Average distance for this sample = 7.916
Results for the entire data set
Total distance = 820.751
Average distance = 8.208
Cluster Size Medoid Coordinates of Medoids
1 44 33 46.67 8.33 .00 6.67
2 37 83 62.86 6.43 3.57 4.29
3 19 14 83.14 2.14 .86 2.00
Average distance to each medoid
1 2 3
9.306 6.427 9.132
Maximum distance to each medoid
1 2 3
29.104 10.805 16.742
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.727 .641 .796
Sample number 4
Objects selected:
101 109 204 210 250 303 306 307 309 320 351 413 430 502
503 504 505 508 510 603 604 605 607 701 703 706 801 803
805 907 908 909 911 913 916 1203 1206 1207 1219 1224 1225 1308
1311 1312 1314 1401
Average distance, initial build = 7.831
Average distance for this sample = 7.714
Results for the entire data set
Total distance = 867.108
Average distance = 8.671
Cluster Size Medoid Coordinates of Medoids
1 52 20 49.17 6.67 1.17 8.33
2 41 84 67.50 3.33 .83 2.50
3 7 98 92.00 2.00 .00 .00
Average distance to each medoid
1 2 3
9.882 7.621 5.820
Maximum distance to each medoid
1 2 3
30.031 15.065 11.051
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.538 .772 .448
Sample number 5
Objects selected:
108 109 120 140 201 208 210 250 301 304 320 351 401 425
501 502 504 506 508 601 602 603 604 607 701 702 704 711
712 750 808 810 904 909 916 1206 1214 1219 1225 1301 1302 1305
1311 1312 1314 1318
Average distance, initial build = 8.613
Average distance for this sample = 8.613
Results for the entire data set
Total distance = 834.124
Average distance = 8.341
Cluster Size Medoid Coordinates of Medoids
1 25 23 55.33 5.17 2.33 6.00
2 33 5 45.00 9.00 .00 5.00
3 42 44 70.00 6.14 1.14 5.00
Average distance to each medoid
1 2 3
6.785 8.858 8.862
Maximum distance to each medoid
1 2 3
12.492 29.051 27.442
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.105 2.570 1.856
Final results
Sample number 3 was selected, with objects:
108 117 120 136 201 205 210 250 303 304 307 309 310 320
501 502 503 504 505 506 601 604 606 610 702 703 706 711
715 750 803 809 813 904 907 908 909 916 1206 1208 1219 1224
1227 1301 1305 1314
Average distance for the entire dataset: 8.208
Clustering vector
1 1 2 1 1 2 3 1 1 2 1 2 1 3 1 1 1 1 2 1 3 1 2 3 1 2 2 1 1 1 1 1 1
2 1 3 1 1 1 1 1 2 2 2 2 2 2 1 2 1 2 2 1 2 1 1 1 2 2 1 1 2 3 1 3 3
1 2 1 2 1 2 2 2 1 1 3 2 1 2 2 1 2 2 3 3 3 2 3 3 2 3 2 3 2 1 3 3 3
2
Cluster Size Medoid Objects
1 44 502
101 108 117 120 140 201 205 209 250 301 303
304 307 310 351 419 425 427 430 501 502 504
506 507 508 510 601 608 610 703 706 708 710
715 750 803 808 810 814 908 909 913 1204 1311
2 37 1206
109 126 204 208 306 320 401 413 503 602 603
604 605 606 607 609 701 702 704 711 712 801
809 813 902 904 907 911 916 1203 1206 1207 1224
1301 1305 1308 1401
3 19 210
136 210 309 322 505 802 805 806 910 1208 1214
1219 1225 1227 1302 1307 1312 1314 1318
Average distance to each medoid
1 2 3
9.306 6.427 9.132
Maximum distance to each medoid
1 2 3
29.104 10.805 16.742
Maximum distance to a medoid divided by minimum distance to another medoid
1 2 3
1.727 .641 .796
Silhouettes for the selected sample
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
CLU NEIG S(I) I +----+----+----+----+----+----+----+----+----+----+
1 2 .54 304|************************** |
1 2 .52 120|************************** |
1 2 .51 310|************************* |
1 2 .50 750|************************* |
1 2 .49 501|************************ |
1 2 .49 502|************************ |
1 2 .48 303|************************ |
1 2 .45 205|********************** |
1 2 .45 908|********************** |
1 2 .44 108|********************** |
1 2 .44 706|********************* |
1 2 .43 601|********************* |
1 2 .40 117|******************** |
1 2 .40 201|******************** |
1 2 .39 250|******************* |
1 2 .38 307|****************** |
1 2 .35 504|***************** |
1 2 .33 506|**************** |
1 2 .28 909|************** |
1 2 .13 715|****** |
1 2 .02 703|* |
1 2 -.04 610| |
1 2 -.09 803| |
| |
2 3 .71 813|*********************************** |
2 1 .70 809|*********************************** |
2 1 .70 1206|*********************************** |
2 3 .68 904|********************************** |
2 3 .64 916|******************************* |
2 1 .57 907|**************************** |
2 1 .56 606|**************************** |
2 3 .55 711|*************************** |
2 3 .52 503|************************** |
2 3 .51 1224|************************* |
2 3 .47 604|*********************** |
2 1 .43 702|********************* |
2 1 .26 1301|************ |
2 1 .22 320|*********** |
2 3 .22 1305|********** |
| |
3 2 .61 1314|****************************** |
3 2 .60 1227|****************************** |
3 2 .55 505|*************************** |
3 2 .55 210|*************************** |
3 2 .47 136|*********************** |
3 2 .26 1208|************* |
3 2 .26 309|************* |
3 2 -.15 1219| |
+----+----+----+----+----+----+----+----+----+----+
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
Cluster 1 has average silhouette width .36
Cluster 2 has average silhouette width .52
Cluster 3 has average silhouette width .39
For the selected sample, the average silhouette width is .42
which indicates a strong structure was found.

IDAMS reports the analysis specifications:
No. of objects: 100
No. of variables: 4
No. of clusters: 3
No. of medoids: 3 (i.e. equal to the number of clusters)

The algorithm draws 5 random samples of 40 + 2k objects each, where k = the number of clusters (here k = 3, giving 46 objects per sample).
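
The sampling step can be sketched as below, assuming the customary CLARA sample size of 40 + 2k objects for k clusters. The function names, the seed and the object numbering are hypothetical. The sketch also draws every sample independently, a simplification: CLARA implementations typically carry the best medoids found so far into later samples.

```python
import random

def clara_sample_size(n_objects, k):
    """Assumed default sample size: 40 + 2k, capped at the number of objects."""
    return min(n_objects, 40 + 2 * k)

def draw_clara_samples(ids, k, n_samples=5, seed=0):
    """Draw n_samples random samples (without replacement) of object identifiers."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    size = clara_sample_size(len(ids), k)
    return [sorted(rng.sample(ids, size)) for _ in range(n_samples)]

object_ids = list(range(1, 101))  # 100 hypothetical object identifiers
samples = draw_clara_samples(object_ids, k=3)

print(clara_sample_size(100, 3))
print(len(samples), [len(s) for s in samples])
```

With 100 objects and 3 clusters this gives 5 samples of 46 objects, as in the listing.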
|
|
(a) Sample 1: Results for sample 1: list of objects in the sample.
The average distance from BUILD (initial average distance) = 7.519.
The average distance from SWAP (i.e. final average distance) = 7.515.
These values are the average distances between each object of the sample and its most similar representative object.
Results for the entire data set: Total distance = 830.364; Average distance = 8.304.
Following information for each cluster:
Similar information for each of the remaining four samples. Average distance for the entire data set by BUILD.
Sample 3 has the lowest average distance. None of the clusters is compact. This observation is also supported by the silhouette plot and by the value of the silhouette coefficient (i.e. the average silhouette width, which is only 0.42). |
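The sample size and the average distance quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the usual CLARA rule of 40 + 2k objects per sample, with k the number of clusters, which reproduces the 46 objects reported. The function names are illustrative and are not part of IDAMS.

```python
def clara_sample_size(n_clusters: int) -> int:
    """Objects drawn per sample, assuming the 40 + 2k rule (k = number of clusters)."""
    return 40 + 2 * n_clusters


def average_distance(total_distance: float, n_objects: int) -> float:
    """Average distance of an object to its closest representative object."""
    return total_distance / n_objects


# Values taken from the listing: 3 clusters, 100 objects, total distance 830.364.
sample_size = clara_sample_size(3)
avg = average_distance(830.364, 100)
print(sample_size, round(avg, 3))
```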
||
|
Final Results The list of objects in the selected sample. Average distance for the entire data set = 8.208. The clustering vector is interpreted as follows: the first two objects belong to Cluster 1. The third object belongs to Cluster 2. The fourth and fifth objects belong to Cluster 3. The sixth object belongs to Cluster 2, and so on. |
||
|
Clustering characteristics of the final partition of the data set. For each cluster, the following information is printed:
|
||
|
Silhouette plot
Cluster 1: The silhouette values of objects 703, 610 and 803 are close to zero (.02, -.04 and -.09), implying that these objects are arbitrarily assigned to Cluster 1.
Cluster 2: None of the objects has a value close to zero.
Cluster 3: Object 1219 has a negative silhouette value (-.15), which is close to zero. This implies that this object is arbitrarily assigned to its cluster.
The average silhouette width of Cluster 2 (.52) is larger than that of the other two clusters (.36 and .39). The overall average silhouette width is 0.42, which implies that the clustering structure is weak and could possibly be artificial. |
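For readers who want to see how a silhouette value arises, the sketch below computes s(i) = (b - a) / max(a, b), where a is the average dissimilarity of object i to the other members of its own cluster and b is the smallest average dissimilarity to any other cluster. The toy points are hypothetical. The last lines average the Cluster 3 values read from the plot to recover the reported .39. The function name is illustrative and this is not the CLUSFIND code.

```python
from statistics import mean


def silhouette_widths(diss, labels):
    """Return s(i) = (b - a) / max(a, b) for every object.

    diss   : square symmetric matrix (list of lists) of dissimilarities
    labels : cluster label of each object (at least two clusters expected)
    """
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention used here for a singleton cluster
            continue
        a = mean(diss[i][j] for j in own)
        b = min(
            mean(diss[i][j] for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        # Assumes a and b are not both zero (no coinciding objects).
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical toy data: four points on a line forming two obvious clusters.
points = [0.0, 1.0, 10.0, 11.0]
toy_diss = [[abs(p - q) for q in points] for p in points]
toy_widths = silhouette_widths(toy_diss, [1, 1, 2, 2])

# Silhouette values of Cluster 3 as read from the plot in the listing.
cluster3 = [.61, .60, .55, .55, .47, .26, .26, -.15]
cluster3_average = mean(cluster3)
```

Values near 1 mean an object sits well inside its cluster, values near 0 mean it lies between two clusters, and negative values suggest it is closer to a neighbouring cluster.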
|
Research Question |
: |
Classification of eleven countries according to their publication pattern in different sub fields of chemistry |
|
Methodology |
: |
Cluster Analysis using the algorithm Agnes (Agglomerative Nesting) |
|
Dataset |
: |
CHEM.DAT |
$RUN CLUSFIND
$FILES
PRINT = AGNES.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSFIND PROGRAM AGNES
BADDATA=MD1 -
STANDARDIZE -
IDVAR=V1 -
VARS=(V2-V10) -
ANALYSIS=AGNES -
PRINT=(DICT,DISSIM,GRAPH,TRACE,VNAM)
|
After filtering 11 cases read from the input data file Number of variables: 9 Number of objects: 11 |
|
|
*** Dissimilarity matrix ***
|
|
Final ordering of objects and dissimilarities between them
Dissimilarity banner
0 .08 .16 .24 .32 .40 .48 .56 .64 .72 .80 .88 .96 1
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1-
********************************************************************
8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8-
********************************************************
10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
*******************************************
6- 6- 6- 6- 6- 6- 6- 6- 6- 6- 6
*****************************************
4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4
***********************************************************
9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9
******************************************************
5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5-
**********************************
2- 2- 2- 2- 2- 2- 2- 2-
****************************
7- 7- 7- 7- 7- 7- 7-
************
11- 11- 11-
***
3
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+--
0 .08 .16 .24 .32 .40 .48 .56 .64 .72 .80 .88 .96 1
The actual highest level is 7.772362
The agglomerative coefficient of this data set is .54
|
|
IDAMS reports analysis specifications. No. of Variables = 9 No. of Objects = 11 |
||
|
|
Dissimilarity matrix This is the matrix of normalized Euclidean distances among the objects. It can easily be seen that the dissimilarity between objects 1 and 8 is the lowest, Dissimilarity (1, 8) = 1.08, and that the dissimilarity between objects 11 and 3 is the highest, Dissimilarity (11, 3) = 10.06. |
|
|
Final ordering of objects and dissimilarity The first row shows the object identifiers, while the second row shows the dissimilarities between them.
Row I: 1 8 10 6 4 9 5 2 7 11 3
Row II: 1.082 2.373 3.650 3.905 1.983 2.542 4.604 5.247 6.854 7.772
The numbers in the first row indicate the order in which the objects are listed. Each value in the second row lies between two adjacent objects of the first row and gives the level at which the clusters containing them are joined. The smallest value is 1.082, which lies between (1) and (8). This means that (1) and (8) are joined first, at level 1.082. The second smallest value is 1.983, between (4) and (9), which means that (4) and (9) are joined next. The third smallest value is 2.373, between (8) and (10). Recall that (8) has already joined (1), so this is the merger of (10) with (1, 8). The fourth smallest value is 2.542, between (9) and (5). Recall that (9) has already merged with (4), so this is the merger of (5) with (4, 9). The fifth smallest value is 3.650, between (10) and (6). Recall that (10) has already merged with (1, 8), so this is the merger of (6) with (1, 8, 10). The sixth smallest value is 3.905, between (6) and (4). This is the merger of (1, 8, 10, 6) with (4, 9, 5), and so on. The highest value is 7.772, between (11) and (3). This means that object (3) joins the cluster of all the other objects in the last step. The entire hierarchy of objects can be described by two sequences of length 11 and 10, i.e. n and (n-1), where n is the number of objects. |
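The way the two sequences encode the whole hierarchy can be made concrete with a short sketch. It replays the mergers by visiting the levels of Row II in increasing order and joining the two neighbouring groups on either side of each level. The function name is illustrative and this is not the AGNES implementation, only a reading aid for its output.

```python
def merge_sequence(order, levels):
    """Replay agglomerative mergers from a banner ordering.

    order  : objects in banner order (Row I)
    levels : levels[i] is the level joining the clusters that contain
             order[i] and order[i + 1] (Row II)
    Returns a list of (level, left_cluster, right_cluster) tuples.
    """
    clusters = [[obj] for obj in order]  # contiguous groups, in banner order
    steps = []
    for level, i in sorted((lv, i) for i, lv in enumerate(levels)):
        # The cluster ending with order[i] and the one starting with order[i + 1]
        # are neighbours in the list, because groups stay contiguous.
        left_index = next(k for k, c in enumerate(clusters) if order[i] in c)
        left, right = clusters[left_index], clusters[left_index + 1]
        steps.append((level, tuple(left), tuple(right)))
        clusters[left_index:left_index + 2] = [left + right]
    return steps


row_1 = [1, 8, 10, 6, 4, 9, 5, 2, 7, 11, 3]
row_2 = [1.082, 2.373, 3.650, 3.905, 1.983, 2.542, 4.604, 5.247, 6.854, 7.772]
steps = merge_sequence(row_1, row_2)
for level, left, right in steps:
    print(level, left, "+", right)
```

With the values from the listing, the fifth merger joins (6) to (1, 8, 10) and the sixth joins (1, 8, 10, 6) to (4, 9, 5).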
||
|
Dissimilarity Banner The banner shows the successive mergers from left to right. The objects are listed from top to bottom. The banner consists of stars and stripes. The stars indicate the linking of objects, and the stripes are repetitions of the labels of the objects. There are fixed scales above and below the banner, going from 0 to 1 in steps of .08. Here, 0 indicates a dissimilarity of 0 and 1 stands for the largest dissimilarity encountered, i.e. the dissimilarity at the last step of the merger, which is 7.772. The approximate level of a merger can easily be estimated from the banner. For example, object (10) merges with (1, 8) at about 0.31 on the scale, which corresponds to 0.31 × 7.772 ≈ 2.4. The white space in the left part of the banner indicates the original stage, where all objects are separate entities. Then, at a level of about .16, we observe the beginning of the strip comprising 1-1-1-... (object 1) and later on 8-8-8-... (object 8), and so on. The overall width of the banner is very important, because it gives an idea of the amount of structure that has been found by the clustering algorithm. The agglomerative coefficient computed by the program is 0.54. This coefficient is equal to the sum of the lengths of the lines, measured on the 0 to 1 scale, divided by the number of objects (note that each object is represented by one line). The agglomerative coefficient is thus simply the average width of the banner (or the fraction of blackness of the banner). AC = 0.54 indicates that a reasonable configuration has been found. |
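The agglomerative coefficient can be recomputed from the final ordering as a check. The sketch assumes that the level at which an object is first merged is the smaller of the two levels adjacent to it in the ordering, and that the width of its banner line is 1 minus that level divided by the highest level. The function name is illustrative and not part of IDAMS.

```python
def banner_coefficient(levels):
    """Average banner width: mean over objects of 1 - first_level / highest_level.

    levels[i] is the level between the i-th and (i + 1)-th object of the ordering.
    """
    highest = max(levels)
    n_objects = len(levels) + 1
    widths = []
    for i in range(n_objects):
        adjacent = []
        if i > 0:
            adjacent.append(levels[i - 1])  # level shared with the object above
        if i < n_objects - 1:
            adjacent.append(levels[i])      # level shared with the object below
        first_level = min(adjacent)         # where this object's line starts
        widths.append(1 - first_level / highest)
    return sum(widths) / n_objects


agnes_levels = [1.082, 2.373, 3.650, 3.905, 1.983, 2.542, 4.604, 5.247, 6.854, 7.772]
ac = banner_coefficient(agnes_levels)
print(round(ac, 2))
```

Applied to the ten levels of the listing, this gives a value that rounds to the .54 reported by the program.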
|
Research Question |
: |
Classification of eleven countries according to their publication pattern in different sub fields of chemistry |
|
Methodology |
: |
Cluster Analysis using the algorithm Diana (Divisive Analysis) |
|
Dataset |
: |
CHEM.DAT |
$RUN CLUSFIND
$FILES
PRINT = DIANA.LST
DICTIN = CHEM.DIC
DATAIN = CHEM.DAT
$SETUP
CLUSTERING WITH PROGRAM DIANA
BADDATA=MD1 -
STANDARDIZE -
IDVAR=V1 -
VARS=(V2-V10) -
ANALYSIS=DIANA -
PRINT=(DICT,DISSIM,GRAPH,TRACE,VNAM)
---------
Note: All options set at default values.
|
After filtering 11 cases read from the input data file Number of variables: 9 Number of objects: 11 |
|
|
*** Dissimilarity matrix ***
At the first step the 11 objects are divided into groups of 10 and 1 |
|
Final ordering of objects and diameters of the clusters
Objects 1 8 10 6 2 4
Diameters 1.082 2.560 3.904 4.546 6.352
Objects 9 5 7 11 3
Diameters 2.963 4.679 8.014 10.060 |
|
Dissimilarity banner
1 .92 .84 .76 .68 .60 .52 .44 .36 .28 .20 .12 .04 0
--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1-
*********************************************************************
8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8-
**********************************************************
10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 1
************************************************
6- 6- 6- 6- 6- 6- 6- 6- 6- 6- 6- 6-
********************************************
2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2-
******************************
4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4
***************************************************************
9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9
*******************************************************
5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5
*******************************************
7- 7- 7- 7- 7- 7- 7- 7- 7- 7- 7
******************
11- 11- 11- 11- 1
***
3
--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+--+
1 .92 .84 .76 .68 .60 .52 .44 .36 .28 .20 .12 .04 0
The actual diameter of this data set is 10.060
The divisive coefficient of this data set is .61 |
|
IDAMS reports analysis specifications.
|
||
|
|
Dissimilarity matrix
|
|
|
Final ordering of objects and diameters of the clusters Objects 1 8 10 6 2 4
Diameters 1.082 2.560 3.904 4.546 6.352
Objects 9 5 7 11 3
Diameters 2.963 4.679 8.014 10.060
The largest diameter is 10.060, which stands between (11) and (3). This means that the whole data set is split at the level 10.060, yielding a cluster with objects (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) on the left and a singleton cluster with the object (3) on the right. Thus in the first step, we get two clusters: (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) and (3).
The second largest diameter is 8.014, which stands between (7) and (11). This means that Cluster (1, 8, 10, 6, 2, 4, 9, 5, 7, 11) is split into two clusters: (1, 8, 10, 6, 2, 4, 9, 5, 7) and (11).
The third largest diameter is 6.352, which stands between (2) and (4), indicating that Cluster (1, 8, 10, 6, 2, 4, 9, 5, 7) is divided into two clusters: (1, 8, 10, 6, 2) and (4, 9, 5, 7).
The fourth largest diameter is 4.679, which stands between (5) and (7). This means that Cluster (4, 9, 5, 7) is divided into two clusters: (4, 9, 5) and (7).
The fifth largest diameter is 4.546, which stands between (6) and (2). This means that Cluster (1, 8, 10, 6, 2) is divided into two clusters: (1, 8, 10, 6) and (2).
The sixth largest diameter is 3.904, which stands between (10) and (6). This means that Cluster (1, 8, 10, 6) is divided into two clusters: (1, 8, 10) and (6).
This divisive process is continued until we get 11 singleton clusters. |
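The first six splits can be replayed from the final ordering by cutting at the largest diameters first. In this sketch the diameter between (4) and (9) is entered as None because it does not appear in the listing; it is assumed to be smaller than 3.904, which is what the ranking in the text implies. The function name is illustrative and this is not the DIANA implementation.

```python
def split_sequence(order, diameters, n_splits):
    """Replay divisive splits from a banner ordering, largest diameter first.

    order     : objects in banner order
    diameters : diameters[i] stands between order[i] and order[i + 1];
                None marks a value that is not available and is skipped
    Returns a list of (diameter, left_cluster, right_cluster) tuples.
    """
    clusters = [list(order)]
    known = sorted(
        ((d, i) for i, d in enumerate(diameters) if d is not None), reverse=True
    )
    steps = []
    for diameter, i in known[:n_splits]:
        first_right = order[i + 1]
        index = next(k for k, c in enumerate(clusters) if first_right in c)
        cluster = clusters[index]
        cut = cluster.index(first_right)
        left, right = cluster[:cut], cluster[cut:]
        steps.append((diameter, tuple(left), tuple(right)))
        clusters[index:index + 1] = [left, right]
    return steps


diana_order = [1, 8, 10, 6, 2, 4, 9, 5, 7, 11, 3]
# None: the diameter between (4) and (9) is missing from the listing.
diana_diameters = [1.082, 2.560, 3.904, 4.546, 6.352, None, 2.963, 4.679, 8.014, 10.060]
splits = split_sequence(diana_order, diana_diameters, 6)
for diameter, left, right in splits:
    print(diameter, left, "|", right)
```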
||
|
Dissimilarity Banner The dissimilarity banner is similar to that of Agnes (Example 7.4), but it runs in the opposite direction. The scales that surround the banner are also plotted differently, since they decrease from 1 to 0. Here, 0 indicates a zero diameter and 1 stands for the diameter of the entire data set, which is equal to 10.060. Diameter = 0 corresponds to singletons. The overall width of the banner reflects the strength of the clustering. When the diameter of the entire data set is much larger than the diameters of the individual clusters, the banner is wide. The divisive coefficient DC is the average width of the banner. DC = 0.61, which indicates a good clustering. |
|
Research Question |
: |
Classification of 33 major academic institutions in India according to priorities given to different scientific fields. |
|
Methodology |
: |
Cluster Analysis using the algorithm MONA (Monothetic Analysis) |
|
Dataset |
: |
MONA.DAT |
$RUN CLUSFIND
$FILES
PRINT = MONA.LST
DICTIN = ACADEMIC.DIC
DATAIN = ACADEMIC.DAT
$SETUP
CLUSTER ANALYSIS USING MONA
BADDATA=MD1 -
IDVAR=V1 -
VARS=(V2-V9) -
ANALYSIS=MONA -
CMAX=5 -
PRINT=(DICT,DISS,GRAPH,TRACE,VNAM)
-------------------------------
Note: All options set at default values.
|
After filtering 33 cases read from the input data file Number of variables: 8 Number of objects: 33 |
||
|
|
Step number 1
Cluster 1 3 4 6 7 8 9 10 12 13 18 23 25 26 32 2 5 11 14 15 16 17 19 20 21 22 24 27 28 29 30 31 33 is divided into 15 and 18 objects, using variable LIF
Step number 2
Cluster 1 3 4 7 8 10 12 13 18 23 26 32 6 9 25 is divided into 12 and 3 objects, using variable MED
Cluster 2 14 15 16 19 20 21 27 28 29 30 31 33 5 11 17 22 24 is divided into 13 and 5 objects, using variable ESP
Step number 3
Cluster 1 3 4 7 8 10 13 18 23 12 26 32 is divided into 9 and 3 objects, using variable MAT
Cluster 6 9 25 is divided into 2 and 1 objects, using variable MAT
Cluster 2 16 20 21 28 29 30 31 33 14 15 19 27 is divided into 9 and 4 objects, using variable MED
Cluster 5 17 22 24 11 is divided into 4 and 1 objects, using variable PHY
Step number 4
Cluster 1 3 4 7 8 10 13 18 23 is divided into 5 and 4 objects, using variable ESP
Cluster 12 32 26 is divided into 2 and 1 objects, using variable PHY
Cluster 6 9 Cannot be separated by the remaining variables.
Cluster 2 16 20 29 21 28 30 31 33 is divided into 4 and 5 objects, using variable PHY
Cluster 14 15 27 19 is divided into 3 and 1 objects, using variable PHY
Cluster 5 22 24 17 is divided into 3 and 1 objects, using variable MAT
Step number 5
Cluster 1 3 7 8 4 is divided into 4 and 1 objects, using variable CHE
Cluster 10 23 13 18 is divided into 2 and 2 objects, using variable ENG
Cluster 12 32 Cannot be separated by the remaining variables.
Cluster 2 29 16 20 is divided into 2 and 2 objects, using variable CHE
Cluster 21 28 30 31 33 is divided into 1 and 4 objects, using variable CHE
Cluster 14 27 15 is divided into 2 and 1 objects, using variable CHE
Cluster 5 22 24 is divided into 2 and 1 objects, using variable CHE
Step number 6
Cluster 1 3 7 8 Cannot be separated by the remaining variables.
Cluster 10 23 Cannot be separated by the remaining variables.
Cluster 13 18 Cannot be separated by the remaining variables.
Cluster 2 29 is divided into 1 and 1 objects, using variable ENG
Cluster 16 20 Cannot be separated by the remaining variables.
Cluster 28 30 33 31 is divided into 3 and 1 objects, using variable MAT
Cluster 14 27 is divided into 1 and 1 objects, using variable AGR
Cluster 5 22 is divided into 1 and 1 objects, using variable MED
Step number 7
Cluster 28 30 33 is divided into 1 and 2 objects, using variable AGR
Step number 8
Cluster 30 33 Cannot be separated by the remaining variables. |
|
|
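Final Ordering of Objects
Objects: 1 3 7 8 4 10 23 13 18 12 32 26 6 9 25 2 29 16 20 21 28 30 33 31 14 27 15 19 5 22 24 17 11
Separations (adjacent objects: step, variable):
8 and 4: step 5, by CHE
4 and 10: step 4, by ESP
23 and 13: step 5, by ENG
18 and 12: step 3, by MAT
32 and 26: step 4, by PHY
26 and 6: step 2, by MED
9 and 25: step 3, by MAT
25 and 2: step 1, by LIF
2 and 29: step 6, by ENG
29 and 16: step 5, by CHE
20 and 21: step 4, by PHY
21 and 28: step 5, by CHE
28 and 30: step 7, by AGR
33 and 31: step 6, by MAT
31 and 14: step 3, by MED
14 and 27: step 6, by AGR
27 and 15: step 5, by CHE
15 and 19: step 4, by PHY
19 and 5: step 2, by ESP
5 and 22: step 6, by MED
22 and 24: step 5, by CHE
24 and 17: step 4, by MAT
17 and 11: step 3, by PHY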
|
||
|
Separation Plot 0 1 2 3 4 5 6 7
1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1- 1-
3- 3- 3- 3- 3- 3- 3- 3- 3- 3- 3- 3- 3-
7- 7- 7- 7- 7- 7- 7- 7- 7- 7- 7- 7- 7-
8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8- 8-
CHE ****************************************************
4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4- 4-
ESP ******************************************
10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10- 10-
23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23- 23-
ENG ****************************************************
13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13- 13-
18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18- 18-
MAT ********************************
12- 12- 12- 12- 12- 12- 12- 12- 12- 12- 1
32- 32- 32- 32- 32- 32- 32- 32- 32- 32- 3
PHY ******************************************
26- 26- 26- 26- 26- 26- 26- 26- 26- 26- 2
MED **********************
6- 6- 6- 6- 6- 6- 6- 6-
9- 9- 9- 9- 9- 9- 9- 9-
MAT ********************************
25- 25- 25- 25- 25- 25- 25- 25-
LIF ************
2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2- 2-
ENG **************************************************************
29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 29- 2
CHE ****************************************************
16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16- 16-
20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20- 20-
PHY ******************************************
21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21- 21-
CHE ****************************************************
28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28- 28-
AGR ************************************************************************
30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30- 30-
33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33- 33-
MAT **************************************************************
31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 31- 3
MED ********************************
14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 14- 1
AGR **************************************************************
27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 27- 2
CHE ****************************************************
15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15- 15-
PHY ******************************************
19- 19- 19- 19- 19- 19- 19- 19- 19- 19- 1
ESP **********************
5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5- 5-
MED **************************************************************
22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 22- 2
CHE ****************************************************
24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24- 24-
MAT ******************************************
17- 17- 17- 17- 17- 17- 17- 17- 17- 17- 1
PHY ********************************
11- 11- 11- 11- 11- 11- 11- 11-
0 1 2 3 4 5 6 7
|
|
IDAMS reports analysis specifications
|
||
|
|
The whole sample is successively divided into clusters in eight steps. The results are represented in Figure 1 in the form of a hierarchical tree. Note that in monothetic analysis only one variable is used at a time for the hierarchical clustering of objects; hence the name monothetic clustering. The variable used for each split is indicated in the tree. |
|
|
Final Ordering of Objects The first row shows the sequence of objects and the second row shows the separation steps. For example, the first step appears between (25) and (2). All the objects from (2) onwards are separated from the other objects at the first step. This separation is carried out using the variable LIF. The second step appears between (26) and (6). The objects from (6) to (25) are separated at step 2. This separation is carried out using the variable MED, and so on. |
||
|
Separation Banner Each object of the data set corresponds to a horizontal line in the banner. The horizontal lines are ordered in the same way as the first row of the final ordering of objects. The end of a row of stars **** indicates a separation between clusters. If two or more lines representing objects are stuck together, it means that the objects cannot be separated. Objects (1, 3, 7, 8) are stuck together and hence cannot be split further. The length of the row of stars is proportional to the step number at which the separation is carried out. These objects were separated from object (4) at step 5, using the variable CHE. When the row of an object does not continue to the right-hand side of the banner, it means that the object becomes a singleton cluster at the corresponding step. For example, object (11) becomes a singleton at step 3, using the variable PHY. It is important to note that the banner of MONA cannot be used to assess the quality of the clustering, because the length of the row of stars is proportional to the number of the separation step, and not to the tightness of the clusters. |
|
Research Question |
: |
Construct a typology of academic scientists according to the pattern of their involvement in different activities, and identify their main characteristics. |
|
Methodology |
: |
Classification using the IDAMS module TYPOL. |
|
Dataset |
: |
TYPE.DAT |
$RUN TYPOL
$FILES
PRINT = typology.lst
DICTIN = anju.dic
DATAIN = anju.dat
$SETUP
EXCLUDE V1=1220,2049,2055,2074,2075,5016
Activity profile of academic scientists
BADDATA=MD1 -
AQNTVARS=(V2-V8) -
PQNTVARS=(V13) -
PQLTVARS=(V9,v10,v14,v15) -
INITIAL=RANDOM NCASES=1073 -
DTYPE=EUCLID -
INIGROUP=5 -
FINGROUP=3 -
PRINT=(CDICT,GRAP,ROWP,DIST)
--------------------------------------------
Note: All options set at default values
|
Number of initial groups: 5
Initial configuration is a random sample
No standardization of active variables
Type of distance is 'Euclidean'
Regrouping is based on minimum displacement
Print the graphic of profiles
Print row % for qual. variables categories
Print table of distances and displacements for each regrouping
Active quantitative variables
Passive quantitative variables
After filtering 1067 cases read from the input file
The distances and displacements are computed on non-standardized variables |
||
|
|
% of explained variance from one iteration to another
Iteration number Mean EV Image
1 .345 ***
2 .358 ****
3 .359 ****
|
|
|
Characteristics of distances by groups Group no. N Mean SD
1 217. 58.844 81.419
2 94. 73.033 71.358
3 204. 39.833 39.680
4 174. 68.739 81.807
5 354. 41.761 43.044
Total count Mean SD
1043. 52.257 63.657
|
||
Var seq Name Mean S. D. Weight
1 v262:teaching 42.02 18.00 1.00
2 v263:research 22.40 12.28 1.00
3 v264:supervsn 16.14 12.01 1.00
4 v265:lab-dev 5.76 7.04 1.00
5 v266:admin 7.78 9.30 1.00
6 v267:extension 2.94 6.30 1.00
7 v268:profess 2.96 4.23 1.00
8 v344:#doc students 2.00 1.96 .00
|
||
|
Description of resulting typology Group number 1 2 3 4 5
Total cases 1000 Proportion of cases 208 90 195 166 339
Explained Grand
variance mean
1 767 ******** 42.02 v262:teaching 25.63 23.28 68.32 31.83 46.88
8.12 9.55 9.63 10.90 6.80
2 464 ***** 22.40 v263:research 22.09 16.30 17.19 40.63 18.26
8.76 7.06 9.14 12.57 7.22
3 519 ***** 16.14 v264:supervsn 30.88 14.22 5.09 10.48 16.77
11.14 6.85 5.26 8.20 8.17
4 83 * 5.76 v265:lab-dev 5.50 10.96 2.99 5.25 6.38
5.76 12.12 4.11 5.97 6.88
5 564 ****** 7.78 v266:admin 6.50 29.74 3.74 5.45 6.21
4.94 11.70 5.12 5.21 5.59
6 23 2.94 v267:extension 4.50 2.12 1.87 3.51 2.54
8.28 4.00 5.13 7.32 5.16
7 96 * 2.96 v268:profess 4.91 3.38 .80 2.86 2.94
4.24 4.22 2.08 4.09 4.60
8 164 ** 2.00 v344:#doc students 3.24 2.04 .75 1.79 2.05
1.98 1.69 1.36 2.06 1.77
9 70 * 34.2 v204:rank CODE=0001 48.4 54.3 14.2 33.9 31.9
100.0 29.4 14.3 8.1 16.4 31.6
10 3 34.5 v204:rank CODE=0002 36.4 27.7 33.3 32.8 36.7
100.0 21.9 7.2 18.8 15.8 36.1
11 90 * 29.5 v204:rank CODE=0003 12.0 14.9 52.5 32.2 29.7
100.0 8.4 4.5 34.6 18.1 34.1
12 75 * 10.3 v217:head? CODE=0001 8.8 36.2 5.9 5.2 9.3
100.0 17.8 31.7 11.2 8.4 30.8
13 76 * 88.4 v217:head? CODE=0002 90.8 60.6 92.6 93.1 89.5
100.0 21.4 6.2 20.4 17.5 34.3
14 63 * 35.1 sv:inst type CODE=0001 21.7 22.3 53.4 25.9 40.7
100.0 12.8 5.7 29.7 12.2 39.3
15 59 * 28.3 sv:inst type CODE=0002 32.7 47.9 8.3 31.6 30.2
100.0 24.1 15.2 5.7 18.6 36.2
16 110 * 5.7 sv:inst type CODE=0003 19.4 1.1 .0 8.6 .3
100.0 71.2 1.7 .0 25.3 1.7
17 9 31.0 sv:inst type CODE=0004 26.3 28.7 38.2 33.9 28.8
100.0 17.6 8.3 24.1 18.2 31.5
18 1 59.3 :field CODE=0001 61.8 55.3 57.4 59.2 59.9
100.0 21.7 8.4 18.9 16.6 34.3
19 1 39.3 :field CODE=0002 37.8 42.6 40.2 39.7 38.7
100.0 20.0 9.7 19.9 16.7 33.4
|
||
|
Variables explaining 80% of the variance Var seq Names Expl. Var
1 v262:teaching 767
5 v266:admin 564
3 v264:supervsn 519
2 v263:research 464
Expl. Var = amount of variance explained by one variable |
||
|
Displacements Square roots of (computed on weighted variables) Groups
Numbers 1 2 3 4
2 37.7
3.362
3 62.0 49.9
4.368 4.496
4 44.6 40.0 54.6
3.274 3.694 4.072
5 50.3 42.7 48.3 48.4
3.132 3.574 3.068 3.238
Regrouping number 1
Group 2 is incorporated into group 1
Displacement = 1421.829
Distance = 11.305
|
||
Group number 1 3 4 5
Total cases 1000 Proportion of cases 298 195 166 339
Explained Grand
variance mean
1 765 ******** 42.02 v262:teaching 24.92 68.32 31.83 46.88
8.64 9.63 10.90 6.80
9 69 * 34.2 v204:rank CODE=0001 50.2 14.2 33.9 31.9
100.0 43.7 8.1 16.4 31.6
10 1 34.5 v204:rank CODE=0002 33.8 33.3 32.8 36.7
100.0 29.1 18.8 15.8 36.1
11 90 * 29.5 v204:rank CODE=0003 12.9 52.5 32.2 29.7
100.0 13.0 34.6 18.1 34.1
|
||
|
Group 1 Var EV Mean -2.5 -2.0 -1.5 -1.0 -0.5 0 0.5 1.0 1.5 2.0 2.5
seq I
1 767 25.631 X-----------------I
v262:teaching I
2 464 22.092 XI
v263:research I
3 519 30.876 I------------------------X
v264:supervsn I
4 83 5.498 XI
v265:lab-dev I
5 564 6.498 X--I
v266:admin I
6 23 4.498 I----X
v267:extension I
7 96 4.908 I--------X
v268:profess I
8 164 3.240 I------------X
v344:#doc students I
9 70 48.387 I-----X
v204:rank CODE=0001 I
10 3 36.406 IX
v204:rank CODE=0002 I
11 90 11.982 X-------I
v204:rank CODE=0003 I
12 75 8.756 XI
v217:head? CODE=0001 I
13 76 90.783 IX
v217:head? CODE=0002 I
14 63 21.659 X-----I
sv:inst type CODE=0001 I
15 59 32.719 I-X
sv:inst type CODE=0002 I
16 110 19.355 I-----------X
sv:inst type CODE=0003 I
17 9 26.267 X-I
sv:inst type CODE=0004 I
18 1 61.751 IX
:field CODE=0001 I
19 1 37.788 XI
:field CODE=0002 I
|
||
[Figure: dendrogram (hierarchical tree) showing the regrouping of the typology groups] |
|
IDAMS reports analysis specification
No. of cases read = 1067
No. of cases analysed = 1043
Active quantitative variables = V2-V8
Passive quantitative variables = V13
Passive qualitative variables = V9, V10, V14, V15 (These variables are defined in the file TYPE.DIC)
No. of initial groups = 5
No. of final groups = 3
Distance metric used = Euclidean
Regrouping based on minimum displacement. |
||
|
|
History of % of variation explained from the first iteration to the final (i.e. the third iteration) |
|
|
Characteristics of distances by groups N = The number of cases of each group of the initial typology Total count = Total number of cases participating in the building of
the initial typology |
||
|
Mean, S.D. and Weight of quantitative variables |
||
|
Description of typology For each variable, the listing gives the explained variance and the grand mean (for a qualitative variable, the overall percentage of the category), followed by two rows of values for each typology group.
Quantitative variables: Row 1 shows the group mean and Row 2 the group standard deviation.
Qualitative variables: Row 1 shows the percentage frequency of a given category in the group; this value summed over all categories = 100. Row 2 shows the row percentage, i.e. the share of the category's cases that falls in each group; these values add up to 100 across the groups.
For example: Quantitative variable V262: Teaching
Gr1 Gr2 Gr3 Gr4 Gr5
Row 1: 25.63 23.28 68.32 31.83 46.88
Row 2: 8.12 9.55 9.63 10.90 6.80
The first row shows the average time spent on teaching by the members of each typology group. The second row shows the standard deviation of this variable for each typology group.
For example: Qualitative variable V204: Rank. This variable has 3 categories:
Col.1 Col.2 Col.3 Col.4 Col.5 Col.6 Col.7
Code Gr1 Gr2 Gr3 Gr4 Gr5
34.2 1 48.4 54.3 14.2 33.9 31.9
100.0 29.4 14.3 8.1 16.4 31.6
34.5 2 36.4 27.7 33.3 32.8 36.7
100.0 21.9 7.2 18.8 15.8 36.1
29.5 3 12.0 14.9 52.5 32.2 29.7
100.0 8.4 4.5 34.6 18.1 34.1
Col. 1 shows the distribution of the different categories in the entire sample, e.g. 34.2% of the cases are in Category 1 (Professor). Let us consider Group 1: Category 1 (Professors) accounts for 48.4% of its members, against 34.2% in the entire sample, so Category 1 is over-represented in Group 1. The other categories are interpreted similarly. |
||
|
Variables explaining 80% of the variance. This is a list of the most discriminant variables, which taken together account for 80% of the explained variance. These variables are ranked according to their explanatory power.
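One plausible way to arrive at such a list is to rank the active variables by explained variance and accumulate them until 80% of the total is reached. The exact rule used by TYPOL is not stated in the listing, so this cumulative rule is an assumption, and the function name is illustrative. The explained-variance values are those of the seven active variables in the description of the typology.

```python
def top_variables(explained, share=0.80):
    """Rank variables by explained variance and keep adding them until the
    running total reaches `share` of the overall total (assumed rule)."""
    total = sum(explained.values())
    picked, running = [], 0
    for name, value in sorted(explained.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        running += value
        if running >= share * total:
            break
    return picked


# Explained variance of the active variables, as printed in the listing.
explained_variance = {
    "v262:teaching": 767,
    "v263:research": 464,
    "v264:supervsn": 519,
    "v265:lab-dev": 83,
    "v266:admin": 564,
    "v267:extension": 23,
    "v268:profess": 96,
}
selected = top_variables(explained_variance)
print(selected)
```

Under this assumption the first three variables reach about 74% of the total and the fourth brings it to about 92%, which matches the four variables listed by the program.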
|
||
|
Matrix of (square roots of) inter-group distances and displacements It can easily be seen that the displacement between Group 1 and Group 2 (37.7) is the minimum, which is why these two groups are merged at the first regrouping. |
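The matrix can be read programmatically as a check. In the sketch below the upper value of each cell is taken to be the square root of the displacement and the lower value the square root of the distance; this reading is an inference from the regrouping message, since the square roots of 1421.829 and 11.305 reproduce the two values printed for groups 1 and 2. The dictionary names are illustrative.

```python
from math import sqrt

# Upper value of each cell: square root of the displacement between two groups.
sqrt_displacement = {
    (1, 2): 37.7,
    (1, 3): 62.0, (2, 3): 49.9,
    (1, 4): 44.6, (2, 4): 40.0, (3, 4): 54.6,
    (1, 5): 50.3, (2, 5): 42.7, (3, 5): 48.3, (4, 5): 48.4,
}
# Lower value of each cell: square root of the distance between two groups.
sqrt_distance = {
    (1, 2): 3.362,
    (1, 3): 4.368, (2, 3): 4.496,
    (1, 4): 3.274, (2, 4): 3.694, (3, 4): 4.072,
    (1, 5): 3.132, (2, 5): 3.574, (3, 5): 3.068, (4, 5): 3.238,
}

closest_by_displacement = min(sqrt_displacement, key=sqrt_displacement.get)
closest_by_distance = min(sqrt_distance, key=sqrt_distance.get)
print(closest_by_displacement, closest_by_distance)
print(round(sqrt(1421.829), 1), round(sqrt(11.305), 3))
```

The pair with the smallest displacement is (1, 2), the pair that is actually regrouped, while the smallest distance is between groups 3 and 5. This is consistent with regrouping based on minimum displacement.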
||
|
Description of the resulting 3 - group typology |
||
|
Graphical representation of profiles of different groups The vertical line corresponds to the grand mean of each variable. A horizontal bar to the right of the vertical line means a value greater than the grand mean. A horizontal bar to the left of the vertical line means a value less than the grand mean. In both cases the length of the bar is proportional to the deviation from the grand mean (calibrated in terms of standard deviation). |
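The position of a bar can be approximated by hand as the group mean minus the grand mean, divided by the standard deviation. The sketch assumes that the standard deviation used is the overall one printed in the table of means, S.D. and weights; the function name is illustrative.

```python
def standardized_deviation(group_mean, grand_mean, sd):
    """Deviation of a group mean from the grand mean, in standard deviation units."""
    return (group_mean - grand_mean) / sd


# Group 1 values from the listing: teaching and supervision.
teaching = standardized_deviation(25.63, 42.02, 18.00)
supervision = standardized_deviation(30.88, 16.14, 12.01)
print(round(teaching, 2), round(supervision, 2))
```

For Group 1 this puts teaching almost one standard deviation below the grand mean and supervision more than one standard deviation above it, in line with the long bar to the left for v262 and the long bar to the right for v264.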
||
| Dendrogram showing the mergers of groups. The dendrogram can help in deciding the number of typology groups to be retained for interpretation. Another factor in deciding the number of typology groups is the interpretability of the typology from a theoretical point of view. |