|
Research Question |
: |
Research units spent time on different activities. We assume that activity patterns may differ between two countries. Can we derive a rule for classifying research units into two groups [Country Code: 360 and 638] on the basis of their activity patterns?
|
|
Methodology |
: |
Discriminant analysis, using IDAMS module DISCRAN |
|
Dataset |
: |
ICSOPRU2.DAT |
$RUN DISCRAN
$FILES
PRINT = DISCRAN.LST
DICTIN = ICSOPRU2.DIC
DATAIN = ICSOPRU2.DAT
$SETUP
INCLUDE V1=360,638
prototype for DISCRAN program
BADDATA=MD1 -
VARS=(V22-V30) -
MDHANDLING=(SAMPVAR,GROUPVAR,ANALVAR) -
IDVAR=V2 -
STEP=10 -
PRINT=(CDICT,GROUP) -
GRVAR=V1 GR01=360 GR02=638
|
List of variables V22 V23 V24 V25 V26 V27 V28 V29 V30
After filtering 464 cases read from the input data file
Revised number of cases in samples |
||
|
|
Means and standard deviations
(In the last column the global mean is printed)
                 Group 1               Group 2
Variable      Mean      S.D.        Mean      S.D.        Mean
22         55.3972   17.0743     28.0533   12.0499     42.1377
23          5.2059    5.1370      6.8226    7.3723      5.9899
24          6.8472    4.4405     12.7442    6.4885      9.7067
25          2.1979    2.5754     13.4935   12.2004      7.6753
26          6.0667    5.1025      2.6900    2.7588      4.4293
27          2.8118    2.6504      2.7582    3.7750      2.7858
28         12.5308   11.1494     25.9432   17.8515     19.0347
29          5.7081    9.7895      2.8740    6.0661      4.3338
30          3.2399    6.4383      4.6189    8.9578      3.9086
|
|
Step number   Variables entered   % correctly classified
1             22                  82.76
2             22 26               87.72
3             22 26 29            89.22
4             22 26 29 25         90.30
5             22 26 29 25 24      91.38
The percentage of the basic sample decreases at the next step |
||
|
Allocation and value of the linear discriminant function for the observations in the basic sample
Group 1
Idcode   Allocation   Function
101      1             1.145
108      1              .023
109      1             1.726
201      2             -.856
204      1             1.333
419      2             -.035
425      1              .034
427      2             -.910
430      1              .846
501      2             -.101
502      1              .667
503      1             1.616
504      1             1.770
505      1             2.877
506      1              .857
507      2             -.230
*****
Group 2
Idcode   Allocation   Function
101      2             -.919
102      2             -.710
202      1              .665
301      2             -.195
302      1             1.599
303      2            -1.054
901      1              .263
902      1             1.254
1001     1             1.381
1601     1              .791
1602     1             2.289
1603     1              .233
*****
Allocation and value of the linear discriminant function for the group means
Group 1 : 1.267 |
|
IDAMS reports analysis specifications. Basic sample: 464
Test sample : 0
Anonymous sample: 0
No. of groups: 2 |
||
|
|
IDAMS reports descriptive statistics (mean and standard deviation) of the discriminant variables for each group, and the means of these variables for the entire sample. These statistics give an idea of inter-group differences in the time devoted to different activities. |
|
|
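IDAMS reports the results of stepwise discriminant analysis. At each stage the most important discriminant variable is selected and a linear discriminant function (LDF) is set up. For each object, the LDF score is computed and the object is:
Allocated to Group 1, if the LDF score is strictly positive.
Allocated to Group 2, if the LDF score is strictly negative.
Not allocated to any group, if the LDF score = 0.
Step 1: V22 is entered; 82.76% of cases are correctly classified.
Step 2: V26 is added; 87.72% of cases are correctly classified.
Step 3: V29 is added; 89.22% of cases are correctly classified.
Step 4: V25 is added; 90.30% of cases are correctly classified.
Step 5: V24 is added; 91.38% of cases are correctly classified.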
|
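The sign-based allocation rule can be sketched in a few lines of Python. This is an illustration only, not IDAMS code: the helper name `allocate` is an assumption, and the scores are a few of the Group 1 values copied from the allocation listing.

```python
from typing import Optional

def allocate(ldf_score: float) -> Optional[int]:
    """Allocate an object from the sign of its LDF score (hypothetical helper)."""
    if ldf_score > 0:
        return 1      # strictly positive score: Group 1
    if ldf_score < 0:
        return 2      # strictly negative score: Group 2
    return None       # score of exactly 0: not allocated to any group

# A few (idcode, LDF score) pairs copied from the Group 1 allocation listing.
scores = {101: 1.145, 108: 0.023, 201: -0.856, 419: -0.035}
allocations = {idcode: allocate(score) for idcode, score in scores.items()}
print(allocations)
```

Objects 201 and 419 belong to Group 1 but receive negative scores, so the rule places them in Group 2; these are the misclassified cases that lower the percentage correctly classified.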
||
|
For each object, the value of the discriminant function is computed and the object is allocated to a group according to the allocation procedure described above. The table of allocations lists the objects belonging to Group 1 first, followed by those belonging to Group 2. To summarize, the most important discriminant variables are V22, V26, V29 and V25, and the resulting discriminant function LDF = -2.456+.055V22+.118V26+.039V29-.037V25-.030V24 achieves a classification accuracy of 91.38%. |
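As a quick check of the summary function, the sketch below evaluates the LDF at the two group means taken from the table of means and standard deviations. The names `CONSTANT`, `COEFFS`, `GROUP_MEANS` and `ldf` are illustrative assumptions; the numbers are copied from the output.

```python
# Coefficients of the step-5 LDF as printed in the summary (rounded to 3 decimals).
CONSTANT = -2.456
COEFFS = {"V22": 0.055, "V26": 0.118, "V29": 0.039, "V25": -0.037, "V24": -0.030}

# Group means copied from the table of means and standard deviations.
GROUP_MEANS = {
    1: {"V22": 55.3972, "V26": 6.0667, "V29": 5.7081, "V25": 2.1979, "V24": 6.8472},
    2: {"V22": 28.0533, "V26": 2.6900, "V29": 2.8740, "V25": 13.4935, "V24": 12.7442},
}

def ldf(values):
    """Evaluate the linear discriminant function for one set of variable values."""
    return CONSTANT + sum(COEFFS[name] * values[name] for name in COEFFS)

for group, means in GROUP_MEANS.items():
    score = ldf(means)
    allocated = 1 if score > 0 else 2
    print(f"Group {group} mean: LDF = {score:.3f}, allocated to Group {allocated}")
```

With these rounded coefficients the Group 1 mean scores about 1.24, close to the 1.267 printed by IDAMS for the group mean; the small gap is presumably due to the coefficients being rounded to three decimals. The Group 2 mean scores negative, as expected.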
|
Research Question |
: |
Research units spent time on different activities. We assume that activity patterns may differ between two countries. Can we derive a rule for classifying research units into two groups [Country Code: 360 and 630] on the basis of their activity patterns? |
|
Methodology |
: |
Discriminant analysis, using IDAMS module DISCRAN |
|
Dataset |
: |
ANJU.DAT |
$RUN DISCRAN
$FILES
PRINT = SABINA1.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
EXCLUDE V2=0
Prototype for DISCRAN program
BADDATA=MD1 -
VARS=(V2-V8) -
MDHANDLING=(SAMPVAR,GROUPVAR,ANALVAR) -
IDVAR=V1 -
STEP=7 -
PRINT=(CDICT,DATA,GROUP) -
SAVAR=V9 BASA=(1,2) ANSA=(3) -
GRVAR=V15 GR01=1 GR02=2
|
Maximum of steps to be performed: 7
After filtering 1063 cases read from the input data file
Number of cases in samples
Revised number of cases in samples |
||||||||
|
|
Basic sample
|
|||||||
|
Means and standard deviations
=============================
(In the last column the global mean is printed)
                 Group 1               Group 2
Variable      Mean      S.D.        Mean      S.D.        Mean
2          34.2300   14.2926     36.0760   17.4503     35.0447
3          23.0700   12.7850     18.8481   10.0833     21.2067
4          21.7450   11.5174     17.0063   10.5701     19.6536
5           3.6850    4.6460      6.5380    6.7305      4.9441
6          10.1500   11.4021     11.8861   11.7365     10.9162
7           1.9900    5.0110      5.7658    7.9062      3.6564
8           5.1300    4.4736      3.8797    4.6462      4.5782
******************************************************************************
Step number   Variables entered   % correctly classified
1             7                   67.32
2             7 5                 68.72
3             7 5 4               71.23
The percentage of the basic sample decreases at the next step
Group 1
Idcode   Allocation   Function
1002     1              .154
1015     2             -.049
1016     1              .024
1018     1              .689
3205     2             -.240
3231     2             -.495
3233     1              .243
3235     1              .407
3245     1              .390
Group 2
Idcode   Allocation   Function
1010     2             -.240
1032     2             -.381
1054     2             -.635
1115     2             -.663
1182     1              .120
1191     1              .548
1199     1              .407
3162     2             -.099
3163     2             -.752
3188     2             -.275
3195     1              .179
3199     2             -.099
Allocation and value of the linear discriminant function for the group means
Group 1 : .290
***********************************************************
Allocation and value of the linear discriminant function for the anonymous observations (only a few cases re-listed here)
Number   Allocation   Function
1006     2            -1.058
1014     1              .011
1020     1              .125
1021     1              .125
1024     1              .971
6007     1              .689
6008     1              .013
6009     2            -1.173
6010     2             -.381
9009     1              .295 |
|
IDAMS reports analysis specifications
Grouping variable: V15 |
||
|
|
IDAMS prints the values of the discriminant variables for the basic and anonymous
samples respectively. |
|
| Descriptive statistics of discriminant variables for each group of the basic sample and global mean | ||
|
Stepwise discriminant analysis
Step 1 At this stage, the most discriminant variable is identified, which is variable V7. The linear discriminant function is computed with this variable.
Classification table for basic sample: % correctly classified : 67.32
Step 2 At this stage, the algorithm identifies V5 as the most important discriminant variable after V7. Variables V7 and V5 are used to construct the linear discriminant function LDF = .723-.084V7-.084V5
Step 3 At this stage, variable V4 is added to the linear discriminant function LDF = -.125-.079V7-.079V5+.028V4
The algorithm stops at this stage, since no other variable is able to improve the classification accuracy of the linear discriminant function. |
||
|
Computation of LDF scores and classification of objects for each group. Group 1: Consider, for example, the object with Idcode 1015. It has an LDF score of -.049. Since the score is negative, this object is assigned to Group 2. The objects of Group 2 are interpreted in the same way. |
||
|
Computation of LDF scores of objects of the anonymous sample and assignment of objects to the groups. Classification accuracy for the anonymous sample cannot be computed since their actual group membership is not known. |
|
Research Question |
: |
Derive a rule for classifying a set of 29 countries as Western, Asian and East European on the basis of the priorities given to 10 subfields of Physics. |
|
Methodology |
: |
Multiple Discriminant analysis, using IDAMS module DISCRAN |
|
Dataset |
: |
PHYSICS.DAT |
$RUN DISCRAN
$FILES
PRINT = PHYSICS.LST
DICTIN = PHYSICS.DIC
DATAIN = PHYSICS.DAT
$SETUP
EXCLUDE V12=4 AND V12=5
MULTIPLE DISCRIMINANT ANALYSIS OF PHYSICS DATA
BADDATA=MD1 -
VARS=(V2-V11) -
MDHANDLING=(SAMPVAR,GROUPVAR,ANALVAR) -
IDVAR=V1 -
STEP=8 -
PRINT=(CDICT,DATA,GROUP) -
GRVAR=V12 GR01=1 GR02=2 GR03=3
|
After filtering 36 cases read from the input data file
Number of cases in samples
Revised number of cases in samples |
||
|
|
Table of Means
Variable GR01 GR02 GR03 TOT.
2 78.4706 52.5714 111.8000 77.9655
3 93.1765 169.8571 86.8000 110.5862
4 118.7059 90.8571 109.6000 110.4138
5 73.5294 85.2857 87.6000 78.7931
6 118.8235 109.5714 115.0000 115.9310
7 116.7059 64.1429 56.4000 93.6207
8 155.6471 71.7143 95.4000 125.0000
9 100.8824 78.2857 104.6000 96.0690
10 104.2353 69.8571 142.8000 102.5862
11 87.7647 35.7143 135.2000 83.3793
Table of Standard Deviations
Variable GR01 GR02 GR03
2 40.1865 37.7921 50.1733
3 21.5631 44.8344 34.9365
4 34.1015 35.2032 26.1121
5 20.9119 33.7965 59.9920
6 48.5280 47.3493 64.1685
7 57.2690 33.7107 34.0975
8 103.1184 32.7445 34.9548
9 41.2209 36.8112 53.2826
10 73.9400 32.7476 63.5591
11 45.7712 14.7039 53.7900
|
|
|
Step number 1 Variables entered : 3
Original group
% correctly classified : 65.52 |
||
|
Step number 2 Variables entered : 3 7
Original group
% correctly classified : 65.52 |
||
|
Step number 3 Variables entered : 3 7 11
Original group
% correctly classified : 72.41 |
||
|
Step number 4 Variables entered : 3 7 11 8
Original group
% correctly classified : 72.41 |
||
|
Step number 5 Variables entered : 3 7 11 8 4
Original group
% correctly classified : 79.31 The percentage of the basic sample decreases at the next step
Group 1 Allocation Distances to each group
GR01
1 3 2.356 ( 1) 7.120 ( 2) .520 ( 3)
3 1 1.599 ( 1) 1.734 ( 2) 3.177 ( 3)
5 3 1.883 ( 1) 4.397 ( 2) 1.529 ( 3)
6 1 1.134 ( 1) 1.829 ( 2) 2.607 ( 3)
7 1 1.596 ( 1) 5.196 ( 2) 2.107 ( 3)
9 1 3.963 ( 1) 10.770 ( 2) 10.126 ( 3)
10 1 9.893 ( 1) 16.696 ( 2) 17.657 ( 3)
11 3 1.572 ( 1) 3.733 ( 2) 1.422 ( 3)
16 3 1.591 ( 1) 3.323 ( 2) 1.135 ( 3)
17 1 2.954 ( 1) 6.511 ( 2) 6.572 ( 3)
18 1 4.003 ( 1) 5.912 ( 2) 10.414 ( 3)
22 1 1.958 ( 1) 3.844 ( 2) 5.674 ( 3)
23 3 9.470 ( 1) 12.955 ( 2) 7.041 ( 3)
24 1 1.040 ( 1) 3.641 ( 2) 1.059 ( 3)
28 1 5.561 ( 1) 10.721 ( 2) 7.229 ( 3)
30 1 9.491 ( 1) 15.857 ( 2) 10.312 ( 3)
35 1 15.993 ( 1) 23.778 ( 2) 20.558 ( 3)
Group 2 Allocation Distances to each group
GR02
2 2 8.176 ( 1) 5.961 ( 2) 8.253 ( 3)
12 2 3.207 ( 1) .279 ( 2) 4.337 ( 3)
15 2 16.687 ( 1) 6.353 ( 2) 16.651 ( 3)
21 2 4.564 ( 1) 1.059 ( 2) 4.217 ( 3)
25 2 3.994 ( 1) 2.398 ( 2) 6.420 ( 3)
26 2 5.229 ( 1) 3.755 ( 2) 6.529 ( 3)
29 2 5.882 ( 1) 2.421 ( 2) 7.177 ( 3)
Group 3 Allocation Distances to each group
GR03
4 3 10.175 ( 1) 11.813 ( 2) 7.909 ( 3)
8 3 4.062 ( 1) 3.880 ( 2) 3.175 ( 3)
14 3 4.049 ( 1) 8.732 ( 2) 1.331 ( 3)
19 1 1.216 ( 1) 3.622 ( 2) 2.337 ( 3)
33 3 10.878 ( 1) 15.002 ( 2) 5.898 ( 3)
Discriminant factor analysis at step 5
Sum of eigenvalues = .89884
Discriminant power of first factor : .64040
Discriminant power of second factor : .25843
Discriminant power of third factor : .00000
First eigenvector   -.01455  .00635  .00644  .00251 -.00375
Second eigenvector  -.00208 -.00980  .01130 -.00275 -.00943
Values of discriminant factors for all observations and group means
Group GR01 Group GR02 Group GR03
The group mean is indicated by *
The overlapping of 2 observations from different groups by $
One observation of the group GR01 is presented by 1
One observation of the group GR02 is presented by 2
One observation of the group GR03 is presented by 3 |
[Printer plot of the observations and group means (MEAN1, MEAN2, MEAN3) in the plane of the discriminant factors at step 5. Horizontal axis from -3.784 to .920; vertical axis from .719 to -4.031. The horizontal positions of the plotted points are not recoverable from this listing.]
|
Step number 6 Variables entered : 3 7 11 8 4 9
Original group
% correctly classified : 75.86 |
||
|
Step number 7 Variables entered : 3 7 11 8 4 9 2
Original group
% correctly classified : 79.31 |
||
|
Step number 8 Variables entered : 3 7 11 8 4 9 2 6
Original group
% correctly classified : 79.31
Group 1 Allocation Distances to each group
GR01
1 3 3.712 ( 1) 9.107 ( 2) 1.991 ( 3)
3 1 3.473 ( 1) 4.135 ( 2) 5.260 ( 3)
5 3 4.951 ( 1) 7.754 ( 2) 4.380 ( 3)
6 1 1.291 ( 1) 1.934 ( 2) 3.270 ( 3)
7 1 2.714 ( 1) 5.709 ( 2) 3.670 ( 3)
9 1 5.540 ( 1) 12.311 ( 2) 12.788 ( 3)
10 1 11.016 ( 1) 18.649 ( 2) 18.901 ( 3)
11 3 2.795 ( 1) 4.652 ( 2) 2.614 ( 3)
16 3 5.507 ( 1) 8.813 ( 2) 3.904 ( 3)
17 1 8.908 ( 1) 13.420 ( 2) 10.915 ( 3)
18 1 6.774 ( 1) 8.307 ( 2) 14.724 ( 3)
22 1 5.254 ( 1) 7.595 ( 2) 7.854 ( 3)
23 3 16.312 ( 1) 19.895 ( 2) 15.910 ( 3)
24 1 7.261 ( 1) 9.272 ( 2) 7.976 ( 3)
28 1 9.260 ( 1) 13.796 ( 2) 10.494 ( 3)
30 1 12.990 ( 1) 18.726 ( 2) 14.945 ( 3)
35 1 16.070 ( 1) 24.131 ( 2) 20.797 ( 3)
Group 2 Allocation Distances to each group
GR02
2 2 10.159 ( 1) 7.820 ( 2) 9.653 ( 3)
12 2 6.492 ( 1) 2.407 ( 2) 9.006 ( 3)
15 2 23.166 ( 1) 14.579 ( 2) 23.009 ( 3)
21 2 7.518 ( 1) 3.288 ( 2) 7.129 ( 3)
25 2 5.451 ( 1) 3.352 ( 2) 8.440 ( 3)
26 2 7.361 ( 1) 5.925 ( 2) 8.689 ( 3)
29 2 8.778 ( 1) 5.046 ( 2) 11.636 ( 3)
Group 3 Allocation Distances to each group
GR03
4 3 13.573 ( 1) 14.864 ( 2) 10.981 ( 3)
8 3 5.731 ( 1) 5.106 ( 2) 4.758 ( 3)
14 3 9.590 ( 1) 16.108 ( 2) 5.891 ( 3)
19 1 6.470 ( 1) 9.831 ( 2) 6.514 ( 3)
33 3 13.243 ( 1) 17.049 ( 2) 9.708 ( 3)
Sum of eigenvalues = .96221
Discriminant power of first factor : .67570
Discriminant power of second factor : .28651
Discriminant power of third factor : .00000
First eigenvector
Second eigenvector
Values of discriminant factors for all observations and group means
Group GR01 Group GR02 Group GR03
Codes used for presentation in the graph
The group mean is indicated by *
The overlapping of 2 observations from different groups by $
One observation of the group GR01 is presented by 1
One observation of the group GR02 is presented by 2
One observation of the group GR03 is presented by 3 |
[Printer plot of the observations and group means in the plane of the discriminant factors at step 8. Horizontal axis from -2.939 to 1.296; vertical axis from -.526 to -4.620. The horizontal positions of the plotted points are not recoverable from this listing.]
Memory at disposal : 5000
Memory used : 932
|
IDAMS reports analysis specifications |
||
|
|
Table of means and standard deviations of the variables for each group, and of means for the entire sample. This table gives an idea of which fields are more or less prominent in each group. |
|
|
Step 1 At this stage, the most important discriminant variable is identified and a linear discriminant function is computed and used for classification of countries according to their distance from the group centroids. The most important discriminant variable is V3 (condensed matter physics). The classification table shows that 65.52% of cases are correctly classified. The average value of V3 is higher in Group 2 than in any other group. |
||
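Allocation by distance from the group centroids amounts to picking the group with the smallest distance. The sketch below illustrates this with a few rows copied from the step-5 allocation table (the only steps for which distances are listed are 5 and 8); the helper name `nearest_group` is an assumption, not part of IDAMS.

```python
def nearest_group(distances):
    """Return the group label with the smallest distance (hypothetical helper)."""
    return min(distances, key=distances.get)

# Distances to the three group centroids at step 5, copied from the allocation table.
step5_distances = {
    1:  {1: 2.356, 2: 7.120, 3: 0.520},
    3:  {1: 1.599, 2: 1.734, 3: 3.177},
    12: {1: 3.207, 2: 0.279, 3: 4.337},
    24: {1: 1.040, 2: 3.641, 3: 1.059},
}

allocated = {idcode: nearest_group(d) for idcode, d in step5_distances.items()}
print(allocated)
```

Country 24 shows how close a decision can be: its distances to the centroids of Group 1 and Group 3 differ by only .019.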
|
Step 2 At this stage the most important variable after V3 is V7 (Particle Physics). The discriminant function computed with these variables achieves a classification accuracy of 65.52%. It is interesting to note that there is no change in the overall classification accuracy, but there is a change in the classification matrix: one country from Group 2 which was misclassified into Group 1 is now misclassified into Group 3. |
||
|
Step 3 At this stage the most important discriminant variable after V3 and V7 is V11 (Acoustics). The discriminant function set up with these three variables achieves a classification accuracy of 72.41%. |
||
|
Step 4 At this stage, the most important discriminant variable is V8 (Mathematical Physics). The overall classification accuracy remains unchanged, but there is some adjustment in the classification matrix. |
||
|
Step 5 At this stage the most important discriminant variable after V3, V7, V11 and V8 is V4. The discriminant function computed with these variables achieves a classification accuracy of 79.31%. Note that all East European countries are correctly classified. |
||
|
Allocation table and distances from the centroid of each group
This table shows how far each country lies from the centroids of the different groups. Each country is allocated to the nearest group.
Group 1. The following countries of Group 1, although allocated to Group 1, are still quite far from its centroid:
Idcode   Distance from the centroid of Group 1
35       15.993
These countries seem to be outliers. 5 countries of this region (idcodes 1, 5, 11, 16 and 23) are misclassified into Group 3 (Asia).
Group 2. All 7 countries are allocated to their own group.
Group 3. One country (idcode 19) is allocated to Group 1. |
||
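The percentage correctly classified can be recomputed from the allocation table. The sketch below tallies the step-5 allocations, copied from the output in idcode order, into a classification matrix; the variable names are illustrative assumptions.

```python
from collections import Counter

# Allocated group of every country at step 5, keyed by original group,
# copied from the allocation table in the order the idcodes are listed.
step5 = {
    1: [3, 1, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 1, 1, 1, 1],
    2: [2, 2, 2, 2, 2, 2, 2],
    3: [3, 3, 3, 1, 3],
}

# Classification matrix: (original group, allocated group) -> number of countries.
confusion = Counter(
    (original, allocated)
    for original, allocations in step5.items()
    for allocated in allocations
)

total = sum(confusion.values())
correct = sum(n for (original, allocated), n in confusion.items() if original == allocated)
accuracy = 100 * correct / total

print(dict(confusion))
print(f"{correct} of {total} correctly classified: {accuracy:.2f}%")
```

The tally gives 23 of 29 countries on the diagonal, which reproduces the 79.31% printed for step 5.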
|
Discriminant Function Analysis The first discriminant function is the linear combination of the variables that best discriminates between the groups. The second discriminant function is orthogonal to the first and is the next best combination of the variables. |
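One way to read the "discriminant power" figures is as shares of their sum, which the output reports as the sum of eigenvalues. The sketch below computes those shares for steps 5 and 8; treating the ratio as the relative importance of each factor is an interpretation offered here, not something stated in the IDAMS output, and the names `power` and `shares` are assumptions.

```python
# Discriminant power of each factor, copied from the step-5 and step-8 output.
power = {
    "step 5": [0.64040, 0.25843, 0.00000],
    "step 8": [0.67570, 0.28651, 0.00000],
}

def shares(values):
    """Express each factor's discriminant power as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]

for step, values in power.items():
    pct = ", ".join(f"{100 * s:.1f}%" for s in shares(values))
    print(f"{step}: sum = {sum(values):.5f}, shares = {pct}")
```

At both steps the first factor carries roughly 70% of the total and the second the remainder, while the third contributes nothing, as expected with three groups.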