|
Research Question |
: |
Explore the pattern of relationships between the time involvement of academic scientists and a set of contextual categorical variables: Rank, institutional settings, scientific field, external funding of research. |
|
Methodology |
: |
Classification and regression trees: SEARCH module: MEANS Analysis |
|
Dataset |
: |
ANJU.DAT |
$RUN SEARCH
$FILES
PRINT = CART.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
INCLUDE V9=1-3 AND V10=1,2 AND V12=1,2 AND V15=1,2
SEARCHING OF STRUCTURE
BADDATA=MD1 -
ANALYSIS=MEAN -
DEPVAR=V2 -
MINCASE=10 -
PRINT=(DICT,TRACE,TREE,FINAL)
VARS=(V9) TYPE=M
VARS=(V10,V12,V14,V15) TYPE=F
------------------------------------
Note: The SEARCH module does not recognize missing data or invalid codes for the predictor variables; hence the INCLUDE filter has been used. All other options are left at their default values.
|
After filtering 1021 cases read from the input data file
|
||
|
|
Split 1 candidate groups Group N Sum(WT) Mean Y Var Y Variation
1 1005 .10050E+04 .41851E+02 .32525E+03 .32655E+06
|
|
|
Split 2 candidate groups Group N Sum(WT) Mean Y Var Y Variation
2 330 .33000E+03 .33533E+02 .24002E+03 .78966E+05
3 675 .67500E+03 .45917E+02 .31691E+03 .21360E+06
Attempt to split group 3 Var= 213595.36
|
||
|
Split 3 candidate groups Group N Sum(WT) Mean Y Var Y Variation
2 330 .33000E+03 .33533E+02 .24002E+03 .78966E+05
4 297 .29700E+03 .40717E+02 .26587E+03 .78698E+05
5 378 .37800E+03 .50003E+02 .31978E+03 .12056E+06
|
||
|
Split 4 candidate groups Group N Sum(WT) Mean Y Var Y Variation
2 330 .33000E+03 .33533E+02 .24002E+03 .78966E+05
4 297 .29700E+03 .40717E+02 .26587E+03 .78698E+05
6 218 .21800E+03 .47601E+02 .28966E+03 .62856E+05
7 160 .16000E+03 .53275E+02 .34421E+03 .54730E+05
|
||
|
Split 5 candidate groups Group N Sum(WT) Mean Y Var Y Variation
4 297 .29700E+03 .40717E+02 .26587E+03 .78698E+05
6 218 .21800E+03 .47601E+02 .28966E+03 .62856E+05
7 160 .16000E+03 .53275E+02 .34421E+03 .54730E+05
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
9 277 .27700E+03 .36296E+02 .21993E+03 .60702E+05
|
||
|
Split 6 candidate groups Group N Sum(WT) Mean Y Var Y Variation
6 218 .21800E+03 .47601E+02 .28966E+03 .62856E+05
7 160 .16000E+03 .53275E+02 .34421E+03 .54730E+05
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
9 277 .27700E+03 .36296E+02 .21993E+03 .60702E+05
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
11 198 .19800E+03 .44136E+02 .26689E+03 .52577E+05
Split 6 candidate groups Group N Sum(WT) Mean Y Var Y Variation
7 160 .16000E+03 .53275E+02 .34421E+03 .54730E+05
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
9 277 .27700E+03 .36296E+02 .21993E+03 .60702E+05
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
11 198 .19800E+03 .44136E+02 .26689E+03 .52577E+05
Split 6 candidate groups Group N Sum(WT) Mean Y Var Y Variation
7 160 .16000E+03 .53275E+02 .34421E+03 .54730E+05
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
11 198 .19800E+03 .44136E+02 .26689E+03 .52577E+05
|
||
|
Split 7 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
11 198 .19800E+03 .44136E+02 .26689E+03 .52577E+05
12 67 .67000E+02 .48373E+02 .39621E+03 .26150E+05
13 93 .93000E+02 .56806E+02 .28055E+03 .25811E+05
|
||
|
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
12 67 .67000E+02 .48373E+02 .39621E+03 .26150E+05
13 93 .93000E+02 .56806E+02 .28055E+03 .25811E+05
14 95 .95000E+02 .39537E+02 .19251E+03 .18096E+05
15 103 .10300E+03 .48379E+02 .30018E+03 .30618E+05
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
12 67 .67000E+02 .48373E+02 .39621E+03 .26150E+05
13 93 .93000E+02 .56806E+02 .28055E+03 .25811E+05
14 95 .95000E+02 .39537E+02 .19251E+03 .18096E+05
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
13 93 .93000E+02 .56806E+02 .28055E+03 .25811E+05
14 95 .95000E+02 .39537E+02 .19251E+03 .18096E+05
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
10 99 .99000E+02 .33879E+02 .19568E+03 .19177E+05
14 95 .95000E+02 .39537E+02 .19251E+03 .18096E+05
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
14 95 .95000E+02 .39537E+02 .19251E+03 .18096E+05
Split 8 candidate groups Group N Sum(WT) Mean Y Var Y Variation
8 53 .53000E+02 .19094E+02 .98087E+02 .51005E+04
|
||
|
The partitioning ends with 8 final groups
The variation explained is 23.9%
One-way analysis of final groups
Source      Variation        DF
Explained   .78042510E+05    7
Error       .24850910E+06    997
Total       .32655160E+06    1004
|
||
|
Split summary table
Group 1    1005 cases
Group 3    675 cases
Group 5    378 cases
Group 2    330 cases
Group 4    297 cases
Group 7    160 cases
Group 11   198 cases
||
|
Final group summary table
Group 6    218 cases
Group 8    53 cases
Group 9    277 cases
Group 10   99 cases
Group 12   67 cases
Group 13   93 cases
Group 14   95 cases
Group 15   103 cases
|
IDAMS reports analysis specifications: |
||
|
|
Group 1: Sum(WT) = number of cases
Attempt to split Group 1
Predictor (codes)   Cut-point   Variance explained
V9 (1,2,3)          1           0.25600360E+05
V10 (1,2)           1           0.71446650E+04
V12 (1,2)           1           0.24188630E+05
V14 (3,2,4,1)       2           0.33990140E+05
V15 (1,2)           No eligible split
The best split for Group 1 is on predictor V14, since it has the maximum explained sum of squares. Group 1 is split on variable V14 into two groups:
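The selection rule for a means analysis can be illustrated with a short Python sketch. It assumes that the "variance explained" of a binary split is the between-group sum of squares of the two child groups; the function name `between_group_ss` is hypothetical and is not part of IDAMS. The inputs are the N and Mean Y values printed in the trace for Group 2 and Group 3.

```python
def between_group_ss(groups):
    """Between-group sum of squares for a list of (n, mean) pairs.

    Assumption: this is the quantity the listing reports as
    'variance explained' for a candidate split in a means analysis.
    """
    total_n = sum(n for n, _ in groups)
    grand_mean = sum(n * m for n, m in groups) / total_n
    return sum(n * (m - grand_mean) ** 2 for n, m in groups)


# Child groups of the V14 split, taken from the trace: (N, Mean Y).
v14_split = [(330, 33.53333), (675, 45.91704)]
bss = between_group_ss(v14_split)
print(f"{bss:.2f}")  # close to the 0.33990140E+05 listed for V14
```

Under this assumption the sketch reproduces the value listed for V14 to within rounding of the printed means, which is why V14 wins over V9, V10 and V12.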
|
|
|
Split 2 candidate groups Group N Sum(WT) Mean Y Var Y Variation
2 330 .33000E+03 .33533E+02 .24002E+03 .78966E+05
3 675 .67500E+03 .45917E+02 .31691E+03 .21360E+06
Group 3 will be split first, because it has the greater variation. The best predictor and its cut-point for splitting Group 3 are selected in the same manner as in Step 1. Predictor V12 (1,2) at cut-point = 1 is found to be the best predictor, since it explains more variance than the best splits of the other predictors. Group 3 is split into: |
||
|
Split 3 candidate groups Group N Sum(WT) Mean Y Var Y Variation
2 330 .33000E+03 .33533E+02 .24002E+03 .78966E+05
4 297 .29700E+03 .40717E+02 .26587E+03 .78698E+05
5 378 .37800E+03 .50003E+02 .31978E+03 .12056E+06
All the predictors are again evaluated for their best splits, and predictor V9 (1,2,3) is selected for splitting Group 4, with codes 1,2 on one side of the cut-off and code 3 on the other. Note that predictor V9 is monotonic. |
||
|
Split 4 candidate groups: Group-2, Group-5, Group-6 and Group-7. Group-2 will be split first since it has the maximum variation (78966.13). Again, all candidate predictors are evaluated for splitting Group-2. Predictor V14 (3,2) is chosen, and Group-2 is split into: |
||
|
Split 5 candidate groups: The best predictor and cut-point value are selected in the same manner as in the earlier steps. Group-5 is split into groups: |
||
|
Split 6 candidate groups: None of the predictor variables meets the eligibility criterion for splitting Group-6. Group-9 (variation 60701.727) is then chosen for split, but none of the predictors can split it either. Group-7 is chosen next, and this group is split on predictor V14 (4,1) into two groups: |
||
|
Split 7 candidate groups Group-11 is now split into two groups on Predictor V14(4,1): |
||
Split 8 candidate groups
At this stage, there are 6 candidate groups for split: Group-8, Group-10, Group-12, Group-13, Group-14, Group-15. Of these, Group-15 has the maximum sum of squares and is therefore considered first, but none of the predictor variables is eligible to split it. Group-12 is considered next, since it has the maximum sum of squares among the remaining groups, but again none of the predictor variables meets the eligibility criterion. Group-13 and Group-10 are then considered in turn, and none of the predictor variables can split them. The remaining groups (viz. Group-14, Group-8) are considered last, with the same result.
The partition ends with 8 final groups (variance explained = 23.9%).
Source      Sum of squares   DF
Explained   78042.51         7 (number of final groups - 1)
Error       248509.10        997
Total       326551.60        1004
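The closing one-way analysis can be checked by hand. The sketch below is a minimal illustration, not IDAMS code: the helper `anova_summary` is a hypothetical name, and the degrees-of-freedom formulas (final groups minus 1, cases minus final groups, cases minus 1) are the standard one-way ANOVA conventions, which agree with the DF column reported above.

```python
def anova_summary(explained_ss, total_ss, n_cases, n_final_groups):
    """Summarise a partition as a one-way ANOVA over the final groups."""
    return {
        "explained_pct": 100.0 * explained_ss / total_ss,
        "error_ss": total_ss - explained_ss,
        "df_explained": n_final_groups - 1,
        "df_error": n_cases - n_final_groups,
        "df_total": n_cases - 1,
    }


# Values from the means analysis: 1005 cases, 8 final groups.
summary = anova_summary(78042.51, 326551.60, n_cases=1005, n_final_groups=8)
print(round(summary["explained_pct"], 1))  # percentage of variation explained
print(summary["df_explained"], summary["df_error"], summary["df_total"])
```

The same helper applies unchanged to the regression and chi analyses later in this document, since their final tables have the same layout.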
||
|
The foregoing results of partitioning the data set are summarized in the summary table |
_______________
N=Number of cases | Group 1 |
Y=Dep. var. mean, v262:teaching | N=1005 |
C=Predictor codes |Y= 41.85075|
| Split 1 |
| on V14 |
|_____________|
|
_________________________________sv:inst type__________________________________
C=3,2 C=4,1
| |
_______|_______ _______|_______
| Group 2 | | Group 3 |
| N=330 | | N=675 |
|Y= 33.53333| |Y= 45.91704|
| Split 4 | | Split 2 |
| on V14 | | on V12 |
|_____________| |_____________|
| |
_________________sv:inst type__________________ ________________v335:ext funds_________________
C=3 C=2 C=1 C=2
| | | |
_______|_______ _______|_______ _______|_______ _______|_______
| Group 8 | | Group 9 | | Group 4 | | Group 5 |
| N=53 | | N=277 | | N=378 | | N=297 |
|Y= 19.09434| |Y= 36.29603| |Y= 50.00265| |Y= 40.71717|
| Final | | Final | | Split 3 | | Split 5 |
| | | | | on V9 | | on V9 |
|_____________| |_____________| |_____________| |_____________|
| |
___________v204:rank___________ ___________v204:rank___________
C=1,2 C=3 C=1 C=2,3
| | | |
_______|_______ _______|_______ _______|_______ _______|_______
| Group 6 | | Group 7 | | Group 10 | | Group 11 |
| N=218 | | N=160 | | N=99 | | N=198 |
|Y= 47.60092| |Y= 53.27500| |Y= 33.87879| |Y= 44.13636|
| Final | | Split 6 | | Final | | Split 7 |
| | | on V14 | | | | on V14 |
|_____________| |_____________| |_____________| |_____________|
| |
| |
|
_______|_______
N=Number of cases | Group 7 |
Y=Dep. var. mean, v262:teaching | N=160 |
C=Predictor codes |Y= 53.27500|
| Split 6 |
| on V14 |
|_____________|
|
_________________________________sv:inst type__________________________________
C=4 C=1
| |
_______|_______ _______|_______
| Group 12 | | Group 13 |
| N=67 | | N=93 |
|Y= 48.37313| |Y= 56.80645|
| Final | | Final |
| | | |
|_____________| |_____________|
|
_______|_______
N=Number of cases | Group 11 |
Y=Dep. var. mean, v262:teaching | N=198 |
C=Predictor codes |Y= 44.13636|
| Split 7 |
| on V14 |
|_____________|
|
_________________________________sv:inst type__________________________________
C=4 C=1
| |
_______|_______ _______|_______
| Group 14 | | Group 15 |
| N=95 | | N=103 |
|Y= 39.53684| |Y= 48.37864|
| Final | | Final |
| | | |
|_____________| |_____________|
***** Normal termination of SEARCH
***** No more RUN statements in SETUP; step terminated
|
Research Question |
: |
How does the pattern of relationships involving the time spent by academic scientists on supervision (the dependent variable, V4) vary across contextual factors? |
|
Methodology |
: |
Classification and regression trees: SEARCH module: Regression analysis |
|
Dataset |
: |
ANJU.DAT |
$RUN SEARCH
$FILES
PRINT = SEARCH2.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
INCLUDE V9=1-3 AND V15=1,2
SEARCHING OF STRUCTURE: REGRESSION ANALYSIS
BADDATA=MD1 -
ANALYSIS=REGRESSION -
DEPVAR=V4 -
COVARIATE= V13 -
MINCASE=10 -
IDVAR=V1 -
PRINT=(TRACE, TABLE, TREE)
VARS= (V14, V15) TYPE=F
VARS= V9 TYPE=M
------------------------------------
Note: The SEARCH module does not recognize missing data or invalid codes for the predictor variables; hence the INCLUDE filter has been used. All other options are left at their default values.
|
Dependent variable: V4 |
||
|
|
Split 1 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
1 1016 .10160E+04 .15990E+02 .14602E+03 .30705E+01 .11106E+06
.20010E+01 .38828E+01
|
|
|
Split 2 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
3 959 .95900E+03 .15046E+02 .12058E+03 .28464E+01 .86385E+05
.19145E+01 .37526E+01
|
||
|
Split 3 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
4 317 .31700E+03 .18356E+02 .10762E+03 .17423E+01 .30268E+05
.26751E+01 .38973E+01
5 642 .64200E+03 .13411E+02 .11906E+03 .32997E+01 .53565E+05
.15389E+01 .32598E+01
|
||
|
Split 4 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
4 317 .31700E+03 .18356E+02 .10762E+03 .17423E+01 .30268E+05
.26751E+01 .38973E+01
6 374 .37400E+03 .14513E+02 .12511E+03 .35936E+01 .34194E+05
.14305E+01 .25890E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
Split 4 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
4 317 .31700E+03 .18356E+02 .10762E+03 .17423E+01 .30268E+05
.26751E+01 .38973E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
|
||
|
Split 5 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
9 178 .17800E+03 .20742E+02 .11128E+03 .13002E+01 .18510E+05
.29663E+01 .39650E+01
Split 5 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
Split 5 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
|
||
|
Split 6 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
10 25 .25000E+02 .27880E+02 .16594E+03 .40679E+00 .39657E+04
.32400E+01 .42733E+01
11 32 .32000E+02 .35000E+02 .41174E+03 .34537E+01 .11410E+05
.36250E+01 .36613E+01
Split 6 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
10 25 .25000E+02 .27880E+02 .16594E+03 .40679E+00 .39657E+04
.32400E+01 .42733E+01
Split 6 candidate groups Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
10 25 .25000E+02 .27880E+02 .16594E+03 .40679E+00 .39657E+04
.32400E+01 .42733E+01
|
||
|
The partitioning ends with 6 final groups
The variation explained is 13.9%
One-way analysis of final groups
Source      Variation        DF
Explained   .15441640E+05    5
Error       .95613590E+05    1010
Total       .11105520E+06    1015
|
||
|
Split summary table
Group 1    1016 cases
Group 3    959 cases
Group 5    642 cases
Group 4    317 cases
Group 2    57 cases
Group 6    374 cases
Group 7    268 cases
Group 8    139 cases
Group 9    178 cases
Group 10   25 cases
Group 11   32 cases
|
IDAMS reports analysis specifications |
||
|
|
Group 1 (the entire sample)
Variance of
|
|
|
Attempt to split Group 1
The algorithm attempts binary splits at the different cut-off points of all variables and selects the best cut-off, i.e. the one that results in the maximum difference in slope between the parent group and the descendant groups.
Predictor (codes)   Cut-off (after code)   Variance explained
V14 (3,2,4,1)       3                      8233.725
V15 (2,1)           2                      901.284
V9 (1,2,3)          2                      3049.774
Best split for Group 1 is on predictor V14:
Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
3 959 .95900E+03 .15046E+02 .12058E+03 .28464E+01 .86385E+05
.19145E+01 .37526E+01
Group-3 is selected for further split since its slope is greater than that of Group-2. All the predictors are examined for their best splits.
Predictor (codes)   Cut-off (after code)   Variance explained
V14 (2,4,1)         4                      1452.317
V15 (2,1)           No eligible split
V9 (1,2,3)          1                      2552.041
Group-3 is now split on V9 into two groups:
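The "Variation" column of the regression trace can be reproduced from the other printed columns. The Python sketch below rests on two assumptions that are not stated in the listing: that "Variation" is the residual sum of squares around the within-group regression of Y on the covariate Z, and that the printed variances use an N - 1 divisor. The function name `residual_ss` is hypothetical.

```python
def residual_ss(n, var_y, var_z, slope):
    """Residual sum of squares of Y around its regression line on Z.

    Assumes sample variances with an (n - 1) divisor, so that
    SS_resid = (n - 1) * (var_y - slope**2 * var_z).
    """
    return (n - 1) * (var_y - slope ** 2 * var_z)


# N, Var Y, Var Z and Slope as printed in the trace.
group1 = residual_ss(1016, 146.02, 3.8828, 3.0705)
group2 = residual_ss(57, 311.75, 3.8954, 2.1646)
group3 = residual_ss(959, 120.58, 3.7526, 2.8464)

# Reduction in residual variation achieved by the V14 split of Group 1.
gain = group1 - (group2 + group3)
print(f"{group1:.0f} {group2:.0f} {group3:.0f} {gain:.0f}")
```

With the five-digit values from the listing, the three residual sums of squares and the gain land within rounding error of the printed variations (.11106E+06, .16436E+05, .86385E+05) and of the 8233.725 reported for V14, which supports the assumed reading of the column.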
|
||
|
Split 3 Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
4 317 .31700E+03 .18356E+02 .10762E+03 .17423E+01 .30268E+05
.26751E+01 .38973E+01
5 642 .64200E+03 .13411E+02 .11906E+03 .32997E+01 .53565E+05
.15389E+01 .32598E+01
Group 5 is selected for further split, since it accounts for the largest value of the slope and variance. All predictors are evaluated for splitting Group-5.
Predictor (codes)   Cut-off (after code)   Variance explained
V14 (2,4,1)         4                      2011.419
V15 (2,1)           No eligible split
V9 (2,3)            No eligible split
Group-5 is now split on V14 into two groups: |
||
|
Split 4 At this stage, there are 4 candidate groups for further splits. Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
4 317 .31700E+03 .18356E+02 .10762E+03 .17423E+01 .30268E+05
.26751E+01 .38973E+01
6 374 .37400E+03 .14513E+02 .12511E+03 .35936E+01 .34194E+05
.14305E+01 .25890E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
Group-6 is selected for further split since it has the largest value of the slope (variance explained). No eligible split for Group-6 is found. Group-4 is then considered for possible split. All the predictor variables were evaluated for splitting this group. Predictors V9 and V14 did not meet the eligibility criterion for split. Hence, this group was split on variable V15 (1, 2) into two groups:
|
||
|
At this stage there are 4 candidate groups for further split Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
2 57 .57000E+02 .31877E+02 .31175E+03 .21646E+01 .16436E+05
.34561E+01 .38954E+01
7 268 .26800E+03 .11873E+02 .10697E+03 .31721E+01 .17360E+05
.16903E+01 .41696E+01
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
9 178 .17800E+03 .20742E+02 .11128E+03 .13002E+01 .18510E+05
.29663E+01 .39650E+01
Group-9 and Group-7 were sequentially evaluated for further split, but none of the predictors met the eligibility criterion to split either of these groups. Group 2 was then considered for further split. Only one variable, V15, met the eligibility criterion, and hence this group was split into two groups:
|
||
|
At this stage there are three candidate groups for further split: Group-8, Group-10 and Group-11 Group N Sum(WT) Mean Y,Z Var Y,Z Slope Variation
8 139 .13900E+03 .15302E+02 .86966E+02 .19206E+01 .10174E+05
.23022E+01 .35892E+01
10 25 .25000E+02 .27880E+02 .16594E+03 .40679E+00 .39657E+04
.32400E+01 .42733E+01
11 32 .32000E+02 .35000E+02 .41174E+03 .34537E+01 .11410E+05
.36250E+01 .36613E+01
These groups were considered sequentially for further split, but none of the predictor variables met the eligibility criterion for splitting any of these groups. |
||
|
The partition ends with 6 terminal groups: Group-10, Group-11, Group-8, Group-9, Group-6 and Group-7. This partition explains 13.9% of the total variance.
Analysis of variance table
Source      Variation        DF
Explained   .15441640E+05    5
Error       .95613590E+05    1010
Total       .11105520E+06    1015
||
|
History of the splitting process. |
||
|
Graphical representation of the splitting process. |
_______________
N=Number of cases | Group 1 |
Y=Dep. var. mean, v264:supervsn | N=1016 |
Z=Covariate mean |Y= 15.99016|
S=Slope |Z= 2.00098|
C=Predictor codes |S= 3.07054|
| Split 1 |
| on V14 |
|_____________|
|
_________________________________sv:inst type__________________________________
C=3 C=2,4,1
| |
_______|_______ _______|_______
| Group 2 | | Group 3 |
| N=57 | | N=959 |
|Y= 31.87719| |Y= 15.04588|
|Z= 3.45614| |Z= 1.91449|
|S= 2.16463| |S= 2.84640|
| Split 5 | | Split 2 |
| on V15 | | on V9 |
|_____________| |_____________|
| |
____________________:field_____________________ ___________________v204:rank___________________
C=2 C=1 C=1 C=2,3
| | | |
_______|_______ _______|_______ _______|_______ _______|_______
| Group 10 | | Group 11 | | Group 4 | | Group 5 |
| N=25 | | N=32 | | N=317 | | N=642 |
|Y= 27.88000| |Y= 35.00000| |Y= 18.35647| |Y= 13.41121|
|Z= 3.24000| |Z= 3.62500| |Z= 2.67508| |Z= 1.53894|
|S= .40679| |S= 3.45374| |S= 1.74231| |S= 3.29966|
| Final | | Final | | Split 4 | | Split 3 |
| | | | | on V15 | | on V14 |
|_____________| |_____________| |_____________| |_____________|
| |
____________:field_____________ _________sv:inst type__________
C=2 C=1 C=2,4 C=1
| | | |
_______|_______ _______|_______ _______|_______ _______|_______
| Group 8 | | Group 9 | | Group 6 | | Group 7 |
| N=139 | | N=178 | | N=374 | | N=268 |
|Y= 15.30216| |Y= 20.74157| |Y= 14.51337| |Y= 11.87313|
|Z= 2.30216| |Z= 2.96629| |Z= 1.43048| |Z= 1.69030|
|S= 1.92064| |S= 1.30016| |S= 3.59364| |S= 3.17209|
| Final | | Final | | Final | | Final |
| | | | | | | |
|_____________| |_____________| |_____________| |_____________|
|
Research Question |
: |
Explore the pattern of relationships between the time involvement of academic scientists in teaching (coded as: 1 = Low, 2 = Average, 3 = Above average, 4 = High) and a set of contextual variables: rank, institutional setting, scientific field, external funding of research. |
|
Methodology |
: |
Classification and regression trees: SEARCH module: CHI analysis |
|
Dataset |
: |
PSN.DAT |
$RUN SEARCH
$FILES
PRINT = PSN.LST
DICTIN = PSN.DIC
DATAIN = PSN.DAT
$SETUP
INCLUDE V2=1-3 AND V3=1-2 AND V5 =1,2 AND V6=1-4
SEARCHING OF STRUCTURE
BADDATA=MD1 -
ANALYSIS=CHI -
DEPVAR=V6 CODE= (1-4) -
MINCASE=10 -
PRINT=(TREE,DICT,TRACE,FINAL)
VARS=V2 TYPE=M
VARS=(V3,V4, V5) TYPE=F
------------------------------------
Note: The SEARCH module does not recognize missing data or invalid codes for the predictor variables; hence the INCLUDE filter has been used. All other options are left at their default values.
|
After filtering 1011 cases read from the input data file
|
||
|
|
Split 1 candidate groups Group N Sum(WT) Variation
1 1011 .10110E+04 .27381E+04
Attempt to split group 1 Var= 2738.0928
Predictor V2 RANK Rank 1 Type M Best split after code 1 Var expl= .87576360E+02
Predictor V3 FUNDING Rank 2 Type F Best split after code 1 Var expl= .84713570E+02
Predictor V4 INSTYP Rank 2 Type F Best split after code 3 Var expl= .86131580E+02
Predictor V5 FIELD Rank 2 Type F No eligible split
Best split for group 1 on predictor V2 RANK Rank 1
Split group 1 on V2 RANK Var expl= .87576360E+02
|
|
Split 2 candidate groups Group N Sum(WT) Variation
2 356 .35600E+03 .85404E+03
3 655 .65500E+03 .17965E+04
Attempt to split group 3 Var= 1796.4799
Predictor V2 RANK Rank 1 Type M Best split after code 2 Var expl= .23601020E+02
Predictor V3 FUNDING Rank 2 Type F Best split after code 2 Var expl= .31537450E+02
Predictor V4 INSTYP Rank 2 Type F Best split after code 3 Var expl= .54556970E+02
Predictor V5 FIELD Rank 2 Type F No eligible split
Best split for group 3 on predictor V4 INSTYP Rank 1
Split group 3 on V4 INSTYP Var expl= .54556970E+02
||
|
Split 3 candidate groups Group N Sum(WT) Variation
2 356 .35600E+03 .85404E+03
4 152 .15200E+03 .38393E+03
5 503 .50300E+03 .13580E+04
Attempt to split group 5 Var= 1357.9944
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F No eligible split
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
No eligible split for group 5
Split 3 candidate groups Group N Sum(WT) Variation
2 356 .35600E+03 .85404E+03
4 152 .15200E+03 .38393E+03
Attempt to split group 2 Var= 854.03656
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F Best split after code 1 Var expl= .31131080E+02
Predictor V4 INSTYP Rank 2 Type F Best split after code 3 Var expl= .42759830E+02
Predictor V5 FIELD Rank 2 Type F No eligible split
Best split for group 2 on predictor V4 INSTYP Rank 1
Split group 2 on V4 INSTYP Var expl= .42759830E+02
||
|
Split 4 candidate groups Group N Sum(WT) Variation
4 152 .15200E+03 .38393E+03
6 36 .36000E+02 .91390E+01
7 320 .32000E+03 .80214E+03
Attempt to split group 7 Var= 802.13776
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F Best split after code 1 Var expl= .24340890E+02
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
Best split for group 7 on predictor V3 FUNDING Rank 1
Split group 7 on V3 FUNDING Var expl= .24340890E+02
||
|
Split 5 candidate groups Group N Sum(WT) Variation
4 152 .15200E+03 .38393E+03
6 36 .36000E+02 .91390E+01
8 120 .12000E+03 .32932E+03
9 200 .20000E+03 .44848E+03
Attempt to split group 9 Var= 448.47726
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F No eligible split
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
No eligible split for group 9
Split 5 candidate groups Group N Sum(WT) Variation
4 152 .15200E+03 .38393E+03
6 36 .36000E+02 .91390E+01
8 120 .12000E+03 .32932E+03
Attempt to split group 4 Var= 383.92853
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F No eligible split
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
No eligible split for group 4
Split 5 candidate groups Group N Sum(WT) Variation
6 36 .36000E+02 .91390E+01
8 120 .12000E+03 .32932E+03
Attempt to split group 8 Var= 329.31958
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F No eligible split
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
No eligible split for group 8
Split 5 candidate groups Group N Sum(WT) Variation
6 36 .36000E+02 .91390E+01
Attempt to split group 6 Var= 9.1389990
Predictor V2 RANK Rank 1 Type M No eligible split
Predictor V3 FUNDING Rank 2 Type F No eligible split
Predictor V4 INSTYP Rank 2 Type F No eligible split
Predictor V5 FIELD Rank 2 Type F No eligible split
No eligible split for group 6
No splits possible
||
|
The partitioning ends with 5 final groups
The variation explained is 7.6%
One-way analysis of final groups
Source      Variation        DF
Explained   .20923410E+03    4
Error       .25288590E+04    1006
Total       .27380930E+04    1010
|
||
|
Split summary table
Group 1   1011 cases   Variation= .27380930E+04
Group 3   655 cases    Variation= .17964800E+04
Group 2   356 cases    Variation= .85403660E+03
Group 7   320 cases    Variation= .80213780E+03
||
|
Final group summary table
Group 4   152 cases   Variation= .38392850E+03
Dependent variable percent distribution for each group (*=Final groups)
Group        1      2      3     4*     5*     6*      7     8*     9*
Code= 1  36.10  52.25  27.33  42.11  22.86  97.22  47.19  31.67  56.50
Code= 2  19.09  20.51  18.32  22.37  17.10   2.78  22.50  25.00  21.00
Code= 3  21.96  16.85  24.73  26.97  24.06    .00  18.75  23.33  16.00
Code= 4  22.85  10.39  29.62   8.55  35.98    .00  11.56  20.00   6.50
|
IDAMS reports that 1011 cases were read from the input data file. |
||
|
|
Split 1 (N = number of cases)
Attempt to split Group-1: all the predictors are evaluated, one by one, for their best splits.
Predictor (codes)   Cut-off (after code)   Entropy
V2 (1,2,3)          1                      87.57636
V3 (1,2)            1                      84.71357
V4 (3,4,2,1)        3                      86.13158
V5 (2,1)            No eligible split
The best split for Group-1 is on predictor V2, since it accounts for the maximum entropy. Group-1 is split on V2 into the following groups:
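The entropy-based "Variation" of a group can be recomputed from its category counts. The sketch below assumes that the variation equals twice the sum of n_j * ln(N / n_j) over the categories of the dependent variable; this formula is an inference from the printed numbers, not a quotation from the IDAMS manual, and the function name `group_variation` is hypothetical. The counts are also inferred: 35 and 1 for Group 6 from 97.22% and 2.78% of 36 cases, and 365, 193, 222 and 231 for Group 1 from its percent distribution over 1011 cases.

```python
import math


def group_variation(counts):
    """Entropy-based variation: 2 * sum(n_j * ln(N / n_j)) over non-empty categories."""
    total = sum(counts)
    return 2.0 * sum(n * math.log(total / n) for n in counts if n > 0)


# Counts inferred from the percent distribution table (an assumption).
g6 = group_variation([35, 1, 0, 0])          # Group 6, N = 36
g1 = group_variation([365, 193, 222, 231])   # Group 1, N = 1011
print(f"{g6:.4f} {g1:.2f}")
```

Under these assumptions the results agree with the 9.1389990 and 2738.0928 printed for Group 6 and Group 1. The entropy of a split in the table above is then the parent variation minus the sum of the two child variations, for example 2738.09 - 854.04 - 1796.48 for the V2 split.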
|
|
|
Split 2
At this stage there are two candidate groups.
Group N Sum(WT) Variation
2 356 .35600E+03 .85404E+03
3 655 .65500E+03 .17965E+04
Group 3 is considered first for split, since it accounts for the maximum entropy. The best splits of the predictors are as follows:
Predictor (codes)   Cut-off (after code)   Entropy
V2 (2,3)            2                      23.60102
V3 (2,1)            2                      31.53745
V4 (2,3,4,1)        3                      54.55697
V5 (2,1)            No eligible split
The best split for Group-3 is on predictor V4, since it has the maximum value of entropy. This group is split into two groups on variable V4:
|
||
|
Split 3
Group N Sum(WT) Variation
2 356 .35600E+03 .85404E+03
4 152 .15200E+03 .38393E+03
5 503 .50300E+03 .13580E+04
Group-5 has the maximum value of entropy and is considered first for a possible split. All the predictors are evaluated in the same manner as for the earlier splits, but none of the predictor variables meets the eligibility criterion for splitting this group. Group-2 is considered next, since it accounts for greater entropy than Group-4. All the predictors are evaluated for splitting Group-2, and predictor V4 (3,4,1,2) is found to be the best predictor at cut-off value code 3. Hence, Group-2 is partitioned into the following groups:
|
||
|
Split 4
Group N Sum(WT) Variation
4 152 .15200E+03 .38393E+03
6 36 .36000E+02 .91390E+01
7 320 .32000E+03 .80214E+03
Group-7 has the maximum value of entropy and is considered first for a possible split. All the predictors are evaluated for splitting Group-7, and variable V3 is found to be the best splitter at cut-off value = 1. Hence, Group-7 is split into the following groups on this predictor:
|
||
|
Split 5
At this stage, there are 4 candidate groups for possible splits.
Group N Sum(WT) Variation
4 152 .15200E+03 .38393E+03
6 36 .36000E+02 .91390E+01
8 120 .12000E+03 .32932E+03
9 200 .20000E+03 .44848E+03
Group-9 has the highest value of entropy and is therefore considered first for a possible split. However, none of the predictor variables meets the eligibility criterion for splitting this group. Thereafter, Group-4, Group-8 and Group-6 are considered one after the other, but none of the predictors is able to split any of these groups.
||
|
The partitioning ends with 5 final groups: Group-6, Group-8, Group-9, Group-4 and Group-5. The partition explains 7.6% of the total variance.
One-way analysis of final groups
Source      Variation        DF
Explained   .20923410E+03    4
Error       .25288590E+04    1006
Total       .27380930E+04    1010
|
||
|
Summary table |
||
|
Final group summary table: frequency distribution (%) of the different categories of the dependent variable in each of the nine groups created by the algorithm. Final groups are identified by *. Graphical representation of the splitting process: |
||
_______________
N=Number of cases | Group 1 |
C=Predictor codes | N=1011 |
| Split 1 |
| on V2 |
|_____________|
|
_____________________________________RANK______________________________________
C=1 C=2,3
| |
_______|_______ _______|_______
| Group 2 | | Group 3 |
| N=356 | | N=655 |
| Split 3 | | Split 2 |
| on V4 | | on V4 |
|_____________| |_____________|
| |
____________________INSTYP_____________________ ____________________INSTYP_____________________
C=3 C=4,1,2 C=2,3 C=4,1
_______|_______ _______|_______ _______|_______ _______|_______
| Group 6 | | Group 7 | | Group 4 | | Group 5 |
| N=36 | | N=320 | | N=152 | | N=503 |
| Final | | Split 4 | | Final | | Final |
| | | on V3 | | | | |
|_____________| |_____________| |_____________| |_____________|
|
____________FUNDING____________
C=1 C=2
| |
_______|_______ _______|_______
| Group 8 | | Group 9 |
| N=120 | | N=200 |
| Final | | Final |
| | | |
|_____________| |_____________|