### 10(1) Example of Searching for Structure (Means Analysis)

Research Question: Explore the pattern of relationships between the time involvement of academic scientists and a set of contextual categorical variables: rank, institutional setting, scientific field, and external funding of research.

Methodology: Classification and regression trees: SEARCH module, means analysis

Dataset: ANJU.DAT

##### SYNTAX
\$RUN SEARCH
\$FILES
PRINT = CART.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
\$SETUP
INCLUDE V9=1-3 AND V10=1,2 AND V12=1,2 AND V15=1,2
SEARCHING OF STRUCTURE
ANALYSIS=MEAN -
DEPVAR=V2 -
MINCASE=10 -
PRINT=(DICT,TRACE,TREE,FINAL)
VARS=(V9) TYPE=M
VARS=(V10,V12,V14,V15) TYPE=F
------------------------------------

Note : Search module does not recognize missing data or invalid codes for the predictor variables. Hence, the filter Include has been used. All options set at default values
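The roles of TYPE=M, TYPE=F and MINCASE in the setup can be pictured with a short Python sketch of a binary split search for a means analysis. It is an illustration under stated assumptions, not the IDAMS implementation: the function name `best_split`, the toy data, and the reading of MINCASE as the minimum number of cases allowed in each resulting group are all assumptions. A monotonic predictor is split only between adjacent codes in their natural order; for a free predictor the codes are first ordered by the mean of the dependent variable, which is consistent with code lists such as `3 2 4 1` in the output.

```python
from statistics import mean

def best_split(values_by_code, monotonic, min_cases=10):
    """Return (explained_ss, left_codes, right_codes) for the best binary split, or None.

    values_by_code maps each predictor code to the dependent-variable values
    of the cases carrying that code (hypothetical input format).
    """
    codes = sorted(values_by_code)  # natural code order (monotonic predictor)
    if not monotonic:
        # Free predictor: order the codes by the mean of the dependent variable.
        codes.sort(key=lambda c: mean(values_by_code[c]))
    grand = mean(y for c in codes for y in values_by_code[c])
    best = None
    for cut in range(1, len(codes)):
        left = [y for c in codes[:cut] for y in values_by_code[c]]
        right = [y for c in codes[cut:] for y in values_by_code[c]]
        if len(left) < min_cases or len(right) < min_cases:
            continue  # assumed meaning of MINCASE: both groups must be large enough
        # Between-group sum of squares explained by this cut.
        ss = (len(left) * (mean(left) - grand) ** 2
              + len(right) * (mean(right) - grand) ** 2)
        if best is None or ss > best[0]:
            best = (ss, codes[:cut], codes[cut:])
    return best

# Made-up data: 12 cases per code, with code means 53, 36, 19 and 45.
toy = {
    1: [52.0, 54.0] * 6,
    2: [35.0, 37.0] * 6,
    3: [18.0, 20.0] * 6,
    4: [44.0, 46.0] * 6,
}

free_split = best_split(toy, monotonic=False)
mono_split = best_split(toy, monotonic=True)
print(free_split[1], free_split[2])
print(mono_split[1], mono_split[2])
```

With these toy values the free search isolates the code with the lowest mean, while the monotonic search can only cut between neighbouring codes.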

##### INTERPRETATION
IDAMS reports the analysis specifications:

Cases: 1021 cases were read; 16 cases were rejected for missing data in the dependent variable, leaving 1005 cases for analysis.
Dependent variable = V2
Predictor variables = V9 (monotonic); V10, V12, V14, V15 (free, i.e. non-monotonic)

Group 1 (the entire sample)

Sum(WT) = number of cases
Mean for the entire sample = 41.851
Var = variance = 0.32525E+03
Variation = sum of squares of the dependent variable = (number of cases - 1) × Var = .32655E+06

Attempt to split Group 1

Best splits for the different predictors:

| Predictor (codes) | Cut-point (after code) | Variance explained |
|---|---|---|
| V9 (1,2,3) | 1 | 0.25600360E+05 |
| V10 (1,2) | 1 | 0.71446650E+04 |
| V12 (1,2) | 1 | 0.24188630E+05 |
| V14 (3,2,4,1) | 2 | 0.33990140E+05 |
| V15 (1,2) | No eligible split | |

The best split for Group 1 is on predictor V14, since it explains the largest sum of squares. Group 1 is split on V14 into two groups:

Group 2: codes 3, 2
Group 3: codes 1, 4

Variance explained = 0.33990140E+05

Split 2 candidate groups

| Group | N | Sum(WT) | Mean Y | Var Y | Variation |
|---|---|---|---|---|---|
| 2 | 330 | .33000E+03 | .33533E+02 | .24002E+03 | .78966E+05 |
| 3 | 675 | .67500E+03 | .45917E+02 | .31691E+03 | .21360E+06 |

Group 3 is split first because it has the greater variation. The best predictor and its cut-point for splitting Group 3 are selected in the same manner as for Group 1. Predictor V12 (1,2) at cut-point 1 is the best predictor, since it explains more variance than the best splits of the other predictors. Group 3 is split into:

Group 4: code 1
Group 5: code 2

Split 3 candidate groups

| Group | N | Sum(WT) | Mean Y | Var Y | Variation |
|---|---|---|---|---|---|
| 2 | 330 | .33000E+03 | .33533E+02 | .24002E+03 | .78966E+05 |
| 4 | 378 | .37800E+03 | .50003E+02 | .31978E+03 | .12056E+06 |
| 5 | 297 | .29700E+03 | .40717E+02 | .26587E+03 | .78698E+05 |

Group 4 has the maximum variation (120557) and is therefore selected for further split. All the predictors are again evaluated for their best splits, and predictor V9 (1,2,3) is selected for splitting Group 4, with codes 1, 2 on one side and code 3 on the other. Note that predictor V9 is monotonic. Group 4 is split into:

Group 6: V9 (codes 1, 2)
Group 7: V9 (code 3)

Split 4 candidate groups

The candidate groups are now Group 2, Group 5, Group 6 and Group 7. Group 2 is split first since it has the maximum variation (78966.13). All candidate predictors are again evaluated for splitting Group 2:

| Predictor (codes) | Result |
|---|---|
| V9 (1,2,3) | No split meets the splitting criterion |
| V12 (1,2) | No split meets the splitting criterion |
| V15 (1,2) | No split meets the splitting criterion |

Predictor V14 (3,2) is chosen for splitting Group 2. Group 2 is split into:

Group 8: V14 (code 3)
Group 9: V14 (code 2)

Split 5 candidate groups

The candidate groups are now Group 5, Group 6, Group 7, Group 8 and Group 9. Group 5 has the maximum sum of squares (78698.0) and is therefore chosen for split. The best predictor and cut-point are selected in the same manner as in the earlier steps. Predictor V9 (1,2,3) is found to be the best predictor, with the cut-point after code 1. Group 5 is split into:

Group 10: V9 (code 1)
Group 11: V9 (codes 2, 3)

Split 6 candidate groups

The candidate groups are now Group 6, Group 7, Group 8, Group 9, Group 10 and Group 11. Group 6 has the maximum sum of squares (62856.201) and is therefore considered first, but none of the predictor variables meets the eligibility criterion for splitting it. Group 9 (60701.727) is considered next, but again none of the predictor variables meets the eligibility criterion for splitting this group. Group 7 is then chosen for split. This group is split on predictor V14 (4,1) into two groups:

Group 12: V14 (code 4)
Group 13: V14 (code 1)

Split 7 candidate groups

At this stage there are 5 candidate groups: Group 8, Group 10, Group 11, Group 12 and Group 13. Of these, Group 11 is chosen for split, since it has the maximum sum of squares (52577.316). Group 11 is split into two groups on predictor V14 (4,1):

Group 14: V14 (code 4)
Group 15: V14 (code 1)

Split 8 candidate groups

At this stage there are 6 candidate groups: Group 8, Group 10, Group 12, Group 13, Group 14 and Group 15. Of these, Group 15 has the maximum sum of squares and is considered first, but none of the predictor variables is found to be eligible for a further split. Group 12 is considered next, but none of the predictor variables meets the eligibility criterion for splitting this group. Group 10 is then considered, but none of the predictor variables could split it. The remaining groups (viz. Group 8, Group 14) were also considered, but none of the predictor variables could split any of them.

The partition ends with 8 final groups (variance explained = 23.9%).

Analysis of variance table

| Source | Sum of squares | DF |
|---|---|---|
| Explained | 78042.51 | 7 (number of final groups - 1) |
| Error | 248509.91 | 997 |
| Total | 326551.60 | 1004 |

The foregoing results of partitioning the data set are summarized below.
N = number of cases; Y = mean of the dependent variable (v262:teaching); C = predictor codes

- Group 1: N=1005, Y=41.85075. Split 1 on V14 (sv:inst type)
  - C=3,2: Group 2: N=330, Y=33.53333. Split 4 on V14 (sv:inst type)
    - C=3: Group 8: N=53, Y=19.09434. Final
    - C=2: Group 9: N=277, Y=36.29603. Final
  - C=4,1: Group 3: N=675, Y=45.91704. Split 2 on V12 (v335:ext funds)
    - C=1: Group 4: N=378, Y=50.00265. Split 3 on V9 (v204:rank)
      - C=1,2: Group 6: N=218, Y=47.60092. Final
      - C=3: Group 7: N=160, Y=53.27500. Split 6 on V14 (sv:inst type)
        - C=4: Group 12: N=67, Y=48.37313. Final
        - C=1: Group 13: N=93, Y=56.80645. Final
    - C=2: Group 5: N=297, Y=40.71717. Split 5 on V9 (v204:rank)
      - C=1: Group 10: N=99, Y=33.87879. Final
      - C=2,3: Group 11: N=198, Y=44.13636. Split 7 on V14 (sv:inst type)
        - C=4: Group 14: N=95, Y=39.53684. Final
        - C=1: Group 15: N=103, Y=48.37864. Final

***** Normal termination of SEARCH
***** No more RUN statements in SETUP; step terminated
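
The figures reported for the means analysis in 10(1) can be checked by hand. The Python sketch below assumes unweighted cases and assumes that "variance explained" for a binary split is the between-group sum of squares; both assumptions agree with the printed figures but are not taken from the IDAMS documentation. The helper name `between_group_ss` is hypothetical.

```python
def between_group_ss(groups, grand_mean):
    """Between-group sum of squares for (n, mean) pairs around the parent mean."""
    return sum(n * (group_mean - grand_mean) ** 2 for n, group_mean in groups)

# Group 1 (entire sample): values copied from the output above.
parent_n, parent_mean, parent_var = 1005, 41.85075, 325.25

# Groups 2 and 3, produced by the first split on V14.
children = [(330, 33.53333), (675, 45.91704)]

# Variation (total sum of squares) of Group 1.
total_variation = (parent_n - 1) * parent_var

# Sum of squares explained by the first split.
explained_split1 = between_group_ss(children, parent_mean)

# Share of the total variation explained by all 8 final groups.
percent_explained = 100 * 78042.51 / 326551.60

print(round(total_variation, 1), round(explained_split1, 1), round(percent_explained, 1))
```

The three results agree, up to rounding of the printed inputs, with the variation of Group 1 (.32655E+06), the variance explained by the split on V14 (0.33990140E+05) and the 23.9% reported for the final partition.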

### 10(2) Searching for Structure : Regression Analysis

Research Question: How does the pattern of relationship between the time spent by an academic scientist (dependent variable V4) and the covariate (V13) vary with contextual factors?

Methodology: Classification and regression trees: SEARCH module, regression analysis

Dataset: ANJU.DAT
##### SYNTAX
\$RUN SEARCH
\$FILES
PRINT = SEARCH2.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
\$SETUP
INCLUDE V9=1-3 AND V15=1,2
SEARCHING OF STRUCTURE: REGRESSION ANALYSIS
ANALYSIS=REGRESSION -
DEPVAR=V4 -
COVARIATE= V13 -
MINCASE=10 -
IDVAR=V1 -
PRINT=(TRACE, TABLE, TREE)
VARS= (V14, V15) TYPE=F
VARS= V9 TYPE=M
------------------------------------

Note : Search module does not recognize missing data or invalid codes for the predictor variables. Hence, the filter Include has been used. All options set at default values
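
The effect of the INCLUDE filter in this setup can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not IDAMS code: the case records, the missing-data code 9 and the helper name `include_case` are hypothetical; only the allowed code sets are copied from the filter statement.

```python
# Hypothetical illustration of the filter
#   INCLUDE V9=1-3 AND V15=1,2
VALID_CODES = {
    "V9": {1, 2, 3},
    "V15": {1, 2},
}

def include_case(case):
    """Return True when every filtered predictor holds one of its valid codes."""
    return all(case.get(var) in codes for var, codes in VALID_CODES.items())

# Made-up cases: the second one carries an assumed missing-data code (9) on V15.
cases = [
    {"V1": 1, "V9": 2, "V15": 1},
    {"V1": 2, "V9": 3, "V15": 9},
    {"V1": 3, "V9": 1, "V15": 2},
]

kept_ids = [case["V1"] for case in cases if include_case(case)]
print(kept_ids)
```

Only cases that pass every condition reach the analysis, which is why predictor codes outside the listed ranges never appear in the trace.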

##### EXTRACT FROM COMPUTER OUTPUT
Dependent variable: V4
Covariate variable: V13
Identifier variable: V1

After filtering 1039 cases read from the input data file
The number of cases rejected is 23:
          12 for missing data in the dependent variable
          11 for missing data in the covariate
The number of processed cases is 1016

Split 1 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    1  1016  .10160E+04  .15990E+02  .14602E+03  .30705E+01  .11106E+06
                         .20010E+01  .38828E+01

Attempt to split group 1 Var= 111055.23
Predictor V14 sv:inst type Rank 2 Type F Codes 3 2 4 1
  Best split after code 3 Var expl= .82337250E+04
Predictor V15 :field Rank 2 Type F Codes 2 1
  Best split after code 2 Var expl= .90128400E+03
Predictor V9 v204:rank Rank 1 Type M Codes 1 2 3
  Best split after code 2 Var expl= .30497740E+04
Best split for group 1 on predictor V14 sv:inst type Rank 1 Var expl= .82337250E+04
Split group 1 on V14 sv:inst type Var expl= .82337250E+04
Into group 2, codes 3 and group 3, codes 2 4 1

Split 2 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    3   959  .95900E+03  .15046E+02  .12058E+03  .28464E+01  .86385E+05
                         .19145E+01  .37526E+01

Attempt to split group 3 Var= 86385.484
Predictor V14 sv:inst type Rank 2 Type F Codes 2 4 1
  Best split after code 4 Var expl= .14523170E+04
Predictor V15 :field Rank 2 Type F Codes 2 1
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 1 2 3
  Best split after code 1 Var expl= .25520410E+04
Best split for group 3 on predictor V9 v204:rank Rank 1 Var expl= .25520410E+04
Split group 3 on V9 v204:rank Var expl= .25520410E+04
Into group 4, codes 1 and group 5, codes 2 3

Split 3 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    4   317  .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                         .26751E+01  .38973E+01
    5   642  .64200E+03  .13411E+02  .11906E+03  .32997E+01  .53565E+05
                         .15389E+01  .32598E+01

Attempt to split group 5 Var= 53565.234
Predictor V14 sv:inst type Rank 2 Type F Codes 2 4 1
  Best split after code 4 Var expl= .20114190E+04
Predictor V15 :field Rank 2 Type F Codes 2 1
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 2 3
  No eligible split
Best split for group 5 on predictor V14 sv:inst type Rank 1 Var expl= .20114190E+04
Split group 5 on V14 sv:inst type Var expl= .20114190E+04
Into group 6, codes 2 4 and group 7, codes 1

Split 4 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    4   317  .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                         .26751E+01  .38973E+01
    6   374  .37400E+03  .14513E+02  .12511E+03  .35936E+01  .34194E+05
                         .14305E+01  .25890E+01
    7   268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                         .16903E+01  .41696E+01

Attempt to split group 6 Var= 34194.266
Predictor V14 sv:inst type Rank 2 Type F Codes 2 4
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 2 1
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 2 3
  No eligible split
No eligible split for group 6

Split 4 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    4   317  .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                         .26751E+01  .38973E+01
    7   268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                         .16903E+01  .41696E+01

Attempt to split group 4 Var= 30268.211
Predictor V14 sv:inst type Rank 2 Type F Codes 1 2 4
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 2 1
  Best split after code 2 Var expl= .15842390E+04
Predictor V9 v204:rank Rank 1 Type M Codes 1
  No eligible split
Best split for group 4 on predictor V15 :field Rank 1 Var expl= .15842390E+04
Split group 4 on V15 :field Var expl= .15842390E+04
Into group 8, codes 2 and group 9, codes 1

Split 5 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    7   268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                         .16903E+01  .41696E+01
    8   139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                         .23022E+01  .35892E+01
    9   178  .17800E+03  .20742E+02  .11128E+03  .13002E+01  .18510E+05
                         .29663E+01  .39650E+01

Attempt to split group 9 Var= 18509.781
Predictor V14 sv:inst type Rank 2 Type F Codes 2 1 4
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 1
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 1
  No eligible split
No eligible split for group 9

Split 5 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    7   268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                         .16903E+01  .41696E+01
    8   139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                         .23022E+01  .35892E+01

Attempt to split group 7 Var= 17359.549
Predictor V14 sv:inst type Rank 2 Type F Codes 1
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 1 2
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 2 3
  No eligible split
No eligible split for group 7

Split 5 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    2    57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                         .34561E+01  .38954E+01
    8   139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                         .23022E+01  .35892E+01

Attempt to split group 2 Var= 16436.018
Predictor V14 sv:inst type Rank 2 Type F Codes 3
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 2 1
  Best split after code 2 Var expl= .10602170E+04
Predictor V9 v204:rank Rank 1 Type M Codes 1 2 3
  No eligible split
Best split for group 2 on predictor V15 :field Rank 1 Var expl= .10602170E+04
Split group 2 on V15 :field Var expl= .10602170E+04
Into group 10, codes 2 and group 11, codes 1

Split 6 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    8   139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                         .23022E+01  .35892E+01
   10    25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                         .32400E+01  .42733E+01
   11    32  .32000E+02  .35000E+02  .41174E+03  .34537E+01  .11410E+05
                         .36250E+01  .36613E+01

Attempt to split group 11 Var= 11410.132
Predictor V14 sv:inst type Rank 2 Type F Codes 3
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 1
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 1 2
  No eligible split
No eligible split for group 11

Split 6 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
    8   139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                         .23022E+01  .35892E+01
   10    25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                         .32400E+01  .42733E+01

Attempt to split group 8 Var= 10174.189
Predictor V14 sv:inst type Rank 2 Type F Codes 4 1 2
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 2
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 1
  No eligible split
No eligible split for group 8

Split 6 candidate groups
Group     N     Sum(WT)    Mean Y,Z     Var Y,Z       Slope   Variation
   10    25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                         .32400E+01  .42733E+01

Attempt to split group 10 Var= 3965.6689
Predictor V14 sv:inst type Rank 2 Type F Codes 3
  No eligible split
Predictor V15 :field Rank 2 Type F Codes 2
  No eligible split
Predictor V9 v204:rank Rank 1 Type M Codes 1 2 3
  No eligible split
No eligible split for group 10

No splits possible
The partitioning ends with 6 final groups
The variation explained is 13.9%

One-way analysis of final groups
Source        Variation      DF
Explained  .15441640E+05      5
Error      .95613590E+05   1010
Total      .11105520E+06   1015

Split summary table

Group 1 1016 cases
Mean(Y)= .15990160E+02 Var(Y)= .14602160E+03
Mean(Z)= .20009840E+01 Var(Z)= .38827580E+01
Slope= .30705440E+01 Intercept= .98460470E+01 Corr= .50069920E+00
Variation= .11105520E+06
Split on V14 sv:inst type Var expl= .82337250E+04
Into group 2, codes 3 and group 3, codes 2 4 1

Group 3 959 cases
Mean(Y)= .15045880E+02 Var(Y)= .12057620E+03
Mean(Z)= .19144940E+01 Var(Z)= .37525980E+01
Slope= .28463960E+01 Intercept= .95964720E+01 Corr= .50214670E+00
Variation= .86385480E+05
Split on V9 v204:rank Var expl= .25520410E+04
Into group 4, codes 1 and group 5, codes 2 3

Group 5 642 cases
Mean(Y)= .13411210E+02 Var(Y)= .11905680E+03
Mean(Z)= .15389410E+01 Var(Z)= .32597920E+01
Slope= .32996560E+01 Intercept= .83332390E+01 Corr= .54599230E+00
Variation= .53565230E+05
Split on V14 sv:inst type Var expl= .20114190E+04
Into group 6, codes 2 4 and group 7, codes 1

Group 4 317 cases
Mean(Y)= .18356470E+02 Var(Y)= .10761620E+03
Mean(Z)= .26750790E+01 Var(Z)= .38972570E+01
Slope= .17423130E+01 Intercept= .13695640E+02 Corr= .33156360E+00
Variation= .30268210E+05
Split on V15 :field Var expl= .15842390E+04
Into group 8, codes 2 and group 9, codes 1

Group 2 57 cases
Mean(Y)= .31877190E+02 Var(Y)= .31175250E+03
Mean(Z)= .34561400E+01 Var(Z)= .38953630E+01
Slope= .21646290E+01 Intercept= .24395930E+02 Corr= .24196500E+00
Variation= .16436020E+05
Split on V15 :field Var expl= .10602170E+04
Into group 10, codes 2 and group 11, codes 1

Final group summary table

Group 6 374 cases
Mean(Y)= .14513370E+02 Var(Y)= .12510840E+03
Mean(Z)= .14304810E+01 Var(Z)= .25889880E+01
Slope= .35936360E+01 Intercept= .93727400E+01 Corr= .51696E+00
Variation= .34194270E+05

Group 7 268 cases
Mean(Y)= .11873130E+02 Var(Y)= .10697260E+03
Mean(Z)= .16902990E+01 Var(Z)= .41696430E+01
Slope= .31720890E+01 Intercept= .65113570E+01 Corr= .62627E+00
Variation= .17359550E+05

Group 8 139 cases
Mean(Y)= .15302160E+02 Var(Y)= .86966010E+02
Mean(Z)= .23021580E+01 Var(Z)= .35891980E+01
Slope= .19206370E+01 Intercept= .10880550E+02 Corr= .39018E+00
Variation= .10174190E+05

Group 9 178 cases
Mean(Y)= .20741570E+02 Var(Y)= .11127750E+03
Mean(Z)= .29662920E+01 Var(Z)= .39649590E+01
Slope= .13001600E+01 Intercept= .16884920E+02 Corr= .24542E+00
Variation= .18509780E+05

Group 10 25 cases
Mean(Y)= .27880000E+02 Var(Y)= .16594330E+03
Mean(Z)= .32400000E+01 Var(Z)= .42733340E+01
Slope= .40678630E+00 Intercept= .26562010E+02 Corr= .65278E-01
Variation= .39656690E+04

Group 11 32 cases
Mean(Y)= .35000000E+02 Var(Y)= .41174190E+03
Mean(Z)= .36250000E+01 Var(Z)= .36612900E+01
Slope= .34537440E+01 Intercept= .22480180E+02 Corr= .32568E+00
Variation= .11410130E+05
##### INTERPRETATION
IDAMS reports the analysis specifications:

No. of cases read from the input data file = 1039
No. of cases rejected for missing data = 23
No. of cases processed = 1016
Dependent variable = V4
Covariate variable = V13

Group 1 (the entire sample)

N = number of cases
Sum(WT) = number of cases
Mean Y = 15.990 (mean of the dependent variable); Mean Z = 2.001 (mean of the covariate)
Var Y, Var Z = variances of the dependent variable Y and of the covariate Z
Slope = slope of the regression of the dependent variable Y on the covariate Z in the entire sample
Variation = residual sum of squares of the dependent variable Y about its regression on the covariate Z within the group; for Group 1, (N - 1) × Var Y × (1 - Corr²) = .11106E+06

Attempt to split Group 1

The algorithm attempts binary splits at the different cut-off points of all predictors and selects the best cut-off, which results in the maximum difference in slope between the parent group and the descendant groups.

| Predictor (codes) | Cut-off (after code) | Variance explained |
|---|---|---|
| V14 (3,2,4,1) | 3 | 8233.725 |
| V15 (2,1) | 2 | 901.284 |
| V9 (1,2,3) | 2 | 3049.774 |

The best split for Group 1 is on predictor V14, into Group 2 (code 3) and Group 3 (codes 2, 4, 1):

| Group | N | Sum(WT) | Mean Y | Mean Z | Var Y | Var Z | Slope | Variation |
|---|---|---|---|---|---|---|---|---|
| 2 | 57 | .57000E+02 | .31877E+02 | .34561E+01 | .31175E+03 | .38954E+01 | .21646E+01 | .16436E+05 |
| 3 | 959 | .95900E+03 | .15046E+02 | .19145E+01 | .12058E+03 | .37526E+01 | .28464E+01 | .86385E+05 |

Group 3 is selected for further split since it has the larger variation (its slope is also greater than that of Group 2). All the predictors are examined for their best splits.

| Predictor (codes) | Cut-off (after code) | Variance explained |
|---|---|---|
| V14 (2,4,1) | 4 | 1452.317 |
| V15 (2,1) | No eligible split | |
| V9 (1,2,3) | 1 | 2552.041 |

Group 3 is now split on V9 into two groups:

Group 4: V9 (code 1)
Group 5: V9 (codes 2, 3)

Split 3

The candidate groups are now:

| Group | N | Sum(WT) | Mean Y | Mean Z | Var Y | Var Z | Slope | Variation |
|---|---|---|---|---|---|---|---|---|
| 2 | 57 | .57000E+02 | .31877E+02 | .34561E+01 | .31175E+03 | .38954E+01 | .21646E+01 | .16436E+05 |
| 4 | 317 | .31700E+03 | .18356E+02 | .26751E+01 | .10762E+03 | .38973E+01 | .17423E+01 | .30268E+05 |
| 5 | 642 | .64200E+03 | .13411E+02 | .15389E+01 | .11906E+03 | .32598E+01 | .32997E+01 | .53565E+05 |

Group 5 is selected for further split, since it has the largest variation (it also has the largest slope). All predictors are evaluated for splitting Group 5.

| Predictor (codes) | Cut-off (after code) | Variance explained |
|---|---|---|
| V14 (2,4,1) | 4 | 2011.419 |
| V15 (2,1) | No eligible split | |
| V9 (2,3) | No eligible split | |

Group 5 is now split on V14 into two groups:

Group 6: V14 (codes 2, 4)
Group 7: V14 (code 1)

Split 4

At this stage there are 4 candidate groups for further splits.

| Group | N | Sum(WT) | Mean Y | Mean Z | Var Y | Var Z | Slope | Variation |
|---|---|---|---|---|---|---|---|---|
| 2 | 57 | .57000E+02 | .31877E+02 | .34561E+01 | .31175E+03 | .38954E+01 | .21646E+01 | .16436E+05 |
| 4 | 317 | .31700E+03 | .18356E+02 | .26751E+01 | .10762E+03 | .38973E+01 | .17423E+01 | .30268E+05 |
| 6 | 374 | .37400E+03 | .14513E+02 | .14305E+01 | .12511E+03 | .25890E+01 | .35936E+01 | .34194E+05 |
| 7 | 268 | .26800E+03 | .11873E+02 | .16903E+01 | .10697E+03 | .41696E+01 | .31721E+01 | .17360E+05 |

Group 6 is selected for further split since it has the largest variation (it also has the largest slope). No eligible split for Group 6 is found. Group 4, which has the next largest variation, is then considered for a possible split. All the predictor variables were evaluated for splitting this group. Predictors V9 and V14 did not meet the eligibility criterion for a split. Hence, this group was split on variable V15 (1, 2) into two groups:

Group 8: V15 (code 2)
Group 9: V15 (code 1)

Split 5

At this stage there are 4 candidate groups for further split.

| Group | N | Sum(WT) | Mean Y | Mean Z | Var Y | Var Z | Slope | Variation |
|---|---|---|---|---|---|---|---|---|
| 2 | 57 | .57000E+02 | .31877E+02 | .34561E+01 | .31175E+03 | .38954E+01 | .21646E+01 | .16436E+05 |
| 7 | 268 | .26800E+03 | .11873E+02 | .16903E+01 | .10697E+03 | .41696E+01 | .31721E+01 | .17360E+05 |
| 8 | 139 | .13900E+03 | .15302E+02 | .23022E+01 | .86966E+02 | .35892E+01 | .19206E+01 | .10174E+05 |
| 9 | 178 | .17800E+03 | .20742E+02 | .29663E+01 | .11128E+03 | .39650E+01 | .13002E+01 | .18510E+05 |

Group 9 and Group 7 were evaluated in turn for further split, but none of the predictors met the eligibility criterion to split either of these groups. Group 2 was then considered for further split. Only one variable, V15, met the eligibility criterion, and hence this group was split into two groups:

Group 10: V15 (code 2)
Group 11: V15 (code 1)

Split 6

At this stage there are three candidate groups for further split: Group 8, Group 10 and Group 11.

| Group | N | Sum(WT) | Mean Y | Mean Z | Var Y | Var Z | Slope | Variation |
|---|---|---|---|---|---|---|---|---|
| 8 | 139 | .13900E+03 | .15302E+02 | .23022E+01 | .86966E+02 | .35892E+01 | .19206E+01 | .10174E+05 |
| 10 | 25 | .25000E+02 | .27880E+02 | .32400E+01 | .16594E+03 | .42733E+01 | .40679E+00 | .39657E+04 |
| 11 | 32 | .32000E+02 | .35000E+02 | .36250E+01 | .41174E+03 | .36613E+01 | .34537E+01 | .11410E+05 |

These groups were considered sequentially for further split, but none of the predictor variables met the eligibility criterion for splitting any of them.

The partition ends with 6 terminal groups: Group 10, Group 11, Group 8, Group 9, Group 6 and Group 7. This partition explains 13.9% of the total variation.

Analysis of variance table

| Source | Variation | DF |
|---|---|---|
| Explained | .15441640E+05 | 5 |
| Error | .95613590E+05 | 1010 |
| Total | .11105520E+06 | 1015 |

History of the splitting process.

Graphical representation of the splitting process.
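
The regression statistics printed for each group are tied together by standard simple-regression identities, which gives a quick way to check a listing. The Python sketch below recomputes the intercept, the correlation and the variation of Group 1 from its means, variances and slope. It assumes unweighted cases and assumes that "Variation" is the residual sum of squares about the group's regression line; this reading agrees with the printed values but is not taken from the IDAMS documentation.

```python
import math

# Group 1 values copied from the split summary table.
n = 1016
mean_y, var_y = 15.99016, 146.0216   # dependent variable V4
mean_z, var_z = 2.000984, 3.882758   # covariate V13
slope = 3.070544

# The fitted line passes through the point of means.
intercept = mean_y - slope * mean_z

# Correlation from the slope and the two variances.
corr = slope * math.sqrt(var_z / var_y)

# Residual sum of squares about the regression line (assumed meaning of "Variation").
residual_ss = (n - 1) * var_y * (1 - corr ** 2)

# Share of the variation explained by the 6 final groups.
percent_explained = 100 * 15441.64 / 111055.2

print(round(intercept, 3), round(corr, 4), round(residual_ss), round(percent_explained, 1))
```

The recomputed values agree with the printed Intercept (.98460470E+01), Corr (.50069920E+00) and Variation (.11105520E+06) up to rounding of the inputs, and the last figure reproduces the 13.9% reported for the partition.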
### 10(3) Searching for Structure: CHI Analysis

Research Question: Explore the pattern of relationships between the time involvement of academic scientists in teaching (coded as 1 = Low, 2 = Average, 3 = Above average, 4 = High) and a set of contextual variables: rank, institutional setting, scientific field, and external funding of research.

Methodology: Classification and regression trees: SEARCH module, CHI analysis

Dataset: PSN.DAT

##### SYNTAX
\$RUN SEARCH
\$FILES
PRINT = PSN.LST
DICTIN = PSN.DIC
DATAIN = PSN.DAT
\$SETUP
INCLUDE V2=1-3 AND V3=1-2 AND V5 =1,2 AND V6=1-4
SEARCHING OF STRUCTURE
ANALYSIS=CHI -
DEPVAR=V6 CODE= (1-4) -
MINCASE=10 -
PRINT=(TREE,DICT,TRACE,FINAL)
VARS=V2 TYPE=M
VARS=(V3,V4, V5) TYPE=F
------------------------------------

Note : Search module does not recognize missing data or invalid codes for the predictor variables. Hence, the filter Include has been used. All options set at default values

##### EXTRACT FROM COMPUTER OUTPUT
After filtering 1011 cases read from the input data file
1 cases contained illegal characters on filter variables and were skipped

Split 1 candidate groups
Group     N     Sum(WT)    Variation
1      1011  .10110E+04   .27381E+04

Attempt to split group 1 Var= 2738.0928
Predictor V2 RANK Rank 1 Type M Codes 1 2 3
  Best split after code 1 Var expl= .87576360E+02
Predictor V3 FUNDING Rank 2 Type F Codes 1 2
  Best split after code 1 Var expl= .84713570E+02
Predictor V4 INSTYP Rank 2 Type F Codes 3 4 2 1
  Best split after code 3 Var expl= .86131580E+02
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
Best split for group 1 on predictor V2 RANK Rank 1 Var expl= .87576360E+02

Split group 1 on V2 RANK Var expl= .87576360E+02
Into group 2, codes 1
and group 3, codes 2 3

Split 2 candidate groups
Group     N     Sum(WT)    Variation
2       356  .35600E+03   .85404E+03
3       655  .65500E+03   .17965E+04

Attempt to split group 3 Var= 1796.4799
Predictor V2 RANK Rank 1 Type M Codes 2 3
  Best split after code 2 Var expl= .23601020E+02
Predictor V3 FUNDING Rank 2 Type F Codes 2 1
  Best split after code 2 Var expl= .31537450E+02
Predictor V4 INSTYP Rank 2 Type F Codes 2 3 4 1
  Best split after code 3 Var expl= .54556970E+02
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
Best split for group 3 on predictor V4 INSTYP Rank 1 Var expl= .54556970E+02

Split group 3 on V4 INSTYP Var expl= .54556970E+02
Into group 4, codes 2 3
and group 5, codes 4 1

Split 3 candidate groups
Group     N     Sum(WT)    Variation
2       356  .35600E+03   .85404E+03
4       152  .15200E+03   .38393E+03
5       503  .50300E+03   .13580E+04

Attempt to split group 5 Var= 1357.9944
Predictor V2 RANK Rank 1 Type M Codes 2 3
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 2 1
  No eligible split
Predictor V4 INSTYP Rank 2 Type F Codes 4 1
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
No eligible split for group 5

Split 3 candidate groups
Group     N     Sum(WT)    Variation
2       356  .35600E+03   .85404E+03
4       152  .15200E+03   .38393E+03

Attempt to split group 2 Var= 854.03656
Predictor V2 RANK Rank 1 Type M Codes 1
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 1 2
  Best split after code 1 Var expl= .31131080E+02
Predictor V4 INSTYP Rank 2 Type F Codes 3 4 1 2
  Best split after code 3 Var expl= .42759830E+02
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
Best split for group 2 on predictor V4 INSTYP Rank 1 Var expl= .42759830E+02

Split group 2 on V4 INSTYP Var expl= .42759830E+02
Into group 6, codes 3
and group 7, codes 4 1 2

Split 4 candidate groups
Group     N     Sum(WT)    Variation
4       152  .15200E+03   .38393E+03
6        36  .36000E+02   .91390E+01
7       320  .32000E+03   .80214E+03

Attempt to split group 7 Var= 802.13776
Predictor V2 RANK Rank 1 Type M Codes 1
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 1 2
  Best split after code 1 Var expl= .24340890E+02
Predictor V4 INSTYP Rank 2 Type F Codes 4 1 2
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
Best split for group 7 on predictor V3 FUNDING Rank 1 Var expl= .24340890E+02

Split group 7 on V3 FUNDING Var expl= .24340890E+02
Into group 8, codes 1
and group 9, codes 2

Split 5 candidate groups
Group     N     Sum(WT)    Variation
4       152  .15200E+03   .38393E+03
6        36  .36000E+02   .91390E+01
8       120  .12000E+03   .32932E+03
9       200  .20000E+03   .44848E+03

Attempt to split group 9 Var= 448.47726
Predictor V2 RANK Rank 1 Type M Codes 1
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 2
  No eligible split
Predictor V4 INSTYP Rank 2 Type F Codes 1 4 2
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 1 2
  No eligible split
No eligible split for group 9

Split 5 candidate groups
Group     N     Sum(WT)    Variation
4       152  .15200E+03   .38393E+03
6        36  .36000E+02   .91390E+01
8       120  .12000E+03   .32932E+03

Attempt to split group 4 Var= 383.92853
Predictor V2 RANK Rank 1 Type M Codes 2 3
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 1 2
  No eligible split
Predictor V4 INSTYP Rank 2 Type F Codes 3 2
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 1 2
  No eligible split
No eligible split for group 4

Split 5 candidate groups
Group     N     Sum(WT)    Variation
6        36  .36000E+02   .91390E+01
8       120  .12000E+03   .32932E+03

Attempt to split group 8 Var= 329.31958
Predictor V2 RANK Rank 1 Type M Codes 1
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 1
  No eligible split
Predictor V4 INSTYP Rank 2 Type F Codes 4 1 2
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
No eligible split for group 8

Split 5 candidate groups
Group     N     Sum(WT)    Variation
6        36  .36000E+02   .91390E+01

Attempt to split group 6 Var= 9.1389990
Predictor V2 RANK Rank 1 Type M Codes 1
  No eligible split
Predictor V3 FUNDING Rank 2 Type F Codes 1 2
  No eligible split
Predictor V4 INSTYP Rank 2 Type F Codes 3
  No eligible split
Predictor V5 FIELD Rank 2 Type F Codes 2 1
  No eligible split
No eligible split for group 6

No splits possible
The partitioning ends with 5 final groups
The variation explained is 7.6%

One-way analysis of final groups
Source         Variation      DF
Explained   .20923410E+03      4
Error       .25288590E+04   1006
Total       .27380930E+04   1010

Split summary table
Group 1   1011 cases   Variation= .27380930E+04
  Split on V2 RANK Var expl= .87576360E+02
  Into group 2, codes 1 and group 3, codes 2 3
Group 3   655 cases   Variation= .17964800E+04
  Split on V4 INSTYP Var expl= .54556970E+02
  Into group 4, codes 2 3 and group 5, codes 4 1
Group 2   356 cases   Variation= .85403660E+03
  Split on V4 INSTYP Var expl= .42759830E+02
  Into group 6, codes 3 and group 7, codes 4 1 2
Group 7   320 cases   Variation= .80213780E+03
  Split on V3 FUNDING Var expl= .24340890E+02
  Into group 8, codes 1 and group 9, codes 2

Final group summary table
Group 4   152 cases   Variation= .38392850E+03
Group 5   503 cases   Variation= .13579940E+04
Group 6    36 cases   Variation= .91389990E+01
Group 8   120 cases   Variation= .32931960E+03
Group 9   200 cases   Variation= .44847730E+03

Dependent variable percent distribution for each group (*=Final groups)
Group        1      2      3     4*     5*     6*      7     8*     9*
Code= 1  36.10  52.25  27.33  42.11  22.86  97.22  47.19  31.67  56.50
Code= 2  19.09  20.51  18.32  22.37  17.10   2.78  22.50  25.00  21.00
Code= 3  21.96  16.85  24.73  26.97  24.06    .00  18.75  23.33  16.00
Code= 4  22.85  10.39  29.62   8.55  35.98    .00  11.56  20.00   6.50
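
The figures in the listing can be cross-checked by hand. The Python sketch below is not part of IDAMS; it only copies values from the listing above. It assumes that the explained variation in the one-way analysis is the sum of the Var expl values of the four accepted splits, and that the Group 1 percentages are the case-weighted averages of the final-group percentages. All variable names are illustrative.

```python
# Values copied from the SEARCH listing; names are illustrative, not IDAMS identifiers.
var_expl = {1: 87.576360, 3: 54.556970, 2: 42.759830, 7: 24.340890}  # split group -> Var expl
total_variation = 2738.0930  # "Total" row of the one-way analysis

# Assumption: explained variation is the sum over the accepted splits.
explained = sum(var_expl.values())
error = total_variation - explained
percent_explained = 100 * explained / total_variation

# Final-group sizes and percent distributions (codes 1..4 of the dependent variable).
final_n = {4: 152, 5: 503, 6: 36, 8: 120, 9: 200}
final_pct = {
    4: [42.11, 22.37, 26.97, 8.55],
    5: [22.86, 17.10, 24.06, 35.98],
    6: [97.22, 2.78, 0.00, 0.00],
    8: [31.67, 25.00, 23.33, 20.00],
    9: [56.50, 21.00, 16.00, 6.50],
}
n_total = sum(final_n.values())

# Case-weighted average of the final-group percentages, one value per code.
group1_pct = [
    sum(final_n[g] * final_pct[g][k] for g in final_n) / n_total
    for k in range(4)
]

print(f"explained = {explained:.4f}, error = {error:.4f}")
print(f"percent explained = {percent_explained:.1f}%")
print("group 1 distribution:", [round(p, 2) for p in group1_pct])
```

Under these assumptions the sum agrees with the Explained row (.20923410E+03) up to rounding, and the weighted averages agree with the Group 1 column of the percent distribution to two decimals.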
##### INTERPRETATION
IDAMS reports that 1011 cases were read from the input data file.

Split 1

At this stage, the only candidate group is Group-1, which comprises all the cases. In the table of candidate groups, N is the number of cases, Sum(WT) is the sum of weights (here equal to the number of cases), and Variation = 2738.0928 is the entropy of the whole group.

Attempt to split Group-1: all the predictors are evaluated, one by one, for their best splits.

| Predictor (codes) | Cut-off (after code) | Entropy |
|---|---|---|
| V2 (1, 2, 3) | 1 | 87.57636 |
| V3 (1, 2) | 1 | 84.71357 |
| V4 (3, 4, 2, 1) | 3 | 86.13158 |
| V5 (2, 1) | No eligible split | |

The best split for Group-1 is on predictor V2, since it accounts for the maximum entropy. Group-1 is split on V2 into the following groups:

Group-2: V2 (Code: 1)
Group-3: V2 (Code: 2, 3)

Split 2

At this stage there are two candidate groups.

| Group | N | Sum(WT) | Variation |
|---|---|---|---|
| 2 | 356 | .35600E+03 | .85404E+03 |
| 3 | 655 | .65500E+03 | .17965E+04 |

Group-3 is considered first for a split, since it accounts for the maximum entropy. The best splits for the predictors are as follows:

| Predictor (codes) | Cut-off (after code) | Entropy |
|---|---|---|
| V2 (2, 3) | 2 | 23.60102 |
| V3 (2, 1) | 2 | 31.53745 |
| V4 (2, 3, 4, 1) | 3 | 54.55697 |
| V5 (2, 1) | No eligible split | |

The best split for Group-3 is on predictor V4, since it has the maximum value of entropy. This group is split into two groups on variable V4:

Group-4: V4 (Code: 2, 3)
Group-5: V4 (Code: 4, 1)

Split 3

At this stage there are three candidate groups for further splitting:

| Group | N | Sum(WT) | Variation |
|---|---|---|---|
| 2 | 356 | .35600E+03 | .85404E+03 |
| 4 | 152 | .15200E+03 | .38393E+03 |
| 5 | 503 | .50300E+03 | .13580E+04 |

Group-5 has the maximum value of entropy and is therefore considered first for a possible split. All the predictors are evaluated in the same manner as for the earlier splits, but none of them meets the eligibility criterion for splitting this group. Thereafter, Group-2 is considered, since it accounts for greater entropy than Group-4. All the predictors are evaluated for splitting Group-2, and predictor V4 (3, 4, 1, 2) is found to be the best predictor, with the cut-off after code 3.
Hence, Group-2 is partitioned into the following groups:

Group-6: V4 (Code: 3)
Group-7: V4 (Code: 4, 1, 2)

Split 4

At this stage there are three candidate groups for possible splits.

| Group | N | Sum(WT) | Variation |
|---|---|---|---|
| 4 | 152 | .15200E+03 | .38393E+03 |
| 6 | 36 | .36000E+02 | .91390E+01 |
| 7 | 320 | .32000E+03 | .80214E+03 |

Group-7 has the maximum value of entropy and is considered first for a possible split. All the predictors are evaluated for splitting Group-7, and variable V3 is found to be the best splitter, with the cut-off after code 1. Hence, Group-7 is split into the following groups on this predictor:

Group-8: V3 (Code: 1)
Group-9: V3 (Code: 2)

Split 5

At this stage, there are four candidate groups for possible splits.

| Group | N | Sum(WT) | Variation |
|---|---|---|---|
| 4 | 152 | .15200E+03 | .38393E+03 |
| 6 | 36 | .36000E+02 | .91390E+01 |
| 8 | 120 | .12000E+03 | .32932E+03 |
| 9 | 200 | .20000E+03 | .44848E+03 |

Group-9 has the highest value of entropy and is, therefore, considered first for a possible split. However, none of the predictor variables meets the eligibility criterion for splitting this group. Thereafter, Group-4, Group-8 and Group-6 are considered one after the other, in decreasing order of entropy. None of the predictors is able to split any of these groups.

The partitioning ends with 5 final groups: Group-4, Group-5, Group-6, Group-8 and Group-9. The partition explains 7.6% of the total variation.

One-way analysis of final groups

| Source | Variation | DF |
|---|---|---|
| Explained | .20923410E+03 | 4 |
| Error | .25288590E+04 | 1006 |
| Total | .27380930E+04 | 1010 |

Split summary table: This is a history of the partitioning of the data on the various predictors.

Final group summary table: This gives the size and entropy of each final group.

Dependent variable percent distribution: This is the frequency distribution (%) of the categories of the dependent variable in each of the nine groups created by the algorithm. Final groups are identified by an asterisk (*).

The splitting process is represented graphically below.
                                                _______________
N=Number of cases                               |   Group 1   |
C=Predictor codes                               |   N=1011    |
                                                |   Split 1   |
                                                |    on V2    |
                                                |_____________|
                                                       |
                _____________________________________RANK______________________________________
                      C=1                                                            C=2,3
                       |                                                               |
                _______|_______                                                 _______|_______
                |   Group 2   |                                                 |   Group 3   |
                |    N=356    |                                                 |    N=655    |
                |   Split 3   |                                                 |   Split 2   |
                |    on V4    |                                                 |    on V4    |
                |_____________|                                                 |_____________|
                       |                                                               |
____________________INSTYP_____________________                 ____________________INSTYP_____________________
      C=3                           C=4,1,2                          C=2,3                           C=4,1
_______|_______                 _______|_______                 _______|_______                 _______|_______
|   Group 6   |                 |   Group 7   |                 |   Group 4   |                 |   Group 5   |
|    N=36     |                 |    N=320    |                 |    N=152    |                 |    N=503    |
|    Final    |                 |   Split 4   |                 |    Final    |                 |    Final    |
|             |                 |    on V3    |                 |             |                 |             |
|_____________|                 |_____________|                 |_____________|                 |_____________|
                                       |
                        ____________FUNDING____________
                              C=1             C=2
                               |               |
                        _______|_______ _______|_______
                        |   Group 8   | |   Group 9   |
                        |    N=120    | |    N=200    |
                        |    Final    | |    Final    |
                        |             | |             |
                        |_____________| |_____________|
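
The order in which groups were considered can be replayed from the figures reported above. The Python sketch below is not IDAMS code. It assumes, as the listing suggests, that SEARCH always attempts the candidate group with the largest variation next. The split outcomes are copied from the listing rather than recomputed, and the function name `replay` is hypothetical.

```python
import heapq

# Variation and size of each group, copied from the SEARCH listing.
variation = {1: 2738.0928, 2: 854.03656, 3: 1796.4799, 4: 383.92853,
             5: 1357.9944, 6: 9.1389990, 7: 802.13776, 8: 329.31958,
             9: 448.47726}
sizes = {1: 1011, 2: 356, 3: 655, 4: 152, 5: 503,
         6: 36, 7: 320, 8: 120, 9: 200}

# Splits that the listing reports as accepted: parent -> (child, child).
splits = {1: (2, 3), 3: (4, 5), 2: (6, 7), 7: (8, 9)}


def replay(variation, splits, root=1):
    """Replay the partitioning, always attempting the candidate group
    with the largest variation first (assumed selection rule)."""
    heap = [(-variation[root], root)]  # max-heap via negated variation
    attempts, finals = [], []
    while heap:
        _, group = heapq.heappop(heap)
        attempts.append(group)
        if group in splits:
            # The split was accepted: both children become candidates.
            for child in splits[group]:
                heapq.heappush(heap, (-variation[child], child))
        else:
            # No eligible split was reported: the group is final.
            finals.append(group)
    return attempts, finals


attempts, finals = replay(variation, splits)

# Each accepted split must divide the parent's cases between its children.
sizes_consistent = all(
    sizes[parent] == sum(sizes[child] for child in children)
    for parent, children in splits.items()
)

print("order of attempts:", attempts)
print("final groups:", sorted(finals))
print("child sizes add up to parent sizes:", sizes_consistent)
```

The replayed order of attempts can be compared with the sequence of "Attempt to split group" lines in the listing, and the final groups with the boxes marked Final in the tree.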