10(1) Example of Searching for Structure (Means Analysis)

Research Question: Explore the pattern of relationships between the time-involvement of academic scientists and a set of contextual categorical variables: rank, institutional setting, scientific field, and external funding of research.

Methodology: Classification and regression trees: SEARCH module, MEANS analysis

Dataset: ANJU.DAT
SYNTAX*
$RUN SEARCH
$FILES
PRINT = CART.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
INCLUDE V9=1-3 AND V10=1,2 AND V12=1,2 AND V15=1,2
SEARCHING OF STRUCTURE
BADDATA=MD1 -
 ANALYSIS=MEAN -
 DEPVAR=V2 -
 MINCASE=10 -
 PRINT=(DICT, TRACE, TREE, FINAL)
VARS=(V9) TYPE=M
VARS=(V10,V12,V14,V15) TYPE=F
------------------------------------ 

Note: The SEARCH module does not recognize missing data or invalid codes in the predictor variables, so the INCLUDE filter is used to keep only cases with valid predictor codes. Options not specified in the setup keep their default values.
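The effect of the INCLUDE filter can be sketched in Python. This is only an illustration of the selection logic, not IDAMS code: the function name `passes_filter` and the two sample records are hypothetical, and only the code ranges are taken from the setup above.

```python
def passes_filter(case):
    """Return True when every filtered predictor holds a valid code.

    Mirrors: INCLUDE V9=1-3 AND V10=1,2 AND V12=1,2 AND V15=1,2
    (V14 is not mentioned in the filter, so it is not checked here.)
    """
    return (
        case["V9"] in (1, 2, 3)
        and case["V10"] in (1, 2)
        and case["V12"] in (1, 2)
        and case["V15"] in (1, 2)
    )


# Two made-up records, for illustration only.
cases = [
    {"V9": 1, "V10": 2, "V12": 1, "V15": 2},  # valid codes everywhere
    {"V9": 9, "V10": 1, "V12": 1, "V15": 1},  # V9 outside 1-3, e.g. a missing-data code
]

kept = [c for c in cases if passes_filter(c)]
print(len(kept), "of", len(cases), "cases pass the filter")
```

Only cases that pass all four conditions reach the SEARCH analysis.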

EXTRACT FROM COMPUTER OUTPUT

After filtering 1021 cases read from the input data file

The number of cases rejected is 16:

16 for missing data in the dependent variable

The number of processed cases is 1005

3 cases contained illegal characters and were treated according to BADDATA specification

 

Split 1 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        1     1005  .10050E+04  .41851E+02  .32525E+03  .32655E+06

Attempt to split group   1   Var= 326551.63

Predictor V9     v204:rank      Rank 1 Type M
Codes 1 2 3

Best split after code 1 Var expl= .25600360E+05

Predictor V10   v217:head?    Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .71446650E+04

Predictor V12   v335:ext funds   Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .24188630E+05

Predictor V14    sv:inst type    Rank 2 Type F
Codes 3 2 4 1

Best split after code 2 Var expl= .33990140E+05

Predictor V15   :field   Rank 2 Type F
Codes 1 2

No eligible split

Best split for group 1 on predictor V14 sv:inst type Rank 1
  Var expl= .33990140E+05

Split group 1 on V14 sv:inst type Var expl= .33990140E+05
  Into group 2, codes 3 2
   and group 3, codes 4 1

 

Split 2 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        2      330  .33000E+03  .33533E+02  .24002E+03  .78966E+05
        3      675  .67500E+03  .45917E+02  .31691E+03  .21360E+06
 Attempt to split group 3 Var= 213595.36

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

Best split after code 1 Var expl= .12996760E+05

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .61900730E+04

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .14340120E+05

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1

Best split after code 4 Var expl= .28903780E+04

Predictor V15 :field Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .34723500E+04

Best split for group 3 on predictor V12 v335:ext funds Rank 1
Var expl= .14340120E+05

Split group 3 on V12 v335:ext funds Var expl= .14340120E+05
Into group 4, codes 2
and group 5, codes 1

 

Split 3 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        2      330  .33000E+03  .33533E+02  .24002E+03  .78966E+05
        4      297  .29700E+03  .40717E+02  .26587E+03  .78698E+05
        5      378  .37800E+03  .50003E+02  .31978E+03  .12056E+06

Attempt to split group 5 Var= 120557.00

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

Best split after code 2 Var expl= .29708200E+04

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

Best split for group 5 on predictor V9 v204:rank Rank 1
Var expl= .29708200E+04

Split group 5 on V9 v204:rank Var expl= .29708200E+04
Into group 6, codes 1 2
and group 7, codes 3

 

Split 4 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        2      330  .33000E+03  .33533E+02  .24002E+03  .78966E+05
        4      297  .29700E+03  .40717E+02  .26587E+03  .78698E+05
        6      218  .21800E+03  .47601E+02  .28966E+03  .62856E+05
        7      160  .16000E+03  .53275E+02  .34421E+03  .54730E+05

Attempt to split group 2 Var= 78966.133

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 3 2

Best split after code 3 Var expl= .13163880E+05

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

Best split for group 2 on predictor V14 sv:inst type Rank 1
Var expl= .13163880E+05

Split group 2 on V14 sv:inst type Var expl= .13163880E+05
Into group 8, codes 3
and group 9, codes 2

 

Split 5 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation
        4      297  .29700E+03  .40717E+02  .26587E+03  .78698E+05
        6      218  .21800E+03  .47601E+02  .28966E+03  .62856E+05
        7      160  .16000E+03  .53275E+02  .34421E+03  .54730E+05
        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
        9      277  .27700E+03  .36296E+02  .21993E+03  .60702E+05

Attempt to split group 4 Var= 78698.242

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

Best split after code 1 Var expl= .69443780E+04

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .46585620E+04

Best split for group 4 on predictor V9 v204:rank Rank 1
Var expl= .69443780E+04

Split group 4 on V9 v204:rank Var expl= .69443780E+04
Into group 10, codes 1
and group 11, codes 2 3

 

Split 6 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        6      218  .21800E+03  .47601E+02  .28966E+03  .62856E+05
        7      160  .16000E+03  .53275E+02  .34421E+03  .54730E+05
        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
        9      277  .27700E+03  .36296E+02  .21993E+03  .60702E+05
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       11      198  .19800E+03  .44136E+02  .26689E+03  .52577E+05
 

Attempt to split group 6 Var= 62856.281

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 1 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 6

Split 6 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation

        7      160  .16000E+03  .53275E+02  .34421E+03  .54730E+05
        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
        9      277  .27700E+03  .36296E+02  .21993E+03  .60702E+05
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       11      198  .19800E+03  .44136E+02  .26689E+03  .52577E+05

Attempt to split group 9 Var= 60701.727

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 2

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

No eligible split for group 9


Split 6 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation

        7      160  .16000E+03  .53275E+02  .34421E+03  .54730E+05
        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       11      198  .19800E+03  .44136E+02  .26689E+03  .52577E+05

Attempt to split group 7 Var= 54729.898

Predictor V9 v204:rank Rank 1 Type M
Codes 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 2 1

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1

Best split after code 4 Var expl= .27697110E+04

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

Best split for group 7 on predictor V14 sv:inst type Rank 1
Var expl= .27697110E+04

Split group 7 on V14 sv:inst type Var expl= .27697110E+04
Into group 12, codes 4
and group 13, codes 1

 

Split 7 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation

        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       11      198  .19800E+03  .44136E+02  .26689E+03  .52577E+05
       12       67  .67000E+02  .48373E+02  .39621E+03  .26150E+05
       13       93  .93000E+02  .56806E+02  .28055E+03  .25811E+05

Attempt to split group 11 Var= 52577.316

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1

Best split after code 4 Var expl= .38634620E+04

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

Best split for group 11 on predictor V14 sv:inst type Rank 1
Var expl= .38634620E+04

Split group 11 on V14 sv:inst type Var expl= .38634620E+04
Into group 14, codes 4
and group 15, codes 1


 

Split 8 candidate groups

      Group       N    Sum(WT)     Mean Y      Var Y      Variation


        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       12       67  .67000E+02  .48373E+02  .39621E+03  .26150E+05
       13       93  .93000E+02  .56806E+02  .28055E+03  .25811E+05
       14       95  .95000E+02  .39537E+02  .19251E+03  .18096E+05
       15      103  .10300E+03  .48379E+02  .30018E+03  .30618E+05

Attempt to split group 15 Var= 30618.232

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 2 1

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 1

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 15

Split 8 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       12       67  .67000E+02  .48373E+02  .39621E+03  .26150E+05
       13       93  .93000E+02  .56806E+02  .28055E+03  .25811E+05
       14       95  .95000E+02  .39537E+02  .19251E+03  .18096E+05

Attempt to split group 12 Var= 26149.672

Predictor V9 v204:rank Rank 1 Type M
Codes 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

No eligible split for group 12

Split 8 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       13       93  .93000E+02  .56806E+02  .28055E+03  .25811E+05
       14       95  .95000E+02  .39537E+02  .19251E+03  .18096E+05

Attempt to split group 13 Var= 25810.516

Predictor V9 v204:rank Rank 1 Type M
Codes 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 1

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 13


Split 8 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation

        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       10       99  .99000E+02  .33879E+02  .19568E+03  .19177E+05
       14       95  .95000E+02  .39537E+02  .19251E+03  .18096E+05

Attempt to split group 10 Var= 19176.545

Predictor V9 v204:rank Rank 1 Type M
Codes 1

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 1 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 10


Split 8 candidate groups

      Group       N    Sum(WT)     Mean Y      Var Y      Variation


        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04
       14       95  .95000E+02  .39537E+02  .19251E+03  .18096E+05

Attempt to split group 14 Var= 18095.621

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 14


Split 8 candidate groups

     Group       N    Sum(WT)     Mean Y      Var Y      Variation


        8       53  .53000E+02  .19094E+02  .98087E+02  .51005E+04

Attempt to split group 8 Var= 5100.5283

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

No eligible split

Predictor V10 v217:head? Rank 2 Type F
Codes 1 2

No eligible split

Predictor V12 v335:ext funds Rank 2 Type F
Codes 2 1

No eligible split

Predictor V14 sv:inst type Rank 2 Type F
Codes 3

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 8

No splits possible

 

The partitioning ends with 8 final groups

The variation explained is 23.9%

One-way analysis of final groups

         Source         Variation          DF

         Explained  .78042510E+05           7
         Error      .24850910E+06         997
         Total      .32655160E+06        1004
 

Split summary table

Group 1 1005 cases
Mean(Y)= .41850750E+02 Var(Y)= .32525060E+03 Variation= .32655160E+06
Split on V14 sv:inst type                          Var expl= .33990140E+05
Into group 2, codes 3 2
and group 3, codes 4 1

Group 3 675 cases
Mean(Y)= .45917040E+02 Var(Y)= .31690700E+03 Variation= .21359540E+06
Split on V12 v335:ext funds                          Var expl= .14340120E+05
Into group 4, codes 2
and group 5, codes 1

Group 5 378 cases
Mean(Y)= .50002650E+02 Var(Y)= .31977980E+03 Variation= .12055700E+06
Split on V9 v204:rank                          Var expl= .29708200E+04
Into group 6, codes 1 2
and group 7, codes 3

Group 2 330 cases
Mean(Y)= .33533330E+02 Var(Y)= .24001860E+03 Variation= .78966130E+05
Split on V14 sv:inst type                          Var expl= .13163880E+05
Into group 8, codes 3
and group 9, codes 2

Group 4 297 cases
Mean(Y)= .40717170E+02 Var(Y)= .26587240E+03 Variation= .78698240E+05
Split on V9 v204:rank                          Var expl= .69443780E+04
Into group 10, codes 1
and group 11, codes 2 3

Group 7 160 cases
Mean(Y)= .53275000E+02 Var(Y)= .34421320E+03 Variation= .54729900E+05
Split on V14 sv:inst type                          Var expl= .27697110E+04
Into group 12, codes 4
and group 13, codes 1

Group 11 198 cases
Mean(Y)= .44136360E+02 Var(Y)= .26689000E+03 Variation= .52577320E+05
Split on V14 sv:inst type                          Var expl= .38634620E+04
Into group 14, codes 4
and group 15, codes 1

 

Final group summary table

Group 6 218 cases
     Mean(Y)= .47600920E+02 Var(Y)= .28966030E+03 Variation= .62856280E+05

Group 8 53 cases
     Mean(Y)= .19094340E+02 Var(Y)= .98087080E+02 Variation= .51005280E+04

Group 9 277 cases
     Mean(Y)= .36296030E+02 Var(Y)= .21993380E+03 Variation= .60701730E+05

Group 10 99 cases
     Mean(Y)= .33878790E+02 Var(Y)= .19567900E+03 Variation= .19176540E+05

Group 12 67 cases
     Mean(Y)= .48373130E+02 Var(Y)= .39620720E+03 Variation= .26149670E+05

Group 13 93 cases
     Mean(Y)= .56806450E+02 Var(Y)= .28054910E+03 Variation= .25810520E+05

Group 14 95 cases
     Mean(Y)= .39536840E+02 Var(Y)= .19250660E+03 Variation= .18095620E+05

Group 15 103 cases
      Mean(Y)= .48378640E+02 Var(Y)= .30017870E+03 Variation= .30618230E+05

INTERPRETATION

IDAMS reports the analysis specifications:
Number of cases: 1021 cases were read after filtering; 16 cases were rejected for missing data in the dependent variable, leaving 1005 processed cases.
Dependent variable = V2. Predictor variables = V9 (monotonic, TYPE=M) and V10, V12, V14, V15 (free, i.e. non-monotonic, TYPE=F).

 

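Group-1

Sum(WT) = number of cases (each case has weight 1) = 1005
Mean Y = mean of the dependent variable for the entire sample = 41.851
Var Y = variance = 0.32525E+03
Variation = sum of squares of the dependent variable about the group mean = (number of cases - 1) × Var = .32655E+06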
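The printed figures are consistent with Variation = (N - 1) × Var, that is, with Var Y being the sample variance computed with an N - 1 divisor. A short Python check, using only the Group 1 values from the split summary table (the variable names are ours), is sketched below.

```python
n = 1005            # cases in Group 1
var_y = 325.2506    # Var(Y) for Group 1 (.32525060E+03)

# Sum of squared deviations about the group mean,
# assuming Var(Y) uses the N - 1 divisor.
variation = (n - 1) * var_y

print(f"Variation = {variation:.2f}")  # compare with the printed .32655160E+06
```

Multiplying by N instead of N - 1 would give about 326877, which does not match the printed value.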

Attempt to split Group-1
Best splits for the different predictors:

Predictor (codes)    Best split after code    Variation explained
V9 (1,2,3)                     1                0.25600360E+05
V10 (1,2)                      1                0.71446650E+04
V12 (2,1)                      2                0.24188630E+05
V14 (3,2,4,1)                  2                0.33990140E+05
V15 (1,2)                No eligible split

The best split for Group-1 is on predictor V14, since it explains the largest variation (between-group sum of squares).

Group-1 is split on variable V14 into two groups:

Group-2: Codes 3, 2
Group-3: Codes 1, 4
Variance explained = 0.33990140E+05
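The "Var expl" figure for a split can be reproduced as a between-group sum of squares: each child group contributes its size times the squared distance of its mean from the parent mean. The Python sketch below checks this for the first split, using the group sizes and means printed in the output; the function name `explained_variation` is our own, and the small discrepancy comes from the rounded means.

```python
def explained_variation(parent_mean, children):
    """Between-group sum of squares for a split.

    children is a list of (n_cases, group_mean) pairs.
    """
    return sum(n * (mean - parent_mean) ** 2 for n, mean in children)


# Split 1: Group 1 (mean 41.85075) into Group 2 and Group 3.
split_1 = explained_variation(
    41.85075,
    [(330, 33.53333),   # Group 2, V14 codes 3 2
     (675, 45.91704)],  # Group 3, V14 codes 4 1
)

print(f"Var expl for split 1 = {split_1:.2f}")  # printed value: .33990140E+05
```

The same function applied to the other candidate predictors would give the smaller values listed in the trace, which is why V14 is chosen.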

 

Split 2 candidate groups

    Group       N     Sum(WT)     Mean Y      Var Y     Variation
      2        330  .33000E+03  .33533E+02  .24002E+03  .78966E+05
      3        675  .67500E+03  .45917E+02  .31691E+03  .21360E+06

Group 3 will be split first, because it has the greater variation (213595.36 against 78966.13 for Group 2).

The best predictor and its cut-point for splitting Group 3 are selected in the same manner as in Step 1.

Predictor V12 (codes 2, 1), split after code 2, is found to be the best predictor, since it explains more variation (14340.12) than the best splits of the other predictors.

Group 3 is split into:
     Group-4 Code: 2
     Group-5 Code: 1
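The order in which candidate groups are tried follows their variation: the group with the largest sum of squares is attempted first, and if no predictor gives an eligible split the next largest is tried. A minimal Python sketch of that ordering is given below. The function name `attempt_order` is hypothetical; the variation values are copied from the Split 2 and Split 6 candidate tables in the output.

```python
def attempt_order(variation_by_group):
    """Return group numbers sorted by decreasing variation."""
    return sorted(variation_by_group, key=variation_by_group.get, reverse=True)


# Candidate groups at Split 2 and at Split 6, with their variation.
stage_2 = {2: 78966.13, 3: 213595.36}
stage_6 = {6: 62856.28, 7: 54729.90, 8: 5100.53,
           9: 60701.73, 10: 19176.54, 11: 52577.32}

order_2 = attempt_order(stage_2)
order_6 = attempt_order(stage_6)

print(order_2)  # Group 3 is tried before Group 2
print(order_6)  # Groups 6 and 9 are tried (and fail) before Group 7 is split
```

This matches the trace, where groups 6 and 9 are attempted without success before group 7 is split.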

 

Split 3 candidate groups

     Group      N     Sum(WT)     Mean Y      Var Y      Variation

        2      330  .33000E+03  .33533E+02  .24002E+03  .78966E+05
        4      297  .29700E+03  .40717E+02  .26587E+03  .78698E+05
        5      378  .37800E+03  .50003E+02  .31978E+03  .12056E+06

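Group 5 has the maximum variation (120557.00) and hence it is selected for the next split.

All the predictors are again evaluated for their best splits, and predictor V9 (1,2,3) is selected for splitting Group 5, with the cut-point after code 2: Group-6 takes codes 1 and 2, and Group-7 takes code 3. Note that predictor V9 is monotonic, so its codes are kept in their original order when candidate splits are formed.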
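The difference between a monotonic (TYPE=M) and a free (TYPE=F) predictor lies in how the codes are ordered before the cut-points are tried. The Python sketch below assumes the classic AID convention, with which the "Codes 3 2 4 1" line of the trace is consistent: monotonic codes keep their natural order, while free codes are reordered by the mean of the dependent variable. The function name `candidate_splits` is ours, and the means given for V14 codes 4 and 1 are illustrative values, not figures from the output.

```python
def candidate_splits(codes, means, monotonic):
    """List the (left, right) code partitions tried for a binary split."""
    if monotonic:
        ordered = sorted(codes)                           # natural code order
    else:
        ordered = sorted(codes, key=lambda c: means[c])   # order by mean of Y
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# V9 is monotonic: only cuts that respect the order 1 < 2 < 3 are tried.
v9_splits = candidate_splits([1, 2, 3], means={}, monotonic=True)

# V14 is free. Means for codes 3 and 2 are those of groups 8 and 9;
# the means for codes 4 and 1 are hypothetical placeholders.
v14_means = {3: 19.09, 2: 36.30, 4: 44.0, 1: 47.0}
v14_splits = candidate_splits([1, 2, 3, 4], v14_means, monotonic=False)

print(v9_splits)
print(v14_splits)
```

With k codes, either type gives k - 1 candidate cut-points; the split reported as "after code 2" for V14 is the second entry of `v14_splits`.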

 

Split 4 candidate groups
Now the candidate groups for split are:

Group-2, Group-4, Group-6 and Group-7

Group-2 will be split first since it has the maximum variation (78966.13).

Again all candidate predictors are evaluated for splitting Group-2:
               V9(1,2,3)              No split meets the splitting criterion
               V10(1,2)               -do-
               V12(2,1)               -do-
               V15(1,2)               -do-

Predictor V14(3,2) is chosen for splitting Group-2.

Group-2 is split into:
              Group-8              V14(Code: 3)
              Group-9              V14(Code: 2)

 

Split 5 candidate groups
Now the candidate groups for split are:
Group-4, Group-6, Group-7, Group-8, Group-9.
Group-4 has the maximum sum of squares (78698.24) and is therefore chosen for split.

The best predictor and cut-point value are selected in the same manner as in the earlier steps.
Predictor V9(1,2,3) is found to be the best predictor, with the cut-point after code 1.

Group-4 is split into groups:
               Group-10               V9(Code: 1)
               Group-11               V9(Code: 2,3)

 

Split 6 candidate groups
Now the candidate groups for further split are:
Group-6, Group-7, Group-8, Group-9, Group-10 and Group-11.
Group-6 has the maximum sum of squares (62856.281) and is therefore chosen for further split.

None of the predictor variables meets the eligibility criterion for splitting Group-6.

Group-9 is now chosen for split (60701.727), but none of the predictor variables meets the eligibility criterion for splitting this group.

Group-7 is chosen next (54729.898). This group is split on predictor V14(4,1) into two groups:
               Group-12              V14(Code:4)
               Group-13              V14(Code:1)

 

Split 7 candidate groups
At this stage, there are 5 candidate groups for split:
Group-8, Group-10, Group-11, Group-12, Group-13.
Of these, Group-11 is chosen for split, since it has the maximum sum of squares (52577.316).

Group-11 is now split into two groups on Predictor V14(4,1):
               Group-14              V14(Code = 4)
               Group-15              V14(Code = 1)

Split 8 candidate groups
At this stage, there are 6 candidate groups for split:
Group-8, Group-10, Group-12, Group-13, Group-14, Group-15

Of these, Group-15 has the maximum sum of squares (30618.232) and is considered first, but none of the predictor variables is eligible for splitting it.

Group-12 (26149.672) and then Group-13 (25810.516) are considered next, each having the maximum sum of squares among the groups not yet tried, but none of the predictor variables meets the eligibility criterion for splitting either group.

Group-10 is now considered for further split, but none of the predictor variables could split this group.

The remaining groups (viz. Group-14 and Group-8) were considered for further split, but none of the predictor variables could split either of them.

The partition ends with 8 final groups (variation explained = 23.9%).
Analysis of Variance Table

Source       Sum of Squares      DF
Explained        78042.51         7   (number of final groups - 1)
Error           248509.10       997   (number of cases - number of final groups)
Total           326551.60      1004   (number of cases - 1)
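The analysis of variance table can be rebuilt from the final group summary: the error sum of squares is the sum of the within-group variations of the 8 final groups, and the explained part is what remains of the total. The Python sketch below uses only values printed in the output (rounded as shown); the variable names are our own.

```python
# Within-group variation of each final group (final group summary table).
final_variation = {
    6: 62856.28, 8: 5100.528, 9: 60701.73, 10: 19176.54,
    12: 26149.67, 13: 25810.52, 14: 18095.62, 15: 30618.23,
}
total = 326551.6   # variation of Group 1
n_cases = 1005

error = sum(final_variation.values())
explained = total - error
pct_explained = 100 * explained / total

df_explained = len(final_variation) - 1      # final groups - 1
df_error = n_cases - len(final_variation)    # cases - final groups
df_total = n_cases - 1

print(f"Explained {explained:.2f}  DF {df_explained}")
print(f"Error     {error:.2f}  DF {df_error}")
print(f"Total     {total:.2f}  DF {df_total}")
print(f"Variation explained: {pct_explained:.1f}%")
```

The explained sum of squares also equals the sum of the seven "Var expl" values in the split summary table, since each split adds its own between-group contribution.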
 

The foregoing results of partitioning the data set are summarized in the following tree diagram.

                                                         _______________
 N=Number of cases                                       |   Group 1   |
 Y=Dep. var. mean, v262:teaching                         |   N=1005    |
 C=Predictor codes                                       |Y=   41.85075|
                                                         |   Split 1   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                              C=3,2                                                           C=4,1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |   Group 2   |                                                 |   Group 3   |
                         |    N=330    |                                                 |    N=675    |
                         |Y=   33.53333|                                                 |Y=   45.91704|
                         |   Split 4   |                                                 |   Split 2   |
                         |   on V14    |                                                 |   on V12    |
                         |_____________|                                                 |_____________|
                                |                                                               |
         _________________sv:inst type__________________                 ________________v335:ext funds_________________
               C=3                             C=2                             C=1                             C=2
                |                               |                               |                               |
         _______|_______                 _______|_______                 _______|_______                 _______|_______
         |   Group 8   |                 |   Group 9   |                 |   Group 5   |                 |   Group 4   |
         |    N=53     |                 |    N=277    |                 |    N=378    |                 |    N=297    |
         |Y=   19.09434|                 |Y=   36.29603|                 |Y=   50.00265|                 |Y=   40.71717|
         |    Final    |                 |    Final    |                 |   Split 3   |                 |   Split 5   |
         |             |                 |             |                 |    on V9    |                 |    on V9    |
         |_____________|                 |_____________|                 |_____________|                 |_____________|
                                                                                |                               |
                                                                 ___________v204:rank___________ ___________v204:rank___________
                                                                      C=1,2            C=3             C=1            C=2,3
                                                                        |               |               |               |
                                                                 _______|_______ _______|_______ _______|_______ _______|_______
                                                                 |   Group 6   | |   Group 7   | |  Group 10   | |  Group 11   |
                                                                 |    N=218    | |    N=160    | |    N=99     | |    N=198    |
                                                                 |Y=   47.60092| |Y=   53.27500| |Y=   33.87879| |Y=   44.13636|
                                                                 |    Final    | |   Split 6   | |    Final    | |   Split 7   |
                                                                 |             | |   on V14    | |             | |   on V14    |
                                                                 |_____________| |_____________| |_____________| |_____________|
                                                                                        |                               |
                                                                                        |                               |
                                                                |
                                                         _______|_______
 N=Number of cases                                       |   Group 7   |
 Y=Dep. var. mean, v262:teaching                         |    N=160    |
 C=Predictor codes                                       |Y=   53.27500|
                                                         |   Split 6   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                               C=4                                                             C=1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |  Group 12   |                                                 |  Group 13   |
                         |    N=67     |                                                 |    N=93     |
                         |Y=   48.37313|                                                 |Y=   56.80645|
                         |    Final    |                                                 |    Final    |
                         |             |                                                 |             |
                         |_____________|                                                 |_____________|
                                                                |
                                                         _______|_______
 N=Number of cases                                       |  Group 11   |
 Y=Dep. var. mean, v262:teaching                         |    N=198    |
 C=Predictor codes                                       |Y=   44.13636|
                                                         |   Split 7   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                               C=4                                                             C=1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |  Group 14   |                                                 |  Group 15   |
                         |    N=95     |                                                 |    N=103    |
                         |Y=   39.53684|                                                 |Y=   48.37864|
                         |    Final    |                                                 |    Final    |
                         |             |                                                 |             |
                         |_____________|                                                 |_____________|
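Read from top to bottom, the tree is a set of nested rules that assign every case to one of the 8 final groups. The Python sketch below writes those rules out, using the group numbers, codes and means from the split summary and final group summary tables. The function name `final_group` is hypothetical, and the sketch assumes that V9, V12 and V14 hold only the valid codes seen in the analysis.

```python
def final_group(v9, v12, v14):
    """Return (final group number, mean of Y) for one case."""
    if v14 in (3, 2):                      # Split 1, left branch (Group 2)
        return (8, 19.09434) if v14 == 3 else (9, 36.29603)
    # V14 in (4, 1): Group 3, split on external funds (V12)
    if v12 == 2:                           # Group 4, split on rank (V9)
        if v9 == 1:
            return (10, 33.87879)
        # Group 11 (rank 2 or 3), split on institution type (V14)
        return (14, 39.53684) if v14 == 4 else (15, 48.37864)
    # V12 == 1: Group 5, split on rank (V9)
    if v9 in (1, 2):
        return (6, 47.60092)
    # Group 7 (rank 3), split on institution type (V14)
    return (12, 48.37313) if v14 == 4 else (13, 56.80645)


# Example: rank 3, V12 code 1, institution type 1.
group, mean_y = final_group(v9=3, v12=1, v14=1)
print(f"Group {group}, mean of dependent variable = {mean_y:.2f}")
```

The group mean returned for a case is the tree's predicted value of the dependent variable for that combination of predictor codes.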














                                                                 |Y=   47.60092| |Y=   53.27500| |Y=   33.87879| |Y=   44.13636|
                                                                 |    Final    | |   Split 6   | |    Final    | |   Split 7   |
                                                                 |             | |   on V14    | |             | |   on V14    |
                                                                 |_____________| |_____________| |_____________| |_____________|
                                                                                        |                               |
                                                                                        |                               |
                                                                |
                                                         _______|_______
 N=Number of cases                                       |   Group 7   |
 Y=Dep. var. mean, v262:teaching                         |    N=160    |
 C=Predictor codes                                       |Y=   53.27500|
                                                         |   Split 6   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                               C=4                                                             C=1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |  Group 12   |                                                 |  Group 13   |
                         |    N=67     |                                                 |    N=93     |
                         |Y=   48.37313|                                                 |Y=   56.80645|
                         |    Final    |                                                 |    Final    |
                         |             |                                                 |             |
                         |_____________|                                                 |_____________|
                                                                |
                                                         _______|_______
 N=Number of cases                                       |  Group 11   |
 Y=Dep. var. mean, v262:teaching                         |    N=198    |
 C=Predictor codes                                       |Y=   44.13636|
                                                         |   Split 7   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                               C=4                                                             C=1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |  Group 14   |                                                 |  Group 15   |
                         |    N=95     |                                                 |    N=103    |
                         |Y=   39.53684|                                                 |Y=   48.37864|
                         |    Final    |                                                 |    Final    |
                         |             |                                                 |             |
                         |_____________|                                                 |_____________|


 ***** Normal termination of SEARCH
 ***** No more RUN statements in SETUP; step terminated
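
The first split of the means analysis can be checked by hand. The sketch below is not part of the SEARCH output; it assumes that, for ANALYSIS=MEAN, the "Var expl" figure is the between-group sum of squares of the two descendant groups, an assumption that is consistent with the printed value .33990140E+05 for the split of Group 1 on V14. The function names are illustrative only.

```python
# Values copied from the tree display of the means analysis (Groups 1, 2 and 3).
mean_parent = 41.85075
children = [(330, 33.53333), (675, 45.91704)]  # (N, mean of Y) for Groups 2 and 3


def pooled_mean(groups):
    """Case-weighted mean of the descendant groups; should reproduce the parent mean."""
    return sum(n * m for n, m in groups) / sum(n for n, _ in groups)


def between_group_ss(parent_mean, groups):
    """Between-group sum of squares: sum over groups of N_k * (mean_k - parent_mean)^2."""
    return sum(n * (m - parent_mean) ** 2 for n, m in groups)


explained = between_group_ss(mean_parent, children)
print("pooled mean of Groups 2 and 3:", round(pooled_mean(children), 5))
print("between-group sum of squares :", round(explained, 1))
```

The result agrees with the printed figure up to the rounding of the group means in the tree.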

10(2) Searching for Structure : Regression Analysis

Research Question

:

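How does the relationship between the time an academic scientist spends on supervision (V4, the dependent variable) and the covariate (V13) vary with contextual factors (rank, institutional setting, scientific field)?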

Methodology

:

Classification and regression trees: SEARCH module – Regression analysis

Dataset

:

ANJU.DAT
SYNTAX
$RUN SEARCH
$FILES
PRINT = SEARCH2.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
INCLUDE V9=1-3 AND V15=1,2
SEARCHING OF STRUCTURE: REGRESSION ANALYSIS
BADDATA=MD1 -
 ANALYSIS=REGRESSION -
 DEPVAR=V4 -
 COVARIATE= V13 -
 MINCASE=10 -
 IDVAR=V1 -
 PRINT=(TRACE, TABLE, TREE)
 VARS= (V14, V15) TYPE=F
 VARS= V9 TYPE=M
------------------------------------ 

Note: The SEARCH module does not recognize missing data or invalid codes in the predictor variables. Hence, the INCLUDE filter has been used. All other options are set at their default values.

EXTRACT FROM COMPUTER OUTPUT

Dependent variable: V4
          Covariate variable: V13
          Identifier variable: V1
          After filtering 1039 cases read from the input data file
          The number of cases rejected is 23:
                    12 for missing data in the dependent variable
                    11 for missing data in the covariate
The number of processed cases is 1016

 

Split 1 candidate groups

     Group       N    Sum(WT)     Mean Y,Z    Var Y,Z      Slope     Variation

        1     1016  .10160E+04  .15990E+02  .14602E+03  .30705E+01  .11106E+06
                                              .20010E+01  .38828E+01

Attempt to split group 1 Var= 111055.23

Predictor V14 sv:inst type Rank 2 Type F
Codes 3 2 4 1

Best split after code 3 Var expl= .82337250E+04

Predictor V15 :field Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .90128400E+03

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

Best split after code 2 Var expl= .30497740E+04

Best split for group 1 on predictor V14 sv:inst type Rank 1
Var expl= .82337250E+04

Split group 1 on V14 sv:inst type Var expl= .82337250E+04
Into group 2, codes 3
and group 3, codes 2 4 1

 

Split 2 candidate groups

      Group       N    Sum(WT)     Mean Y,Z      Var Y,Z         Slope           Variation


        2           57   .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                                                 .34561E+01  .38954E+01
        3          959 .95900E+03   .15046E+02  .12058E+03  .28464E+01  .86385E+05
                                                 .19145E+01  .37526E+01

Attempt to split group 3 Var= 86385.484

Predictor V14 sv:inst type Rank 2 Type F
Codes 2 4 1

Best split after code 4 Var expl= .14523170E+04

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

Best split after code 1 Var expl= .25520410E+04

Best split for group 3 on predictor V9 v204:rank Rank 1
Var expl= .25520410E+04

Split group 3 on V9 v204:rank Var expl= .25520410E+04
Into group 4, codes 1
and group 5, codes 2 3

 

Split 3 candidate groups

     Group       N    Sum(WT)        Mean Y,Z        Var Y,Z      Slope     Variation


        2             57   .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                                                   .34561E+01  .38954E+01
        4           317   .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                                                   .26751E+01  .38973E+01
        5           642   .64200E+03  .13411E+02  .11906E+03  .32997E+01  .53565E+05
                                                   .15389E+01  .32598E+01

Attempt to split group 5 Var= 53565.234

Predictor V14 sv:inst type Rank 2 Type F
Codes 2 4 1

Best split after code 4 Var expl= .20114190E+04

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

Best split for group 5 on predictor V14 sv:inst type Rank 1
Var expl= .20114190E+04

Split group 5 on V14 sv:inst type Var expl= .20114190E+04
Into group 6, codes 2 4
and group 7, codes 1

 

Split 4 candidate groups

     Group       N    Sum(WT)        Mean Y,Z        Var Y,Z      Slope      Variation


        2           57   .57000E+02   .31877E+02  .31175E+03  .21646E+01  .16436E+05
                                                  .34561E+01  .38954E+01
        4         317    .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                                                  .26751E+01  .38973E+01
        6         374   .37400E+03   .14513E+02  .12511E+03  .35936E+01  .34194E+05
                                                  .14305E+01  .25890E+01
        7         268  .26800E+03    .11873E+02  .10697E+03  .31721E+01  .17360E+05
                                                  .16903E+01  .41696E+01

Attempt to split group 6 Var= 34194.266

Predictor V14 sv:inst type Rank 2 Type F
Codes 2 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2 1

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

No eligible split for group 6

Split 4 candidate groups

     Group       N      Sum(WT)         Mean Y,Z          Var Y,Z        Slope           Variation


        2            57    .57000E+02     .31877E+02      .31175E+03  .21646E+01  .16436E+05
                                                      .34561E+01      .38954E+01
        4          317    .31700E+03     .18356E+02      .10762E+03  .17423E+01  .30268E+05
                                                      .26751E+01      .38973E+01
        7          268    .26800E+03     .11873E+02      .10697E+03  .31721E+01  .17360E+05
                                                      .16903E+01      .41696E+01

Attempt to split group 4 Var= 30268.211

Predictor V14 sv:inst type Rank 2 Type F
Codes 1 2 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .15842390E+04

Predictor V9 v204:rank Rank 1 Type M
Codes 1

No eligible split

Best split for group 4 on predictor V15 :field Rank 1
Var expl= .15842390E+04

Split group 4 on V15 :field Var expl= .15842390E+04
Into group 8, codes 2
and group 9, codes 1

 

Split 5 candidate groups

     Group        N     Sum(WT)       Mean Y,Z        Var Y,Z        Slope        Variation


        2             57    .57000E+02   .31877E+02   .31175E+03 .21646E+01 .16436E+05
                                                     .34561E+01  .38954E+01
        7           268   .26800E+03    .11873E+02  .10697E+03  .31721E+01  .17360E+05
                                                     .16903E+01  .41696E+01
        8           139  .13900E+03     .15302E+02  .86966E+02  .19206E+01  .10174E+05
                                                     .23022E+01  .35892E+01
        9           178  .17800E+03     .20742E+02  .11128E+03  .13002E+01  .18510E+05
                                                     .29663E+01  .39650E+01

Attempt to split group 9 Var= 18509.781

Predictor V14 sv:inst type Rank 2 Type F
Codes 2 1 4

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 1

No eligible split

No eligible split for group 9

Split 5 candidate groups

     Group       N    Sum(WT)     Mean Y,Z    Var Y,Z      Slope     Variation


        2       57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                                             .34561E+01  .38954E+01
        7      268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                                             .16903E+01  .41696E+01
        8      139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                                              .23022E+01  .35892E+01 

Attempt to split group 7 Var= 17359.549

Predictor V14 sv:inst type Rank 2 Type F
Codes 1

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1 2

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 2 3

No eligible split

No eligible split for group 7

Split 5 candidate groups

     Group       N    Sum(WT)     Mean Y,Z    Var Y,Z      Slope     Variation


        2       57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                                            .34561E+01  .38954E+01
        8      139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                                             .23022E+01  .35892E+01 

Attempt to split group 2 Var= 16436.018

Predictor V14 sv:inst type Rank 2 Type F
Codes 3

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .10602170E+04

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

No eligible split

Best split for group 2 on predictor V15 :field Rank 1
Var expl= .10602170E+04

Split group 2 on V15 :field Var expl= .10602170E+04
Into group 10, codes 2
and group 11, codes 1

 

Split 6 candidate groups

     Group       N      Sum(WT)         Mean Y,Z        Var Y,Z      Slope           Variation


        8          139    .13900E+03    .15302E+02     .86966E+02 .19206E+01  .10174E+05
                                                     .23022E+01     .35892E+01
       10          25     .25000E+02    .27880E+02     .16594E+03  .40679E+00  .39657E+04
                                                     .32400E+01     .42733E+01
       11         32      .32000E+02    .35000E+02     .41174E+03  .34537E+01  .11410E+05
                                                     .36250E+01  .36613E+01

Attempt to split group 11 Var= 11410.132

Predictor V14 sv:inst type Rank 2 Type F
Codes 3

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 1

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2

No eligible split

No eligible split for group 11

Split 6 candidate groups

     Group       N    Sum(WT)     Mean Y,Z    Var Y,Z      Slope     Variation


        8      139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                                              .23022E+01  .35892E+01
       10       25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                                             .32400E+01  .42733E+01

Attempt to split group 8 Var= 10174.189

Predictor V14 sv:inst type Rank 2 Type F
Codes 4 1 2

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 1

No eligible split

No eligible split for group 8

Split 6 candidate groups

      Group       N    Sum(WT)     Mean Y,Z    Var Y,Z      Slope     Variation


       10       25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                                              .32400E+01  .42733E+01

Attempt to split group 10 Var= 3965.6689

Predictor V14 sv:inst type Rank 2 Type F
Codes 3

No eligible split

Predictor V15 :field Rank 2 Type F
Codes 2

No eligible split

Predictor V9 v204:rank Rank 1 Type M
Codes 1 2 3

No eligible split

No eligible split for group 10

No splits possible

 

The partitioning ends with 6 final groups

The variation explained is 13.9%

One-way analysis of final groups

         Source         Variation            DF

         Explained      .15441640E+05        5
         Error          .95613590E+05        1010
         Total          .11105520E+06        1015
 

Split summary table

Group 1 1016 cases
Mean(Y)= .15990160E+02 Var(Y)= .14602160E+03
Mean(Z)= .20009840E+01 Var(Z)= .38827580E+01
Slope= .30705440E+01 Intercept= .98460470E+01
Corr= .50069920E+00 Variation= .11105520E+06
Split on V14 sv:inst type Var expl= .82337250E+04
Into group 2, codes 3
and group 3, codes 2 4 1

Group 3 959 cases
Mean(Y)= .15045880E+02 Var(Y)= .12057620E+03
Mean(Z)= .19144940E+01 Var(Z)= .37525980E+01
Slope= .28463960E+01 Intercept= .95964720E+01
Corr= .50214670E+00 Variation= .86385480E+05
Split on V9 v204:rank Var expl= .25520410E+04
Into group 4, codes 1
and group 5, codes 2 3

Group 5 642 cases
Mean(Y)= .13411210E+02 Var(Y)= .11905680E+03
Mean(Z)= .15389410E+01 Var(Z)= .32597920E+01
Slope= .32996560E+01 Intercept= .83332390E+01
Corr= .54599230E+00 Variation= .53565230E+05
Split on V14 sv:inst type Var expl= .20114190E+04
Into group 6, codes 2 4
and group 7, codes 1

Group 4 317 cases
Mean(Y)= .18356470E+02 Var(Y)= .10761620E+03
Mean(Z)= .26750790E+01 Var(Z)= .38972570E+01
Slope= .17423130E+01 Intercept= .13695640E+02
Corr= .33156360E+00 Variation= .30268210E+05
Split on V15 :field Var expl= .15842390E+04
Into group 8, codes 2
and group 9, codes 1

Group 2 57 cases
Mean(Y)= .31877190E+02 Var(Y)= .31175250E+03
Mean(Z)= .34561400E+01 Var(Z)= .38953630E+01
Slope= .21646290E+01 Intercept= .24395930E+02
Corr= .24196500E+00 Variation= .16436020E+05
Split on V15 :field Var expl= .10602170E+04
Into group 10, codes 2
and group 11, codes 1

Final group summary table

Group 6 374 cases
Mean(Y)= .14513370E+02 Var(Y)= .12510840E+03
Mean(Z)= .14304810E+01 Var(Z)= .25889880E+01
Slope= .35936360E+01 Intercept= .93727400E+01
Corr= .51696E+00 Variation= .34194270E+05

Group 7 268 cases
Mean(Y)= .11873130E+02 Var(Y)= .10697260E+03
Mean(Z)= .16902990E+01 Var(Z)= .41696430E+01
Slope= .31720890E+01 Intercept= .65113570E+01
Corr= .62627E+00 Variation= .17359550E+05

Group 8 139 cases
Mean(Y)= .15302160E+02 Var(Y)= .86966010E+02
Mean(Z)= .23021580E+01 Var(Z)= .35891980E+01
Slope= .19206370E+01 Intercept= .10880550E+02
Corr= .39018E+00 Variation= .10174190E+05

Group 9 178 cases
Mean(Y)= .20741570E+02 Var(Y)= .11127750E+03
Mean(Z)= .29662920E+01 Var(Z)= .39649590E+01
Slope= .13001600E+01 Intercept= .16884920E+02
Corr= .24542E+00 Variation= .18509780E+05

Group 10 25 cases
Mean(Y)= .27880000E+02 Var(Y)= .16594330E+03
Mean(Z)= .32400000E+01 Var(Z)= .42733340E+01
Slope= .40678630E+00 Intercept= .26562010E+02
Corr= .65278E-01 Variation= .39656690E+04

Group 11 32 cases
Mean(Y)= .35000000E+02 Var(Y)= .41174190E+03
Mean(Z)= .36250000E+01 Var(Z)= .36612900E+01
Slope= .34537440E+01 Intercept= .22480180E+02
Corr= .32568E+00 Variation= .11410130E+05

INTERPRETATION

IDAMS first reports the analysis specifications:
         No. of cases read from the input data file = 1039
         No. of cases rejected for missing data = 23
         No. of cases processed = 1016
         Dependent variable = V4
         Covariate variable= V13

 

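Group 1 (the entire sample)

N = Number of cases (1016)
Sum(WT) = Sum of weights (equal to N here, since no weight variable is specified)

Mean
Y = 15.990 (Mean of dependent variable)
Z = 2.001 (Mean of covariate variable)

Variance
Y = 146.02 (Variance of dependent variable)
Z = 3.883 (Variance of covariate variable)

Slope = 3.0705 (Slope of the regression of the dependent variable Y on the covariate Z in the entire sample)
Variation = 111055.2 (Sum of squares of the dependent variable about the regression line of Y on Z)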
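
The quantities listed for Group 1 are linked by the usual simple-regression identities. The sketch below recomputes slope, intercept and variation from the means, variances and correlation printed in the split summary table; it assumes that "Variation" in a regression analysis is the residual sum of squares about the fitted line, which is what the printed figures suggest. Variable names are illustrative.

```python
import math

# Group 1 figures copied from the split summary table.
n = 1016
mean_y, var_y = 15.99016, 146.0216    # dependent variable Y (V4)
mean_z, var_z = 2.000984, 3.882758    # covariate Z (V13)
corr = 0.5006992                      # correlation between Y and Z

# Slope of Y on Z: r * sd(Y) / sd(Z)
slope = corr * math.sqrt(var_y / var_z)

# The regression line passes through the point of means.
intercept = mean_y - slope * mean_z

# Residual sum of squares: total sum of squares times (1 - r^2).
variation = (n - 1) * var_y * (1 - corr ** 2)

print("slope     :", round(slope, 4))
print("intercept :", round(intercept, 4))
print("variation :", round(variation, 1))
```

The three results can be compared with Slope= .30705440E+01, Intercept= .98460470E+01 and Variation= .11105520E+06 in the output.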

 

Attempt to split Group 1

For each predictor, the algorithm tries the admissible binary splits of its codes and keeps the best cut-off, i.e. the one that explains the most variation when separate regressions of Y on Z are fitted in the two descendant groups. The predictor whose best cut-off explains the most variation is then used to split the group.

Predictor      Cut-off     Variance explained
           (After code)
V14 (3,2,4,1)    3           8233.725
V15 (2,1)        2           901.284
V9 (1,2,3)       2           3049.774
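
The "variance explained" column in the table above can be read as a drop in residual sum of squares. The sketch below is a minimal illustration, not the SEARCH implementation: it assumes the criterion is the parent group's residual sum of squares minus the sum of the residual sums of squares of the two descendant groups, each with its own regression line. The toy data and function names are hypothetical; the last lines apply the same subtraction to the variations printed for Groups 1, 2 and 3.

```python
def residual_ss(y, z):
    """Residual sum of squares of the least-squares line of y on z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    syy = sum((a - mean_y) ** 2 for a in y)
    szz = sum((b - mean_z) ** 2 for b in z)
    syz = sum((a - mean_y) * (b - mean_z) for a, b in zip(y, z))
    # With a constant covariate no slope can be fitted; only the mean is removed.
    return syy - (syz ** 2 / szz if szz > 0 else 0.0)


def variation_explained(left, right):
    """Reduction in residual SS obtained by fitting separate lines in two groups."""
    (y1, z1), (y2, z2) = left, right
    parent = residual_ss(y1 + y2, z1 + z2)
    return parent - residual_ss(y1, z1) - residual_ss(y2, z2)


# Hypothetical toy data: two groups, each exactly linear but with different slopes.
left = ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])    # y = z
right = ([0.0, 2.0, 4.0], [0.0, 1.0, 2.0])   # y = 2z
toy_explained = variation_explained(left, right)
print("toy example:", toy_explained)

# Same subtraction with the variations printed for Group 1 and Groups 2 and 3.
printed_explained = 111055.23 - (16436.018 + 86385.484)
print("split of Group 1 on V14:", round(printed_explained, 2))
```

The second figure can be compared with the value 8233.725 reported for V14.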

Best split for Group 1 is on predictor V14, into Group 2 (code 3) and Group 3 (codes 2, 4, 1):

Group      N  Sum(WT)     Mean Y,Z    Var Y,Z     Slope       Variation
    2     57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                          .34561E+01  .38954E+01
    3    959  .95900E+03  .15046E+02  .12058E+03  .28464E+01  .86385E+05
                          .19145E+01  .37526E+01

Group-3 is selected for the next split since its variation (86385.48) is larger than that of Group-2 (16436.02).

All the predictors are examined for their best splits.

Predictor        Cut-off     Variance explained
               (After code)
V14 (2,4,1)         4        1452.317
V15 (2,1)           No eligible split	  
V9 (1,2,3)          1        2552.041

Group-3 is now split on V9 into two groups:

Group-4       V9(Code 1)
Group-5       V9(Code 2, 3)

 

Split 3
Now the candidate groups are:

Group      N  Sum(WT)     Mean Y,Z    Var Y,Z     Slope       Variation
    2     57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                          .34561E+01  .38954E+01
    4    317  .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                          .26751E+01  .38973E+01
    5    642  .64200E+03  .13411E+02  .11906E+03  .32997E+01  .53565E+05
                          .15389E+01  .32598E+01

Group-5 is selected for the next split, since it has the largest variation (53565.23) among the candidate groups.

All predictors are evaluated for splitting Group-5.

Predictor      Cut-off           Variance explained
             (After code)
V14 (2,4,1)       4              2011.419
V15 (2,1)         No eligible split
V9 (2,3)          No eligible split

Group-5 is now split on V14 into two groups:

Group-6       V14(Code 2, 4)
Group-7       V14(Code 1)

 

Split 4

At this stage, there are 4 candidate groups for further splits.

Group      N  Sum(WT)     Mean Y,Z    Var Y,Z     Slope       Variation
    2     57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                          .34561E+01  .38954E+01
    4    317  .31700E+03  .18356E+02  .10762E+03  .17423E+01  .30268E+05
                          .26751E+01  .38973E+01
    6    374  .37400E+03  .14513E+02  .12511E+03  .35936E+01  .34194E+05
                          .14305E+01  .25890E+01
    7    268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                          .16903E+01  .41696E+01

Group-6 is selected for the next split since it has the largest variation (34194.27) among the candidate groups.

No eligible split for Group-6 is found, so it becomes a final group.

Group-4 is then considered for possible split. All the predictor variables were evaluated for splitting this group. Predictors V9 and V14 did not meet the eligibility criterion for split.

Hence, this group was split on variable V15 (1, 2) into two groups:

Group-8       V15(Code 2)
Group-9       V15(Code 1)

 

At this stage there are 4 candidate groups for further split

Group      N  Sum(WT)     Mean Y,Z    Var Y,Z     Slope       Variation
    2     57  .57000E+02  .31877E+02  .31175E+03  .21646E+01  .16436E+05
                          .34561E+01  .38954E+01
    7    268  .26800E+03  .11873E+02  .10697E+03  .31721E+01  .17360E+05
                          .16903E+01  .41696E+01
    8    139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                          .23022E+01  .35892E+01
    9    178  .17800E+03  .20742E+02  .11128E+03  .13002E+01  .18510E+05
                          .29663E+01  .39650E+01

Group-9 and Group-7 were evaluated in turn for a further split, but none of the predictors met the eligibility criterion for splitting either of these groups.

Group-2 was then considered for a further split. Only one variable, V15, met the eligibility criterion, and hence this group was split into two groups:

Group-10      V15(Code 2)
Group-11      V15(Code 1)

 

At this stage there are three candidate groups for further split:

Group-8, Group-10 and Group-11

Group      N  Sum(WT)     Mean Y,Z    Var Y,Z     Slope       Variation
    8    139  .13900E+03  .15302E+02  .86966E+02  .19206E+01  .10174E+05
                          .23022E+01  .35892E+01
   10     25  .25000E+02  .27880E+02  .16594E+03  .40679E+00  .39657E+04
                          .32400E+01  .42733E+01
   11     32  .32000E+02  .35000E+02  .41174E+03  .34537E+01  .11410E+05
                          .36250E+01  .36613E+01

These groups were considered sequentially for further split, but none of the predictor variables met the eligibility criterion for splitting any of these groups.
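
The order in which groups are taken up can be reproduced from the printed variations. The sketch below assumes that, at each step, the candidate group with the largest variation is attempted next; this rule is inferred from the "Attempt to split group" lines of the output rather than taken from the program documentation. The dictionaries hold values copied from the output; the function name is illustrative.

```python
# Variation of each group, copied from the "Attempt to split group ... Var=" lines.
variation = {
    1: 111055.23, 2: 16436.018, 3: 86385.484, 4: 30268.211,
    5: 53565.234, 6: 34194.266, 7: 17359.549, 8: 10174.189,
    9: 18509.781, 10: 3965.6689, 11: 11410.132,
}

# Splits actually made (parent -> descendant groups), copied from the output.
children = {1: (2, 3), 3: (4, 5), 5: (6, 7), 4: (8, 9), 2: (10, 11)}


def attempt_order(root=1):
    """Visit candidate groups in decreasing order of variation."""
    candidates = {root}
    order, final = [], []
    while candidates:
        group = max(candidates, key=variation.__getitem__)
        candidates.remove(group)
        order.append(group)
        if group in children:
            candidates.update(children[group])  # descendants become candidates
        else:
            final.append(group)                 # no eligible split: final group
    return order, final


order, final = attempt_order()
print("order of attempts:", order)
print("final groups     :", final)
```

The list of attempts matches the sequence in the trace (Groups 1, 3, 5, 6, 4, 9, 7, 2, 11, 8, 10), and the six groups left unsplit are the final groups.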

 

The partition ends with 6 terminal groups

Group-10, Group-11, Group-8, Group-9, Group-6 and Group-7.

This partition explains 13.9% of the total variation.

Analysis of variance Table

Source         Variation           DF
 
Explained     .15441640E+05         5
Error         .95613590E+05        1010
Total         .11105520E+06        1015
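
The figures in this table follow from the five splits. The short check below adds up the "Var expl" values of the splits and relates them to the variation of Group 1; all numbers are copied from the output, and the variable names are illustrative.

```python
# Variation explained by each split, keyed by the group that was split.
var_expl = {1: 8233.725, 3: 2552.041, 5: 2011.419, 4: 1584.239, 2: 1060.217}

total = 111055.2          # variation of Group 1 (entire sample)
n_cases = 1016
n_final_groups = 6

explained = sum(var_expl.values())
error = total - explained
percent = 100 * explained / total

# Degrees of freedom as printed in the table.
df_explained = n_final_groups - 1
df_error = n_cases - n_final_groups
df_total = n_cases - 1

print("explained:", round(explained, 2), "DF", df_explained)
print("error    :", round(error, 2), "DF", df_error)
print("total    :", total, "DF", df_total)
print("percent explained:", round(percent, 1))
```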
 

History of the splitting process.

 

Graphical representation of the splitting process.

                                                         _______________
 N=Number of cases                                       |   Group 1   |
 Y=Dep. var. mean, v264:supervsn                         |   N=1016    |
 Z=Covariate mean                                        |Y=   15.99016|
 S=Slope                                                 |Z=    2.00098|
 C=Predictor codes                                       |S=    3.07054|
                                                         |   Split 1   |
                                                         |   on V14    |
                                                         |_____________|
                                                                |
                         _________________________________sv:inst type__________________________________
                               C=3                                                           C=2,4,1
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |   Group 2   |                                                 |   Group 3   |
                         |    N=57     |                                                 |    N=959    |
                         |Y=   31.87719|                                                 |Y=   15.04588|
                         |Z=    3.45614|                                                 |Z=    1.91449|
                         |S=    2.16463|                                                 |S=    2.84640|
                         |   Split 5   |                                                 |   Split 2   |
                         |   on V15    |                                                 |    on V9    |
                         |_____________|                                                 |_____________|
                                |                                                               |
         ____________________:field_____________________                 ___________________v204:rank___________________
               C=2                             C=1                             C=1                            C=2,3
                |                               |                               |                               |
         _______|_______                 _______|_______                 _______|_______                 _______|_______
         |  Group 10   |                 |  Group 11   |                 |   Group 4   |                 |   Group 5   |
         |    N=25     |                 |    N=32     |                 |    N=317    |                 |    N=642    |
         |Y=   27.88000|                 |Y=   35.00000|                 |Y=   18.35647|                 |Y=   13.41121|
         |Z=    3.24000|                 |Z=    3.62500|                 |Z=    2.67508|                 |Z=    1.53894|
         |S=     .40679|                 |S=    3.45374|                 |S=    1.74231|                 |S=    3.29966|
         |    Final    |                 |    Final    |                 |   Split 4   |                 |   Split 3   |
         |             |                 |             |                 |   on V15    |                 |   on V14    |
         |_____________|                 |_____________|                 |_____________|                 |_____________|
                                                                                |                               |
                                                                 ____________:field_____________ _________sv:inst type__________
                                                                       C=2             C=1            C=2,4            C=1
                                                                        |               |               |               |
                                                                 _______|_______ _______|_______ _______|_______ _______|_______
                                                                 |   Group 8   | |   Group 9   | |   Group 6   | |   Group 7   |
                                                                 |    N=139    | |    N=178    | |    N=374    | |    N=268    |
                                                                 |Y=   15.30216| |Y=   20.74157| |Y=   14.51337| |Y=   11.87313|
                                                                 |Z=    2.30216| |Z=    2.96629| |Z=    1.43048| |Z=    1.69030|
                                                                 |S=    1.92064| |S=    1.30016| |S=    3.59364| |S=    3.17209|
                                                                 |    Final    | |    Final    | |    Final    | |    Final    |
                                                                 |             | |             | |             | |             |
                                                                 |_____________| |_____________| |_____________| |_____________|



10(3) Searching for Structure: CHI Analysis

Research Question

:

Explore the pattern of relationships between the time involvement of academic scientists in teaching (coded as 1 = Low, 2 = Average, 3 = Above average, 4 = High) and a set of contextual variables: rank, institutional setting, scientific field, and external funding of research.

Methodology

:

Classification and regression trees: SEARCH module – CHI analysis

Dataset

:

PSN.DAT

SYNTAX*
$RUN SEARCH
$FILES
PRINT = PSN.LST
DICTIN = PSN.DIC
DATAIN = PSN.DAT
$SETUP
INCLUDE V2=1-3 AND V3=1-2 AND V5 =1,2 AND V6=1-4
SEARCHING OF STRUCTURE
BADDATA=MD1 -
 ANALYSIS=CHI -
 DEPVAR=V6 CODE= (1-4) -
 MINCASE=10 -
 PRINT=(TREE,DICT,TRACE,FINAL)
VARS=V2 TYPE=M
VARS=(V3,V4, V5) TYPE=F
------------------------------------ 

Note: The SEARCH module does not recognize missing data or invalid codes for the predictor variables; hence the INCLUDE filter has been used. All options are set at their default values.
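The effect of the INCLUDE filter can be pictured with a rough Python analogue. This is only a sketch: it assumes that ranges such as 1-3 are inclusive and that a case is kept only when every condition holds. The function name, the dictionary layout of a case and the code 9 used for a missing value are hypothetical.

```python
def passes_filter(case):
    """Mimic INCLUDE V2=1-3 AND V3=1-2 AND V5=1,2 AND V6=1-4 (assumed semantics)."""
    return (
        case.get("V2") in (1, 2, 3)
        and case.get("V3") in (1, 2)
        and case.get("V5") in (1, 2)
        and case.get("V6") in (1, 2, 3, 4)
    )

# Hypothetical cases: the second one carries an invalid code (9) on V3.
cases = [
    {"V2": 1, "V3": 2, "V5": 1, "V6": 4},
    {"V2": 2, "V3": 9, "V5": 1, "V6": 3},
]
kept = [c for c in cases if passes_filter(c)]
print(len(kept), "of", len(cases), "cases kept")
```

Any case with a code outside the listed values on a filter variable is dropped before the analysis starts, which is how missing and invalid codes are kept out of the tree.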

EXTRACT FROM COMPUTER OUTPUT

After filtering 1011 cases read from the input data file

1 cases contained illegal characters on filter variables and were skipped

 

Split 1 candidate groups

    Group     N    Sum(WT)    Variation
      1     1011  .10110E+04  .27381E+04

Attempt to split group 1 Var= 2738.0928

Predictor V2 RANK Rank 1 Type M
Codes 1 2 3

Best split after code 1 Var expl= .87576360E+02

Predictor V3 FUNDING Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .84713570E+02

Predictor V4 INSTYP Rank 2 Type F
Codes 3 4 2 1

Best split after code 3 Var expl= .86131580E+02

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

Best split for group 1 on predictor V2 RANK Rank 1
Var expl= .87576360E+02

Split group 1 on V2 RANK Var expl= .87576360E+02
Into group 2, codes 1
and group 3, codes 2 3

 

Split 2 candidate groups

Group           N    Sum(WT)       Variation
   2           356  .35600E+03  .85404E+03
   3           655  .65500E+03  .17965E+04

Attempt to split group 3 Var= 1796.4799

Predictor V2 RANK Rank 1 Type M
Codes 2 3

Best split after code 2 Var expl= .23601020E+02

Predictor V3 FUNDING Rank 2 Type F
Codes 2 1

Best split after code 2 Var expl= .31537450E+02

Predictor V4 INSTYP Rank 2 Type F
Codes 2 3 4 1

Best split after code 3 Var expl= .54556970E+02

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

Best split for group 3 on predictor V4 INSTYP Rank 1
Var expl= .54556970E+02

Split group 3 on V4 INSTYP Var expl= .54556970E+02
Into group 4, codes 2 3
and group 5, codes 4 1

 

Split 3 candidate groups

Group          N     Sum(WT)     Variation
  2           356  .35600E+03  .85404E+03
  4           152  .15200E+03  .38393E+03
  5           503  .50300E+03  .13580E+04

Attempt to split group 5 Var= 1357.9944

Predictor V2 RANK Rank 1 Type M
Codes 2 3

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 2 1

No eligible split

Predictor V4 INSTYP Rank 2 Type F
Codes 4 1

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

No eligible split for group 5

Split 3 candidate groups

     Group       N    Sum(WT)    Variation


        2      356  .35600E+03  .85404E+03
        4      152  .15200E+03  .38393E+03
 

Attempt to split group 2 Var= 854.03656

Predictor V2 RANK Rank 1 Type M
Codes 1

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .31131080E+02

Predictor V4 INSTYP Rank 2 Type F
Codes 3 4 1 2

Best split after code 3 Var expl= .42759830E+02

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

Best split for group 2 on predictor V4 INSTYP Rank 1
Var expl= .42759830E+02

Split group 2 on V4 INSTYP Var expl= .42759830E+02
Into group 6, codes 3
and group 7, codes 4 1 2

 

Split 4 candidate groups

Group          N    Sum(WT)      Variation
  4           152  .15200E+03  .38393E+03
  6            36  .36000E+02  .91390E+01
  7           320  .32000E+03  .80214E+03

Attempt to split group 7 Var= 802.13776

Predictor V2 RANK Rank 1 Type M
Codes 1

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 1 2

Best split after code 1 Var expl= .24340890E+02

Predictor V4 INSTYP Rank 2 Type F
Codes 4 1 2

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

Best split for group 7 on predictor V3 FUNDING Rank 1
Var expl= .24340890E+02

Split group 7 on V3 FUNDING Var expl= .24340890E+02
Into group 8, codes 1
and group 9, codes 2

 

Split 5 candidate groups

Group          N    Sum(WT)      Variation
  4           152  .15200E+03  .38393E+03
  6            36  .36000E+02  .91390E+01
  8           120  .12000E+03  .32932E+03
  9           200  .20000E+03  .44848E+03

Attempt to split group 9 Var= 448.47726

Predictor V2 RANK Rank 1 Type M
Codes 1

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 2

No eligible split

Predictor V4 INSTYP Rank 2 Type F
Codes 1 4 2

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 9

Split 5 candidate groups

     Group      N    Sum(WT)    Variation
        4      152  .15200E+03  .38393E+03
        6       36  .36000E+02  .91390E+01
        8      120  .12000E+03  .32932E+03

Attempt to split group 4 Var= 383.92853

Predictor V2 RANK Rank 1 Type M
Codes 2 3

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 1 2

No eligible split

Predictor V4 INSTYP Rank 2 Type F
Codes 3 2

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 1 2

No eligible split

No eligible split for group 4

Split 5 candidate groups

     Group      N    Sum(WT)    Variation
        6       36  .36000E+02  .91390E+01
        8      120  .12000E+03  .32932E+03

Attempt to split group 8 Var= 329.31958

Predictor V2 RANK Rank 1 Type M
Codes 1

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 1

No eligible split

Predictor V4 INSTYP Rank 2 Type F
Codes 4 1 2

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

No eligible split for group 8

Split 5 candidate groups

      Group       N   Sum(WT)      Variation
        6        36  .36000E+02   .91390E+01

Attempt to split group 6 Var= 9.1389990

Predictor V2 RANK Rank 1 Type M
Codes 1

No eligible split

Predictor V3 FUNDING Rank 2 Type F
Codes 1 2

No eligible split

Predictor V4 INSTYP Rank 2 Type F
Codes 3

No eligible split

Predictor V5 FIELD Rank 2 Type F
Codes 2 1

No eligible split

No eligible split for group 6

No splits possible

 

The partitioning ends with 5 final groups

The variation explained is 7.6%

One-way analysis of final groups

          Source         Variation           DF
 
        Explained     .20923410E+03           4
         Error        .25288590E+04        1006
         Total        .27380930E+04        1010
 

Split summary table

Group 1 1011 cases Variation= .27380930E+04
Split on V2 RANK Var expl= .87576360E+02
Into group 2, codes 1
and group 3, codes 2 3

Group 3 655 cases Variation= .17964800E+04
Split on V4 INSTYP Var expl= .54556970E+02
Into group 4, codes 2 3
and group 5, codes 4 1

Group 2 356 cases Variation= .85403660E+03
Split on V4 INSTYP Var expl= .42759830E+02
Into group 6, codes 3
and group 7, codes 4 1 2

Group 7 320 cases Variation= .80213780E+03
Split on V3 FUNDING Var expl= .24340890E+02
Into group 8, codes 1
and group 9, codes 2

 

Final group summary table

Group 4       152 cases       Variation= .38392850E+03
Group 5       503 cases       Variation= .13579940E+04
Group 6       36 cases         Variation= .91389990E+01
Group 8       120 cases       Variation= .32931960E+03
Group 9       200 cases       Variation= .44847730E+03

Dependent variable percent distribution for each group (*=Final groups)

              1        2        3        4*       5*       6*       7        8*       9*
 
 Code= 1     36.10    52.25    27.33    42.11    22.86    97.22    47.19    31.67    56.50
 Code= 2     19.09    20.51    18.32    22.37    17.10     2.78    22.50    25.00    21.00
 Code= 3     21.96    16.85    24.73    26.97    24.06      .00    18.75    23.33    16.00  
 Code= 4     22.85    10.39    29.62     8.55    35.98      .00    11.56    20.00     6.50
INTERPRETATION

IDAMS reports that 1011 cases were read from the input data file.

 

Split 1
At this stage, the candidate group is Group-1 comprising all the cases.

N = Number of cases
Sum(WT) = Sum of weights (equal to the number of cases here, since the data are unweighted)
Variation = 2738.0928
This is the entropy of the whole group.
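The variation figures can be reproduced from the category counts of the dependent variable. The sketch below assumes that the variation in CHI analysis is the likelihood-ratio quantity -2 Σ n_j ln(n_j / N), that is, twice the group size times the entropy in natural-log units. This formula is inferred from the printed numbers and is not quoted from the SEARCH documentation. The counts for Group-6 (35 and 1) are derived from its percent distribution (97.22% and 2.78% of 36 cases), and the function name is illustrative.

```python
import math

def group_variation(counts):
    """-2 * sum(n_j * ln(n_j / N)) over non-empty categories (assumed formula)."""
    total = sum(counts)
    return -2.0 * sum(n * math.log(n / total) for n in counts if n > 0)

# Group-6: 36 cases, 35 with code 1 and 1 with code 2 (derived from percentages).
variation_g6 = group_variation([35, 1, 0, 0])
print(f"{variation_g6:.4f}")  # compare with .91390E+01 in the output
```

A group in which every case falls into one category has zero variation under this measure, which is why the nearly homogeneous Group-6 has such a small value.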

Attempt to split Group-1

All the predictors are evaluated, one by one, for their best splits.

Predictor          Cut-off          Entropy explained
                   (after code)

V2 (1, 2, 3)       1                87.57636
V3 (1, 2)          1                84.71357
V4 (3, 4, 2, 1)    3                86.13158
V5 (2, 1)          No eligible split

The best split for Group-1 is on predictor V2, since it explains the most entropy (87.57636).

Group-1 is split on V2 into the following groups:

Group-2: V2 (Code: 1)
Group-3: V2 (Codes: 2, 3)
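The "Var expl" value of a split can be checked against the variations printed for the parent group and its two subgroups. The sketch below assumes that the explained variation is simply the parent variation minus the sum of the subgroup variations. This relationship is consistent with the figures in the output, but the function name is illustrative and the formula is not quoted from the SEARCH documentation.

```python
def explained_variation(parent, children):
    """Variation of the parent group minus the variation left in its subgroups."""
    return parent - sum(children)

# Group-1 (2738.0928) split into Group-2 (854.03656) and Group-3 (1796.4799).
var_expl_split1 = explained_variation(2738.0928, [854.03656, 1796.4799])
print(round(var_expl_split1, 3))  # compare with .87576360E+02 in the output
```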

 

Split 2

At this stage there are two candidate groups.

Group          N    Sum(WT)       Variation
  2           356  .35600E+03   .85404E+03
  3           655  .65500E+03   .17965E+04

Group-3 is considered first for splitting, since it has the larger entropy.

Best predictors and their cut-off values are as follows:

Predictor          Cut-off          Entropy explained
                   (after code)

V2 (2, 3)          2                23.60102
V3 (2, 1)          2                31.53745
V4 (2, 3, 4, 1)    3                54.55697
V5 (2, 1)          No eligible split

The best split for Group-3 is on predictor V4, since it explains the most entropy. This group is split into two groups on variable V4:

Group-4: V4 (Codes: 2, 3)
Group-5: V4 (Codes: 4, 1)
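The choice of the splitting predictor amounts to taking the largest "Var expl" among the predictors that have an eligible split. A minimal sketch, using the values printed in the computer output for Group-3 (V3 has a best split after code 2 with Var expl = 31.53745 there); the dictionary layout and the function name are illustrative, and `None` marks "No eligible split".

```python
def best_predictor(var_expl):
    """Return the predictor with the largest explained variation, or None."""
    eligible = {name: v for name, v in var_expl.items() if v is not None}
    if not eligible:
        return None  # corresponds to "No eligible split for group ..."
    return max(eligible, key=eligible.get)

# Explained variation of the best split per predictor for Group-3.
group3 = {"V2": 23.60102, "V3": 31.53745, "V4": 54.55697, "V5": None}
print(best_predictor(group3))
```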

 

Split 3
At this stage there are three candidate groups for further splitting:

Group         N     Sum(WT)     Variation
  2           356  .35600E+03  .85404E+03
  4           152  .15200E+03  .38393E+03
  5           503  .50300E+03  .13580E+04

Group-5 has the maximum entropy and is therefore considered first for a possible split.

All the predictors are evaluated in the same manner as for the earlier splits. None of the predictor variables met the eligibility criterion for splitting this group.

Thereafter, Group-2 was considered for a further split, since it had greater entropy than Group-4.

All the predictors were evaluated for splitting Group-2. Predictor V4 (3, 4, 1, 2) was found to be the best predictor, with the cut-off after code 3. Hence, Group-2 was partitioned into the following groups:

Group-6: V4 (Code: 3)
Group-7: V4 (Codes: 4, 1, 2)

 

Split 4
At this stage there are three candidate groups for possible splits:

Group          N    Sum(WT)      Variation
   4           152  .15200E+03  .38393E+03
   6            36  .36000E+02  .91390E+01
   7           320  .32000E+03  .80214E+03

Group-7 has the maximum value of entropy and was considered first for a possible split.

All the predictors were evaluated for splitting Group-7. Variable V3 was found to be the best splitter, with the cut-off after code 1. Hence, Group-7 is split into the following groups on this predictor:

Group-8: V3 (Code: 1)
Group-9: V3 (Code: 2)

 

Split 5

At this stage, there are 4 candidate groups for possible splits.

Group         N     Sum(WT)    Variation
 4           152  .15200E+03  .38393E+03
 6            36  .36000E+02  .91390E+01
 8           120  .12000E+03  .32932E+03
 9           200  .20000E+03  .44848E+03

Group-9 has the highest value of entropy and is, therefore, considered first for a possible split. However, none of the predictor variables could meet the eligibility criterion for splitting this group.

Thereafter, Group-4, Group-8 and Group-6 were considered one after the other for a possible split. However, none of the predictors was able to split any of these groups.
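The order in which the candidate groups are tried follows their variation, largest first. A small sketch with the four candidates of Split 5 (the variable names are illustrative):

```python
# Variation of each candidate group at Split 5, taken from the output.
candidates = {4: 383.92853, 6: 9.1389990, 8: 329.31958, 9: 448.47726}

# Groups are attempted in decreasing order of variation.
order = sorted(candidates, key=candidates.get, reverse=True)
print(order)
```

The resulting order (9, 4, 8, 6) is the same as the sequence of "Attempt to split group" lines in the output.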

 

The partitioning ends with 5 final groups, Group-6, Group-8, Group-9, Group-4, Group-5.

7.6% of the total variation is explained by the partition.

One-way analysis of final groups

         Source         Variation            DF
 
        Explained     .20923410E+03           4
        Error         .25288590E+04        1006
        Total         .27380930E+04        1010
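The entries of this table can be rebuilt from the final group summary. The sketch below assumes that the error variation is the sum of the variations of the final groups, that the explained variation is the total minus the error, and that the degrees of freedom are (number of final groups - 1) and (number of cases - number of final groups). These relationships agree with the printed figures; the variable names are illustrative.

```python
# Variation of the five final groups and of the whole sample (from the output).
final_variation = {4: 383.92850, 5: 1357.9940, 6: 9.1389990,
                   8: 329.31960, 9: 448.47730}
total_variation = 2738.0930
n_cases = 1011

error = sum(final_variation.values())
explained = total_variation - error
percent_explained = 100.0 * explained / total_variation

df_explained = len(final_variation) - 1
df_error = n_cases - len(final_variation)
df_total = n_cases - 1

print(f"Explained {explained:.4f}  DF {df_explained}")
print(f"Error     {error:.4f}  DF {df_error}")
print(f"Total     {total_variation:.4f}  DF {df_total}")
print(f"Variation explained: {percent_explained:.1f}%")
```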
 
 

Summary table
This is a history of partitioning of the data on various predictors.

 

Final group summary table
The size and entropy of the final groups.

Frequency distribution (%) of different categories of the dependent variable in each of the nine groups created by the algorithm.

Final groups are identified by *.
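One way to read the percent distribution is to look up the most frequent category of the dependent variable in each final group. The sketch below copies the percentages of the five final groups from the output; the category labels follow the coding given in the research question, and the variable names are illustrative.

```python
# Percent distribution of the dependent variable (codes 1-4) per final group.
percent = {
    4: [42.11, 22.37, 26.97, 8.55],
    5: [22.86, 17.10, 24.06, 35.98],
    6: [97.22, 2.78, 0.00, 0.00],
    8: [31.67, 25.00, 23.33, 20.00],
    9: [56.50, 21.00, 16.00, 6.50],
}
labels = {1: "Low", 2: "Average", 3: "Above average", 4: "High"}

# Modal code per group: the code with the largest percentage.
modal = {g: max(range(1, 5), key=lambda c: p[c - 1]) for g, p in percent.items()}
for group, code in modal.items():
    print(f"Group {group}: modal code {code} ({labels[code]}), "
          f"{percent[group][code - 1]:.2f}%")
```

Under this reading, Group-5 is the only final group whose most frequent category is High; in the other four final groups it is Low.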

Graphical representation of the splitting process:

   
                                                         _______________
 N=Number of cases                                       |   Group 1   |
 C=Predictor codes                                       |   N=1011    |
                                                         |   Split 1   |
                                                         |    on V2    |
                                                         |_____________|
                                                                |
                         _____________________________________RANK______________________________________
                               C=1                                                            C=2,3
                                |                                                               |
                         _______|_______                                                 _______|_______
                         |   Group 2   |                                                 |   Group 3   |
                         |    N=356    |                                                 |    N=655    |
                         |   Split 3   |                                                 |   Split 2   |
                         |    on V4    |                                                 |    on V4    |
                         |_____________|                                                 |_____________|
                                |                                                               |
         ____________________INSTYP_____________________                 ____________________INSTYP_____________________
               C=3                           C=4,1,2                          C=2,3                           C=4,1
         _______|_______                 _______|_______                 _______|_______                 _______|_______
         |   Group 6   |                 |   Group 7   |                 |   Group 4   |                 |   Group 5   |
         |    N=36     |                 |    N=320    |                 |    N=152    |                 |    N=503    |
         |    Final    |                 |   Split 4   |                 |    Final    |                 |    Final    |
         |             |                 |    on V3    |                 |             |                 |             |
         |_____________|                 |_____________|                 |_____________|                 |_____________|
                                                |
                                 ____________FUNDING____________
                                       C=1             C=2
                                        |               |
                                 _______|_______ _______|_______
                                 |   Group 8   | |   Group 9   |
                                 |    N=120    | |    N=200    |
                                 |    Final    | |    Final    |
                                 |             | |             |
                                 |_____________| |_____________|