1. Means analysis
a) Trace statistics. These are the statistics calculated on the whole sample (for g = 1), and on tentative splits for parent groups as well as for each group resulting from the best split.
i) Sum (wt). Number of cases (Ng) if the weight variable is not specified, or weighted number of cases (Wg) in group g.
ii) Mean y. Mean value of the dependent variable y in group g.
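\[ \bar{y}_g = \frac{\sum_{k \in g} w_k \, y_k}{W_g} \]
where y_k is the value of y for case k and w_k is its weight (w_k = 1 if the weight variable is not specified, so that W_g = N_g).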
iii) Var y. Variance of the dependent variable y in group g.
|
iv) Variation. Sum of squares of the dependent variable (as in one-way analysis of variance) in group g.
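\[ V_g = \sum_{k \in g} w_k \, (y_k - \bar{y}_g)^2 \]
where \bar{y}_g is the mean of y in group g and w_k is the weight of case k (1 if the weight variable is not specified).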
v) Var expl. Explained variation is measured by the difference between the variation in the parent group and the sum of variation in the two children groups. It provides, for each predictor, the amount of variation explained by the best split for this predictor, i.e. the highest value obtained over all possible splits for this predictor.
Let g1 and g2 denote two subgroups (children groups) obtained in a split of the parent group g, and Vg1 and Vg2 their respective variation. The variation explained by such a split of group g is calculated as follows:
\[ EV_g = V_g - (V_{g1} + V_{g2}) \]
Then, this value is maximized over all possible splits for the predictor.
vi) Explained variation. This is the percent of the total variation explained by the final groups.
\[ 100 \times \frac{EV}{TV} \]
where EV and TV are, respectively, the variation explained by the final groups and the total variation (see 1.b below).
b) One-way analysis of final groups. These are one-way analysis of variance statistics calculated for the final groups.
i) Explained variation and DF. This is the amount of variation explained by the final groups and the corresponding degrees of freedom.
|
|
ii) Total variation and DF. Variation calculated for the whole sample, i.e. for group 1, and the corresponding degrees of freedom.
|
|
iii) Error and DF. This is the amount of unexplained variation and the corresponding degrees of freedom.
|
|
c) Split summary table. The table provides group mean value, variance and variation of the dependent variable at each split as well as the variation explained by that split (see 1.a above).
d) Final group summary table. The table provides mean value, variance and variation of the dependent variable for the final groups (see 1.a above).
e) Percent of explained variation. The percent of total variation explained by the best split for each group is calculated as follows:
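\[ 100 \times \frac{EV_g}{TV} \]
where EV_g is the variation explained by the best split of group g and TV is the total variation.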
Note that this value is equal to zero for the final groups (indicated by an asterisk).
f) Residuals. The residuals are the differences between the observed value and the predicted value of the dependent variable.
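\[ e_k = y_k - \hat{y}_k \]
where y_k and \hat{y}_k are the observed and predicted values for case k.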
As predicted value, a case is assigned the mean value of the dependent variable for the group to which it belongs, i.e.
\[ \hat{y}_k = \bar{y}_g \]
for every case k in final group g.
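The quantities in 1.a and 1.f can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's implementation: the function names are hypothetical, weights default to 1, and the predictor codes are assumed to be ordered, so that a split sends codes up to a cut point to one child and the remaining codes to the other.

```python
def weighted_mean(y, w):
    """Weighted mean of y; with unit weights this is the plain mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def variation(y, w):
    """Weighted sum of squares of y about its group mean."""
    mean = weighted_mean(y, w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def best_split(y, x, w=None):
    """Search the binary splits of one group on predictor x.

    Assumption: codes of x are ordered, so each candidate split sends
    codes <= cut to the first child and codes > cut to the second.
    Returns (explained_variation, cut); cut is None if x is constant.
    """
    w = [1.0] * len(y) if w is None else w
    parent = variation(y, w)
    best_ev, best_cut = 0.0, None
    for cut in sorted(set(x))[:-1]:
        lo = [(yi, wi) for yi, xi, wi in zip(y, x, w) if xi <= cut]
        hi = [(yi, wi) for yi, xi, wi in zip(y, x, w) if xi > cut]
        # Parent variation minus the sum of the two child variations.
        children = sum(variation(*zip(*part)) for part in (lo, hi))
        ev = parent - children
        if ev > best_ev:
            best_ev, best_cut = ev, cut
    return best_ev, best_cut


def residuals(y, groups, w=None):
    """Observed y minus the mean of y in the case's final group."""
    w = [1.0] * len(y) if w is None else w
    means = {}
    for g in set(groups):
        members = [(yi, wi) for yi, gi, wi in zip(y, groups, w) if gi == g]
        means[g] = weighted_mean(*zip(*members))
    return [yi - means[gi] for yi, gi in zip(y, groups)]


# Made-up data: six cases, one predictor with three codes.
y = [1, 2, 3, 10, 11, 12]
x = [1, 1, 2, 3, 3, 3]
ev, cut = best_split(y, x)
total = variation(y, [1.0] * len(y))
percent = 100 * ev / total          # percent of total variation explained
groups = [2 if xi <= cut else 3 for xi in x]
res = residuals(y, groups)
```

With these data the search keeps the cut that separates the three low values from the three high ones, and the residuals are the deviations of each case from its own group mean.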
2. Regression analysis
a) Trace statistics. These are the statistics calculated on the whole sample (for g = 1), and on tentative splits for parent groups as well as for each group resulting from the best split.
i) Sum (wt). Number of cases (Ng) if the weight variable is not specified, or weighted number of cases (Wg) in group g.
ii) Mean y,z. Mean value of the dependent variable y and the covariate z in group g (see 1.a.ii above).
iii) Var y,z. Variance of the dependent variable y and the covariate z in group g (see 1.a.iii above).
iv) Slope. This is the slope of the dependent variable y on the covariate z in group g.
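\[ b_g = \frac{\sum_{k \in g} w_k \, (y_k - \bar{y}_g)(z_k - \bar{z}_g)}{\sum_{k \in g} w_k \, (z_k - \bar{z}_g)^2} \]
where \bar{y}_g and \bar{z}_g are the means of y and z in group g and w_k is the weight of case k (1 if the weight variable is not specified).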
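v) Variation. This is the error (residual) sum of squares obtained when the variable y is estimated by its regression on the covariate z in group g, i.e. a measure of deviation about the regression line.
\[ V_g = \sum_{k \in g} w_k \, (y_k - \bar{y}_g)^2 - b_g^2 \sum_{k \in g} w_k \, (z_k - \bar{z}_g)^2 \]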
where bg is the slope of the regression line in group g.
vi) Var expl. Explained variation (EV). See 1.a.v above for general information, and 2.a.v above for details on V (variation) used in regression analysis.
vii) Explained variation. This is the percent of the total variation explained by the final groups. See 1.a.vi above and 2.b below.
b) One-way analysis of final groups. These are the summary statistics for the final groups. See 1.b above for general information, and 2.a.v and 2.a.vi above for details on V and EV measures used in regression analysis.
c) Split summary table. The table provides group mean value, variance and variation of the dependent variable at each split as well as the variation explained by that split. It also provides mean value and variance of the covariate. See 2.a above for formulas. Moreover, the following regression statistics are calculated for each split:
i) Slope. It is the slope of the dependent variable y on the covariate z in group g (see 2.a.iv above).
ii) Intercept. It is the constant term in the regression equation.
\[ a_g = \bar{y}_g - b_g \, \bar{z}_g \]
where bg is the slope in group g.
iii) Corr. Pearson r correlation coefficient between the dependent variable y and the covariate z in group g.
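\[ r_g = \frac{\sum_{k \in g} w_k \, (y_k - \bar{y}_g)(z_k - \bar{z}_g)}{\sqrt{\sum_{k \in g} w_k \, (y_k - \bar{y}_g)^2 \; \sum_{k \in g} w_k \, (z_k - \bar{z}_g)^2}} \]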
d) Final group summary table. The table provides the same information (except the explained variation) as in "Split summary table", but for final groups.
e) Percent of explained variation. The percent of total variation explained by the best split for each group (see 1.e and 2.a.vi above).
f) Residuals. The residuals are the differences between the observed value and the predicted value of the dependent variable.
\[ e_k = y_k - \hat{y}_k \]
Predicted values are calculated as follows:
\[ \hat{y}_k = a_i + b_i \, z_k \]
for case k in final group i,
where ai and bi are regression coefficients for the final group i.
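As a rough illustration of the regression statistics in 2.a, 2.c and 2.f, the sketch below fits a weighted least-squares line within each group. The function names are hypothetical, weights default to 1, and the covariate is assumed not to be constant within a group (otherwise the slope is undefined).

```python
import math


def group_regression(y, z, w=None):
    """Slope, intercept, residual variation and Pearson r of y on z in one group."""
    w = [1.0] * len(y) if w is None else w
    total_w = sum(w)
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    z_mean = sum(wi * zi for wi, zi in zip(w, z)) / total_w
    syy = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y))
    szz = sum(wi * (zi - z_mean) ** 2 for wi, zi in zip(w, z))
    syz = sum(wi * (yi - y_mean) * (zi - z_mean) for wi, yi, zi in zip(w, y, z))
    slope = syz / szz  # assumes z varies within the group
    return {
        "slope": slope,
        "intercept": y_mean - slope * z_mean,
        # Sum of squares about the regression line (equals syy - slope**2 * szz).
        "variation": syy - slope * syz,
        "corr": syz / math.sqrt(syy * szz),
    }


def regression_residuals(y, z, groups, w=None):
    """Observed y minus the value predicted by the case's own group line."""
    w = [1.0] * len(y) if w is None else w
    fits = {}
    for g in set(groups):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        fits[g] = group_regression(
            [y[k] for k in idx], [z[k] for k in idx], [w[k] for k in idx]
        )
    return [
        yk - (fits[gk]["intercept"] + fits[gk]["slope"] * zk)
        for yk, zk, gk in zip(y, z, groups)
    ]


# Made-up data: two final groups of four cases each.
fit = group_regression([1, 2, 2, 3], [0, 1, 2, 3])
res = regression_residuals(
    [1, 2, 2, 3, 1, 3, 5, 7], [0, 1, 2, 3, 0, 1, 2, 3], [2] * 4 + [3] * 4
)
```

The second group lies exactly on a line, so its residuals vanish; the first group does not, and its residual variation is what enters the explained-variation calculation in place of the plain sum of squares used in means analysis.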
3. Chi-square analysis
a) Trace statistics. These are the statistics calculated on the whole sample (for g = 1), and on tentative splits for parent groups as well as for each group resulting from the best split.
i) Sum (wt). Number of cases (Ng) if the weight variable is not specified, or weighted number of cases (Wg) in group g.
ii) Variation. This is the entropy for group g, i.e. a measure of disorder in the distribution of the dependent variable.
|
where
|
and xjgk is the "frequency" (coded 0 or 1) of code j (or value of variable j) of case k in group g.
iii) Var expl. Explained variation (EV). See 1.a.v above for general information, and 3.a.ii above for details on V (variation) used in chi-square analysis.
iv) Explained variation. This is the percent of the total variation explained by the final groups. See 1.a.vi above and 3.b below.
b) One-way analysis of final groups. These are the summary statistics for the final groups. See 1.b above for general information, and 3.a.ii and 3.a.iii above for details on V and EV measures used in chi-square analysis.
c) Split summary table. The table provides variation of the dependent variable at each split as well as the variation explained by that split. See 3.a.ii and 3.a.iii above for formulas.
d) Final group summary table. The table provides variation of the dependent variable for the final groups.
e) Percent of explained variation. The percent of total variation explained by the best split for each group (see 1.e and 3.a.iii above).
f) Percent distributions. A bivariate table showing percentage distributions of the dependent variable for all groups (Pjg).
g) Residuals. The residuals are the differences between the observed value and the predicted value of the dependent variable.
For analysis with one categorical dependent variable, residuals are calculated for each category of the variable. Thus, the number of residuals is equal to the number of categories.
|
As predicted value for category j, a case is assigned the proportion of cases being in this category for the group to which the case belongs, i.e.
|
For analysis with several dichotomous dependent variables, residuals are calculated for each variable. Thus, the number of residuals is equal to the number of dependent variables.
|
Observed values are calculated as follows:
|
As predicted value for variable j, a case is assigned the proportion of cases having value 1 for this variable in the group to which the case belongs, i.e.
|
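For the case of one categorical dependent variable, the distributions in 3.f and the residuals in 3.g can be sketched as follows. This is an illustration only: the function names are hypothetical, weights default to 1, and the observed value for category j is assumed to be the 0/1 indicator of the case having that code. Proportions are returned rather than percentages; multiply by 100 for the percent distribution.

```python
from collections import defaultdict


def category_proportions(codes, groups, w=None):
    """Proportion of (weighted) cases with each code, separately per group."""
    w = [1.0] * len(codes) if w is None else w
    categories = sorted(set(codes))
    totals = defaultdict(float)   # weighted number of cases per group
    counts = defaultdict(float)   # weighted number of cases per (group, code)
    for code, g, wk in zip(codes, groups, w):
        totals[g] += wk
        counts[(g, code)] += wk
    return {
        g: {j: counts[(g, j)] / totals[g] for j in categories}
        for g in totals
    }


def category_residuals(codes, groups, w=None):
    """One residual per category for each case: indicator minus group proportion."""
    props = category_proportions(codes, groups, w)
    return [
        {j: (1.0 if code == j else 0.0) - p for j, p in props[g].items()}
        for code, g in zip(codes, groups)
    ]


# Made-up data: four cases, two categories, two final groups.
codes = ["a", "a", "b", "a"]
groups = [1, 1, 1, 2]
props = category_proportions(codes, groups)
res = category_residuals(codes, groups)
```

Because the proportions in a group sum to one, the residuals of any single case sum to zero across categories.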
Sonquist, J.A., Baker, E.L., Morgan, J.N., Searching for Structure, Revised ed., Institute for Social Research, The University of Michigan, Ann Arbor, 1974.