The objective of Means Analysis is to create groups that allow the best prediction of the dependent-variable values from the group means. The splitting criterion is therefore based on group means.
The splitting process involves the following steps:
The total sum of squares for the parent group L is

    TSS_L = Σ_{i in L} (Y_i − Ȳ_L)²

The parent group is split into two child groups, L1 and L2, so as to maximize the between-group (explained) sum of squares

    BSS_L = N_L1 (Ȳ_L1 − Ȳ_L)² + N_L2 (Ȳ_L2 − Ȳ_L)²

where N_L1 + N_L2 = N_L, and N_L1, N_L2 ≥ N_MIN; N_MIN is a minimum group size requirement.
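The means-based splitting criterion can be sketched in a few lines of Python. The function and variable names below are ours, not from the text: `best_split` scans every cut point of an ordered group and keeps the split that maximizes the explained (between-group) sum of squares, subject to the minimum group size N_MIN.

```python
def mean(xs):
    return sum(xs) / len(xs)

def total_ss(ys):
    """Total sum of squares of a group around its own mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def between_ss(g1, g2):
    """Between-group (explained) sum of squares for a binary split
    of a parent group into child groups g1 and g2."""
    m = mean(g1 + g2)
    return (len(g1) * (mean(g1) - m) ** 2
            + len(g2) * (mean(g2) - m) ** 2)

def best_split(ys, n_min=1):
    """Scan every cut point of an ordered group and return the
    (cut, BSS) pair with the largest explained sum of squares,
    subject to the minimum child-group size n_min."""
    best = (None, -1.0)
    for cut in range(n_min, len(ys) - n_min + 1):
        bss = between_ss(ys[:cut], ys[cut:])
        if bss > best[1]:
            best = (cut, bss)
    return best
```

For example, for the ordered values [1, 2, 3, 10, 11, 12] the total sum of squares is 125.5, and the best cut separates the first three from the last three, explaining 121.5 of it.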
Steps A-C are performed recursively. The process stops when one or more of the criteria below are met:
These criteria ensure that the process stops before unreliable reduction in error variance occurs.
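Since the stopping criteria themselves are listed above, a sketch of how such a check might look in code is given below. The specific rules and thresholds here (a minimum child-group size and a minimum fraction of the original total sum of squares explained by the best split) are typical of AID-style procedures but are our illustrative assumptions, not values from the text.

```python
def should_stop(best_bss, n1, n2, total_tss, n_min=25, min_gain=0.006):
    """Hypothetical stopping check. Reject the split when a child
    group would fall below the minimum size, or when the reduction
    in error variance is too small a fraction of the original total
    sum of squares to be reliable. Both defaults (25 and 0.6%) are
    illustrative assumptions, not values from the text."""
    if n1 < n_min or n2 < n_min:
        return True
    if best_bss / total_tss < min_gain:
        return True
    return False
```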
It may seem odd that the splitting criterion is not based on statistical significance. In practice, however, so many splits are evaluated that statistical significance becomes an irrelevant criterion. Suppose there are m different predictors of k categories each. Even if all the predictors are monotonic, each split examines m × (k-1) possibilities, and by the time twenty-five such splits have been decided upon, the program has searched 25 × m × (k-1) possible splits. With twenty predictors of ten classes each, this figure would be 4,500. If the monotonicity of the predictors is not preserved (or is absent, as with nominal variables), the number of possible splits explodes. Hence there is little point worrying about statistical significance or degrees of freedom.
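The count in the example above follows directly from the formula in the text:

```python
def splits_searched(n_splits, m, k):
    """Number of candidate splits examined after n_splits splits
    have been decided upon, with m monotonic predictors of k
    categories each: each decision scans m * (k - 1) cut points."""
    return n_splits * m * (k - 1)
```

With twenty-five splits, twenty predictors, and ten classes, this gives 25 × 20 × 9 = 4,500.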
In certain situations, a particular predictor dominates the dependent variable to such an extent that hardly any other predictor matters. For example, in economic studies, income or education so dominates the dependent variable that the data are split on little else. In such a situation, it may be desirable to remove the effect of the dominant variable in order to see the effects of the other variables. One could assume a linear relationship through the origin and simply divide the dependent variable by that predictor. This often has the added advantage of improving the homogeneity of variance where the variance of the dependent variable is related to its level.
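The adjustment described above amounts to working with ratios. A minimal sketch, assuming the dependent variable Y is proportional to the dominant predictor X through the origin (names are ours):

```python
def remove_dominant_effect(y, x):
    """Assume Y = b * X through the origin and analyze the ratios
    Y_i / X_i in place of Y_i. This removes the dominant predictor's
    effect, and often stabilizes the variance when the spread of Y
    grows with its level."""
    return [yi / xi for yi, xi in zip(y, x)]
```

If income were exactly proportional to education, the adjusted values would be constant, leaving subsequent splits free to pick up the remaining predictors.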
Similar problems arise in empirical research in sociology and psychology, where it becomes necessary to isolate the effect of a particular variable under a wide variety of circumstances.
Further, in the analysis of temporal changes in a phenomenon, the initial value of the phenomenon clearly affects its value measured at a subsequent time. This is why the residuals from the regression of the t2 value on the initial t1 value are often used as a measure of change, instead of the raw increments. However, this "initial value" effect may not be the same for all subgroups of the population.
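The residual measure of change mentioned above can be sketched with a simple least-squares fit (pure Python, names ours):

```python
def residual_change(t1, t2):
    """Regress the time-2 values on the time-1 values by simple
    least squares and return the residuals: the usual measure of
    change adjusted for the initial-value effect, in place of the
    raw increments t2 - t1."""
    n = len(t1)
    m1 = sum(t1) / n
    m2 = sum(t2) / n
    sxx = sum((x - m1) ** 2 for x in t1)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(t1, t2))
    b = sxy / sxx          # slope of t2 on t1
    a = m2 - b * m1        # intercept
    return [y - (a + b * x) for x, y in zip(t1, t2)]
```

By construction the residuals are uncorrelated with the initial values, so they measure change net of where each unit started.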
To deal with such covariate problems, a regression analysis is performed in which the sum of squares is explained by differences between the two subgroup regression lines, instead of between the subgroup means.
This method can be used to analyze a dependent variable with one covariate and several predictors. It aims to create groups that allow the best prediction of the dependent-variable values from the group regression equation and the value of the covariate; in other words, the created groups should show the largest differences between their regression lines. The splitting criterion (explained variation) is based on the group regressions of the dependent variable on the covariate.
The explained sum of squares is then

    BSS = Σ_g b_g² Σ_{i in g} (X_ig − X̄_g)²

where b_g is the slope of the regression line in group g.
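The regression-based criterion can be sketched as follows. Since the text gives only the slope b_g, we use the standard regression sum of squares within each group, b_g² Σ(x − x̄_g)², summed over the groups of a candidate split; the function names are ours.

```python
def group_slope(x, y):
    """Least-squares slope b_g of Y on the covariate X in one group."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def regression_explained_ss(groups):
    """Splitting criterion sketch: for each group g (a pair of
    covariate values x and dependent values y), the within-group
    regression explains b_g**2 * sum((x - xbar_g)**2); the
    criterion sums this over the groups of the candidate split."""
    total = 0.0
    for x, y in groups:
        b = group_slope(x, y)
        mx = sum(x) / len(x)
        total += b ** 2 * sum((xi - mx) ** 2 for xi in x)
    return total
```

For a group where Y = 2X exactly, the regression explains all of the group's variation, so the criterion equals that group's total sum of squares.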