Classification and Regression Trees
Tree-based modeling is an exploratory data-analytic technique for uncovering structure in large data sets. This technique is quite useful for:
Tree-based models are useful for both classification and regression problems. In these problems there is a set of classification or predictor variables (X_i) and a dependent variable (Y). The X_i may be measured on nominal and/or ordinal scales (or as coded intervals of an equal-interval scale), and Y may be either quantitative or qualitative (i.e., nominal or categorical).
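Whether a predictor is nominal or ordinal determines which dichotomous splits are admissible. As a generic illustration (not taken from IDAMS), an ordinal predictor with k categories allows only the k − 1 order-preserving cut points, whereas a nominal predictor allows any division of its categories into two non-empty groups, 2^(k−1) − 1 in all. The function names below are hypothetical.

```python
from itertools import combinations


def ordinal_splits(categories):
    """Order-preserving dichotomies of an ordinal predictor.

    `categories` must be listed in their natural order; each split is a
    cut point that sends the lower categories left and the rest right.
    """
    return [(tuple(categories[:i]), tuple(categories[i:]))
            for i in range(1, len(categories))]


def nominal_splits(categories):
    """All distinct dichotomies of a nominal predictor.

    Any subset of categories may form the left group. The first category is
    pinned to the left side so that {A}|{B} and {B}|{A} are counted once, and
    the right group is never left empty.
    """
    first, rest = categories[0], list(categories[1:])
    splits = []
    for size in range(len(rest)):  # size == len(rest) would empty the right group
        for combo in combinations(rest, size):
            left = (first,) + combo
            right = tuple(c for c in categories if c not in left)
            splits.append((left, right))
    return splits


print(ordinal_splits(["low", "medium", "high"]))
# [(('low',), ('medium', 'high')), (('low', 'medium'), ('high',))]
print(len(nominal_splits(["a", "b", "c", "d"])))  # 7
```

The count grows quickly for nominal predictors: with k = 10 categories there are already 511 candidate splits, against 9 for an ordinal predictor.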
In classification trees the dependent variable is categorical, whereas in regression trees it is quantitative. Regression trees parallel regression/ANOVA (analysis of variance) modeling; classification trees parallel discriminant analysis.
The SEARCH module in IDAMS computes classification and regression trees. At the core of the SEARCH algorithm is a question embedded in an iterative procedure: which dichotomous split on which predictor variable will most improve the prediction of the dependent variable?
SEARCH carries out sequential binary splits according to a local optimization criterion that varies with the measurement scale of the dependent variable.
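The iterative procedure can be sketched for the simplest case, a quantitative dependent variable with the means-analysis criterion (explained variation = between-group sum of squares). This is a minimal illustration under assumptions, not the IDAMS code: predictors are assumed to be ordinal integer codes, and at each step the sketch simply splits whichever group offers the largest gain; the actual group-selection and stopping rules of SEARCH may differ. The names `between_ss`, `best_split`, `search`, `min_size`, `max_groups` and `min_gain` are hypothetical.

```python
from statistics import fmean


def between_ss(y_left, y_right):
    """Explained (between-group) sum of squares for one dichotomy."""
    grand = fmean(y_left + y_right)
    return (len(y_left) * (fmean(y_left) - grand) ** 2
            + len(y_right) * (fmean(y_right) - grand) ** 2)


def best_split(rows, y, predictors, min_size):
    """Try every cut point of every (ordinal-coded) predictor.

    Returns (gain, predictor, cut) for the split that explains the most
    variation, or None if no split leaves min_size cases on each side.
    """
    best = None
    for p in predictors:
        codes = sorted({r[p] for r in rows})
        for cut in codes[:-1]:  # left group: code <= cut, right: code > cut
            left = [yi for r, yi in zip(rows, y) if r[p] <= cut]
            right = [yi for r, yi in zip(rows, y) if r[p] > cut]
            if min(len(left), len(right)) < min_size:
                continue
            gain = between_ss(left, right)
            if best is None or gain > best[0]:
                best = (gain, p, cut)
    return best


def search(rows, y, predictors, max_groups=5, min_size=2, min_gain=0.0):
    """Greedy sequential binary splitting with a means-analysis criterion.

    At each step, the group whose best split explains the most variation is
    replaced by its two subgroups. Stops at max_groups or when no split
    gains more than min_gain. Returns (groups as row-index lists, split log).
    """
    groups = [list(range(len(rows)))]
    log = []
    while len(groups) < max_groups:
        candidates = []
        for gi, idx in enumerate(groups):
            found = best_split([rows[i] for i in idx], [y[i] for i in idx],
                               predictors, min_size)
            if found is not None and found[0] > min_gain:
                candidates.append((found[0], gi, found))
        if not candidates:
            break
        _, gi, (gain, p, cut) = max(candidates, key=lambda c: c[0])
        idx = groups.pop(gi)
        groups.append([i for i in idx if rows[i][p] <= cut])
        groups.append([i for i in idx if rows[i][p] > cut])
        log.append((p, cut, gain))
    return groups, log


# Toy data: the outcome depends on "edu", hardly at all on "age".
rows = [{"age": a, "edu": e}
        for a, e in [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]]
y = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5]
groups, log = search(rows, y, ["age", "edu"])
print(log)     # [('edu', 1, 150.0)]
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

The search stops after one split here because, with `min_size=2`, neither three-case subgroup can be divided further.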
Predictor variables | Dependent variable | Splitting criterion | Program option
--- | --- | --- | ---
Several (ordinal/nominal) | Quantitative | Explained variation based on group means | Means analysis
Several (ordinal/nominal) plus one covariate | Quantitative | Explained variation based on the regression of the dependent variable on the covariate | Regression analysis
Several (ordinal/nominal) | Nominal/ordinal, or a set of dichotomous variables | Explained variation based on the entropy of the dependent variable | Chi-square analysis
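The last two criteria in the table can be made concrete as scoring functions for a single candidate split. This is a sketch under stated assumptions, not the IDAMS implementation. For the regression option we assume "explained variation" means the drop in residual sum of squares when the parent group's single least-squares line (dependent variable on the covariate) is replaced by a separate line in each subgroup. For the chi-square option we use the reduction in entropy of the dependent variable; with natural logarithms and N cases this equals G²/(2N), where G² is the likelihood-ratio chi-square of the split-by-category table. All function names are hypothetical.

```python
from collections import Counter
from math import log


def rss_line(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    # A constant covariate carries no slope information; use the mean only.
    return syy if sxx == 0 else syy - sxy ** 2 / sxx


def regression_gain(x, y, goes_left):
    """Residual variation removed by fitting separate lines to each subgroup."""
    left = [(xi, yi) for xi, yi, g in zip(x, y, goes_left) if g]
    right = [(xi, yi) for xi, yi, g in zip(x, y, goes_left) if not g]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each subgroup needs at least two cases")
    # zip(*part) turns [(x, y), ...] into (xs, ys) for rss_line.
    children = sum(rss_line(*zip(*part)) for part in (left, right))
    return rss_line(x, y) - children


def entropy(labels):
    """Shannon entropy (natural log) of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def entropy_gain(labels, goes_left):
    """Reduction in entropy of the dependent variable achieved by a split."""
    left = [lab for lab, g in zip(labels, goes_left) if g]
    right = [lab for lab, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


def lr_chi_square(labels, goes_left):
    """Likelihood-ratio chi-square G^2 = 2 * N * (entropy reduction)."""
    return 2 * len(labels) * entropy_gain(labels, goes_left)
```

In the regression sketch a split scores well when the two subgroups have clearly different regression lines, not merely different means; in the entropy sketch a split that leaves the category mix unchanged in both subgroups scores zero.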