Classification and Regression Trees

Tree-based modeling is an exploratory data analysis technique for uncovering structure in large data sets.
Tree-based models are useful for both classification and regression problems. In these problems, there is a set of classification or predictor variables (X_{i}) and a dependent variable (Y). The X_{i} variables may be a mixture of nominal and/or ordinal scales (or coded intervals of an equal-interval scale), and Y may be a quantitative or a qualitative (i.e., nominal or categorical) variable.
In classification trees the dependent variable is categorical, whereas in regression trees the dependent variable is quantitative. Regression trees parallel regression/ANOVA (analysis of variance) modeling. Classification trees parallel discriminant analysis.
The SEARCH module in IDAMS computes classification and regression trees. The SEARCH algorithm is an iterative procedure that asks the same question at every step: what dichotomous split on which predictor variable will maximally improve the predictability of the dependent variable?
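The question above can be illustrated with a minimal Python sketch. It is not the SEARCH implementation: it assumes a quantitative dependent variable, scores each candidate split by the between-group sum of squares, and treats every predictor as an ordered code so that a split has the form "code <= cut". The function names and the toy data (`age`, `region`, `income`) are hypothetical.

```python
from statistics import mean


def between_group_ss(left, right):
    """Between-group sum of squares for a two-way split of the y-values."""
    grand = mean(left + right)
    return (len(left) * (mean(left) - grand) ** 2
            + len(right) * (mean(right) - grand) ** 2)


def best_split(rows, predictors, target):
    """Try every split 'code <= cut' on every predictor; keep the best one.

    Returns a tuple (gain, predictor, cut), or None if no split is possible.
    """
    best = None
    for p in predictors:
        codes = sorted({r[p] for r in rows})
        # Skip the largest code: it would leave the right-hand group empty.
        for cut in codes[:-1]:
            left = [r[target] for r in rows if r[p] <= cut]
            right = [r[target] for r in rows if r[p] > cut]
            gain = between_group_ss(left, right)
            if best is None or gain > best[0]:
                best = (gain, p, cut)
    return best


# Hypothetical toy data: two coded predictors and a quantitative outcome.
rows = [
    {"age": 1, "region": 1, "income": 10.0},
    {"age": 1, "region": 2, "income": 12.0},
    {"age": 2, "region": 1, "income": 11.0},
    {"age": 2, "region": 2, "income": 13.0},
    {"age": 3, "region": 1, "income": 30.0},
    {"age": 3, "region": 2, "income": 32.0},
]
gain, predictor, cut = best_split(rows, ["age", "region"], "income")
print(predictor, cut, gain)
```

On this toy data the split on `age` at codes 1-2 versus 3 separates the low and high incomes far better than any split on `region`. Applying the same search again inside each resulting group would give the sequential binary splits that make up a tree.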
The SEARCH module carries out sequential binary splits according to a local optimization criterion, which varies with the measurement scale of the dependent variable.
| Predictor Variable | Dependent Variable | Splitting Criterion | Program Option |
|---|---|---|---|
| Several (Ordinal/Nominal) | Quantitative | Explained variation based on group means | Means analysis |
| Several (Ordinal/Nominal) plus one covariate | Quantitative | Explained variation based on the regression of the dependent variable on the covariate | Regression analysis |
| Several (Ordinal/Nominal) | Nominal/ordinal or a set of dichotomous variables | Explained variation is the entropy of the dependent variable | Chi-square analysis |
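To make the three criteria more concrete, the sketch below gives one textbook version of each as a Python function. These are assumptions made for illustration, not the exact statistics that SEARCH computes: the means criterion is taken as the reduction in the sum of squares about the group means, the regression criterion as the reduction in residual sum of squares when a separate least-squares line on the covariate is fitted in each group, and the entropy criterion as the reduction in Shannon entropy of the dependent variable. All function names are hypothetical.

```python
from collections import Counter
from math import log
from statistics import mean


def ss_about_mean(y):
    """Sum of squared deviations of y from its own mean."""
    m = mean(y)
    return sum((v - m) ** 2 for v in y)


def ss_about_line(x, y):
    """Residual sum of squares of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # constant covariate: fall back to the mean
    return sum((v - my - slope * (u - mx)) ** 2 for u, v in zip(x, y))


def entropy(labels):
    """Shannon entropy (natural log) of a list of category labels."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def means_gain(y_left, y_right):
    """Means analysis: variation explained by the two group means."""
    return (ss_about_mean(y_left + y_right)
            - ss_about_mean(y_left) - ss_about_mean(y_right))


def regression_gain(x_left, y_left, x_right, y_right):
    """Regression analysis: variation explained by group-specific lines."""
    return (ss_about_line(x_left + x_right, y_left + y_right)
            - ss_about_line(x_left, y_left) - ss_about_line(x_right, y_right))


def entropy_gain(lab_left, lab_right):
    """Chi-square analysis (entropy form): drop in entropy after the split."""
    n_l, n_r = len(lab_left), len(lab_right)
    n = n_l + n_r
    return (entropy(lab_left + lab_right)
            - n_l / n * entropy(lab_left) - n_r / n * entropy(lab_right))


print(means_gain([10, 12, 11, 13], [30, 32]))
print(regression_gain([1, 2, 3], [1, 2, 3], [1, 2, 3], [3, 2, 1]))
print(entropy_gain(["a", "a"], ["b", "b"]))
```

In each case a larger value means the split explains more of the variation in the dependent variable, so a split search would keep the candidate with the largest gain under whichever criterion matches the chosen program option.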