## 10.1 Trees

Trees are directed graphs that begin with a single node, called the parent node, and branch into many descendant nodes. The tree-growing methodology consists of recursively splitting each group of individuals into two subgroups, starting from the parent node (the total sample) and continuing until the subgroups contain mostly 'similar' individuals, without allowing these subgroups to become too small. A split is considered permissible if the partition of the data based on a predictor variable satisfies the following constraints:

• The predictor's ordinality is not violated.
• Minimum group frequency: each descendant subgroup contains at least a minimum number of objects.
• A minimum value of criterion improvement is ensured.

The tree is constructed by recursive partitioning of the dataset. At each node, the procedure examines all possible splits of the data along each predictor variable and selects the split that most reduces some measure of node impurity. The impurity of a set of objects is designed to capture how dissimilar the objects are to each other. In the case of regression trees, the impurity measure is the sum of squares, whereas for classification trees several measures have been proposed in the literature: the Gini index, misclassification, twoing, and entropy. Search uses entropy as its measure of node impurity. An essential property of the impurity is that it should not increase across the splitting process:

i(t) ≥ α · i(t_R) + (1 − α) · i(t_L)

0 ≤ α ≤ 1
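The inequality above can be checked numerically on a toy split, using entropy as the impurity measure. This is a minimal sketch: the node contents and the weighting of the children by group size are illustrative, not taken from the text.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy impurity of a node: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A parent node and one candidate split into left/right descendants
# (hypothetical data for illustration).
parent = ["a", "a", "a", "b", "b", "b"]
left, right = ["a", "a", "a"], ["b", "b", "b"]

alpha = len(right) / len(parent)  # proportion of cases sent right
weighted = alpha * entropy(right) + (1 - alpha) * entropy(left)
assert entropy(parent) >= weighted  # impurity does not increase
```

Here the parent is maximally impure (entropy 1 bit) while both children are pure (entropy 0), so the inequality holds strictly.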

If X is an ordered variable, this approach searches all possible values C for splits of the form

X ≤ C

A case is assigned to the left descendent node if the inequality is satisfied and to the right descendent node if the inequality is not satisfied.
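The assignment rule for an ordered predictor can be sketched as follows; the row layout, the column index `j`, and the cut-point `c` are hypothetical choices for illustration.

```python
def split_ordered(rows, j, c):
    """Assign a case to the left node if its j-th value satisfies x <= c,
    otherwise to the right node."""
    left = [r for r in rows if r[j] <= c]
    right = [r for r in rows if r[j] > c]
    return left, right

# Hypothetical cases of the form (predictor value, class label).
rows = [(1.2, "a"), (3.5, "b"), (2.0, "a"), (4.1, "b")]
left, right = split_ordered(rows, 0, 2.0)
# left  -> [(1.2, 'a'), (2.0, 'a')]
# right -> [(3.5, 'b'), (4.1, 'b')]
```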

If X is a categorical (unordered) predictor, the search is over all splits of the form

X ∈ A

where A is a non-empty subset of the values taken by X.

If X is an ordered variable with n distinct values at a node, n − 1 splits of the form X ≤ C are examined. The computation at each node is therefore linear in the number of distinct values of each predictor. If X is an unordered categorical variable, the number of computations increases exponentially with the number of categories: 2^(m−1) − 1 splits, where m is the number of categories of X.
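The 2^(m−1) − 1 count can be verified by enumeration. The sketch below fixes one category in the left subset so that each unordered split is generated only once (mirror-image splits are the same partition); the category values are made up for illustration.

```python
from itertools import combinations

def categorical_splits(categories):
    """Enumerate the distinct binary splits X in A of an unordered predictor.

    Fixing one category on the left avoids counting each split and its
    mirror image twice, giving 2**(m-1) - 1 splits for m categories.
    """
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            left = {first, *combo}
            if left != set(cats):            # both descendants non-empty
                yield left, set(cats) - left

splits = list(categorical_splits({"red", "green", "blue"}))
assert len(splits) == 2 ** (3 - 1) - 1       # 3 splits for m = 3
```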

The heart of the tree-growing process is binary recursive partitioning. It is binary because each parent node is always split into exactly two child nodes. It is recursive because the process can be repeated by treating each child node as a parent node of the tree.

The algorithm evaluates all possible splits of each predictor variable and selects the variable and cut-point that provide an optimal split of a group into two subgroups. The procedure is applied recursively to the descendants, which become the parents of successive splits, and so on, until the final nodes are reached. The following steps are taken at each node:

• Define the possible (permissible) splits at the node.
• Select the best split according to a statistical criterion.
• Verify a stopping criterion based on a statistical threshold.
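The steps above can be sketched as a minimal recursive partitioner. This is an illustrative implementation, not the Search program itself: it handles ordered predictors only, uses entropy as the impurity, and the `min_size` and `min_gain` thresholds are hypothetical stand-ins for the permissibility and stopping rules.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy impurity of a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def grow(rows, min_size=2, min_gain=1e-3):
    """Binary recursive partitioning of rows = [(features_tuple, label), ...].

    At each node: enumerate permissible splits, select the best by
    impurity reduction, and stop when no split clears the threshold.
    """
    labels = [y for _, y in rows]
    best = None
    for j in range(len(rows[0][0])):               # each predictor variable
        for c in sorted({x[j] for x, _ in rows}):  # each candidate cut-point
            left = [r for r in rows if r[0][j] <= c]
            right = [r for r in rows if r[0][j] > c]
            if len(left) < min_size or len(right) < min_size:
                continue                           # minimum group frequency
            a = len(right) / len(rows)
            gain = entropy(labels) - (a * entropy([y for _, y in right])
                                      + (1 - a) * entropy([y for _, y in left]))
            if best is None or gain > best[0]:
                best = (gain, j, c, left, right)
    if best is None or best[0] < min_gain:         # stopping criterion
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    gain, j, c, left, right = best
    return {"split": (j, c),
            "left": grow(left, min_size, min_gain),
            "right": grow(right, min_size, min_gain)}

tree = grow([((0.0,), "a"), ((0.1,), "a"), ((1.0,), "b"), ((1.1,), "b")])
# splits once on the single predictor, then both children are pure leaves
```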