Divide and Conquer
- Internal decision nodes
- Univariate: Uses a single attribute, $x_i$
- Multivariate: Uses all attributes, $\mathbf{x}$
- Leaves
- Classification: Class labels, or proportions
- Regression: Numeric; average of the outputs $r$ reaching the leaf, or a local fit
- Learning is greedy: find the best split recursively, so the resulting tree is not guaranteed to be optimal
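As a rough illustration of the node types listed above, the following Python sketch represents a tree with three hypothetical classes (`Leaf`, `UnivariateNode`, `MultivariateNode`). It assumes binary splits: a univariate node tests one attribute against a threshold, and a multivariate node tests a weighted sum of all attributes against zero. These are common choices, not the only possible ones.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    value: object  # class label (classification) or average of r (regression)

@dataclass
class UnivariateNode:
    feature: int      # index i of the single attribute x_i that is tested
    threshold: float  # split point on x_i
    left: "Tree"      # branch taken when x_i <= threshold
    right: "Tree"     # branch taken when x_i > threshold

@dataclass
class MultivariateNode:
    weights: List[float]  # one weight per attribute of x
    bias: float
    left: "Tree"          # branch taken when w.x + bias <= 0
    right: "Tree"         # branch taken when w.x + bias > 0

Tree = Union[Leaf, UnivariateNode, MultivariateNode]

def predict(tree: Tree, x: List[float]) -> object:
    """Follow decision nodes from the root until a leaf is reached."""
    while not isinstance(tree, Leaf):
        if isinstance(tree, UnivariateNode):
            go_right = x[tree.feature] > tree.threshold
        else:
            go_right = sum(w * xi for w, xi in zip(tree.weights, x)) + tree.bias > 0
        tree = tree.right if go_right else tree.left
    return tree.value

# Root tests x_0 alone; its right child tests x_0 - x_1 (both attributes).
t = UnivariateNode(0, 2.5, Leaf("A"),
                   MultivariateNode([1.0, -1.0], 0.0, Leaf("B"), Leaf("C")))
print(predict(t, [1.0, 0.0]))  # A
print(predict(t, [3.0, 1.0]))  # C
print(predict(t, [3.0, 5.0]))  # B
```

Walking the tree costs one test per level, which is why shallow trees give fast predictions.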
Classification Trees
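- Probability of class $C_i$ at node $m$: $\hat{P}(C_i \mid \mathbf{x}, m) \equiv p^i_m = N^i_m/N_m$, where $N_m$ instances reach node $m$ and $N^i_m$ of them belong to $C_i$
- Node $m$ is pure if $p^i_m$ is 0 or 1 for all $i$
- Measure of impurity is entropy: $\mathcal{I}_m = -\sum_{i=1}^{K} p^i_m \log_2 p^i_m$ for $K$ classes (with $0 \log 0 \equiv 0$)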
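A minimal Python sketch of the quantities above, assuming the instances reaching node $m$ are given only as a list of their class labels. The function names are illustrative, not taken from any library.

```python
import math
from collections import Counter

def class_proportions(labels):
    """p^i_m = N^i_m / N_m for every class i present among the N_m labels."""
    n_m = len(labels)
    return {c: count / n_m for c, count in Counter(labels).items()}

def entropy(labels):
    """I_m = -sum_i p^i_m log2 p^i_m; absent classes (p = 0) contribute 0."""
    return -sum(p * math.log2(p) for p in class_proportions(labels).values() if p > 0)

def is_pure(labels):
    """Pure when every p^i_m is 0 or 1, i.e. one class holds all instances."""
    return all(p == 1.0 for p in class_proportions(labels).values())

print(entropy(["a", "a", "b", "b"]))  # 1.0: two equally likely classes
print(entropy(["a", "b", "c", "d"]))  # 2.0: four equally likely classes
print(is_pure(["a", "a"]), is_pure(["a", "b"]))  # True False
```

Entropy is 0 exactly when the node is pure and is largest when all $K$ classes are equally likely ($\log_2 K$ bits).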
Best split
- If node m is pure, generate a leaf and stop, otherwise split and continue recursively
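- Find the variable and split that minimize impurity, searching over all variables (and over all split positions for a numeric variable); if branch $j$ receives $N_{mj}$ of the $N_m$ instances and $p^i_{mj} = N^i_{mj}/N_{mj}$, the impurity after the split is $\mathcal{I}'_m = -\sum_{j} \frac{N_{mj}}{N_m} \sum_{i} p^i_{mj} \log_2 p^i_{mj}$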
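The greedy procedure can be sketched in Python for numeric attributes and binary splits. This is an illustration under stated assumptions, not a reference implementation: candidate thresholds are midpoints between consecutive distinct values, a node becomes a leaf only when it is pure (or when no attribute can be split), and a leaf stores the majority label. The names `best_split`, `generate_tree`, and `classify` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right):
    """I'_m: entropy of each branch weighted by N_mj / N_m."""
    n = len(left) + len(right)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

def best_split(X, y):
    """Try every attribute j and every midpoint threshold t; keep the lowest I'_m."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = split_impurity(left, right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # None if no attribute takes two distinct values

def generate_tree(X, y):
    """Pure node -> leaf; otherwise split on the best (j, t) and recurse."""
    split = best_split(X, y) if entropy(y) > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class label
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": generate_tree([X[i] for i in left], [y[i] for i in left]),
        "right": generate_tree([X[i] for i in right], [y[i] for i in right]),
    }

def classify(tree, x):
    """Internal nodes are dicts; anything else is a leaf label."""
    while isinstance(tree, dict):
        tree = tree["right"] if x[tree["feature"]] > tree["threshold"] else tree["left"]
    return tree

tree = generate_tree([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
print(tree)  # {'feature': 0, 'threshold': 2.5, 'left': 'a', 'right': 'b'}
```

Because leaves are only created at pure nodes, this sketch grows the tree until it fits the training data exactly; practical implementations usually also stop early, for example when impurity falls below a threshold.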
Regression Trees
Pruning Trees
Multivariate Trees