π²

# Tree-Based Methods

### Contents

(This page is based primarily on material from Chapter 8 of Introduction to Statistical Learning)
Β

### Decision Trees

#### Regression Trees

Overall Algorithm:
Preliminaries: value of , , set of values to test.
1. Use recursive binary splitting until every terminal node has fewer than observations.
1. Apply cost complexity pruning to the full tree for each to get a series of candidate trees.
1. Use K-fold CV across a range of values. For :
1. Repeat (1.) & (2.) leaving out the fold.
2. Evaluate the MSE of each CV candidate tree on the fold.
1. Select the candidate tree from (2.) with an value corresponding to the best average MSE across each (3.2) step.
Β
Recursive Binary Splitting Algorithm:
Given data , we perform repeated greedy splits of the feature space to form our decision tree.
These splits select a feature and partition the data around value for that feature, creating two regions:
The predictions for each region are simply the mean of the corresponding values.
We seek the values of and that minimise .
Cost Complexity Pruning Algorithm:
Given some , return the subtree that minimises:
Where is the number of leaf nodes in a given subtree.

#### Classification Trees

This is essentially the same as regression trees, but we can no longer use the RSS to assess the loss in a given region.
The alternatives are:
(where is the proportion of observations in the region from class )
Classification error rate:
Gini index:
Cross-entropy:

### Bagging

Bootstrapping in the context of decision trees is known as bagging. In this context, note:
• We no longer prune individual trees
• Using a very large number of trees does not overfit
• We can estimate the test error without having to use cross-validation:
The latter is known as out-of-bag error estimation.
The out-of-bag observations are the observations not sampled in each iteration (on average, about of the dataset assuming "bag size"=n). For each observation we can average the predictions of the models for which the observation is OOB, and then calculate the overall OOB MSE across each.

### Random Forest

This method addresses the following problem with bagging decision trees:
• The order on which predictors are split is heavily correlated between trees, especially early on when strong predictors stand out consistently.
• This correlation makes the reduction in variance caused by bootstrapping less effective.
To address this, we do the following:
• At each split we randomly select out of the predictors to assess
• Typically we choose

### Boosting

This method, like bootstrapping, can be used in a variety of domains. In this context, we do the following:
Preliminaries: set
1. Set
1. For :
1. Fit tree with terminal nodes to the training data, with target (not ).
2. Update function :
3. Update value :
1. Return
Note:
• This approach learns slowly by gradually reducing the residuals (typically a good thing!)
• This approach, unlike RF, can overfit, but only still quite robust to overfitting
• We can use CV to select
• Very low , e.g. , can be the best model