From the course: Supervised Learning Essential Training


Splitting data and limiting decision tree depth


- [Instructor] One of the biggest challenges in creating decision trees is overfitting. Overfitting is the biggest practical challenge in supervised learning overall: the model memorizes the training data and then predicts poorly on the test data. For decision trees, there are two ways to avoid overfitting. First, we can set constraints on tree size. Second, we can prune our decision trees. There are a few constraints we can set for our trees, such as the minimum number of samples in a leaf node, the maximum (vertical) depth of the tree, the maximum number of leaf nodes, and the maximum number of features to consider when searching for the best split. We can also prune our tree to create a robust model that generalizes well to new data. To do this, we first grow the tree. Then we start at the bottom of the tree and remove sub-trees that don't improve classification accuracy. There are two pruning strategies to consider. First…
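The ideas above can be sketched in scikit-learn. This is a minimal, hypothetical example (the iris dataset and the specific constraint values are stand-ins, not from the course): we split the data into train and test sets, fit one tree with size constraints, and fit a second tree pruned via cost-complexity pruning, which is scikit-learn's built-in way to remove sub-trees that contribute little.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split the data so we can measure generalization on held-out samples.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Tree with size constraints: each parameter below maps to one of the
# constraints mentioned in the transcript (example values, not prescriptions).
constrained = DecisionTreeClassifier(
    max_depth=3,          # maximum vertical depth of the tree
    min_samples_leaf=5,   # minimum samples required in a leaf node
    max_leaf_nodes=10,    # maximum number of leaf nodes
    max_features=2,       # max features considered per split
    random_state=42,
)
constrained.fit(X_train, y_train)

# Cost-complexity pruning: grow an unconstrained tree, compute candidate
# pruning strengths (ccp_alphas), then refit with a stronger alpha so
# low-value sub-trees are removed from the bottom up.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
pruned = DecisionTreeClassifier(
    ccp_alpha=path.ccp_alphas[-2], random_state=42
)
pruned.fit(X_train, y_train)

print("constrained depth:", constrained.get_depth())
print("constrained test accuracy:", constrained.score(X_test, y_test))
print("pruned depth:", pruned.get_depth())
```

In practice you would choose `ccp_alpha` (or the size constraints) by cross-validating over the candidates in `path.ccp_alphas`, rather than picking one by hand as done here.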
