From the course: Machine Learning with Scikit-Learn

Decision trees using scikit-learn - scikit-learn Tutorial

Decision trees using scikit-learn

- [Instructor] One of the most important considerations when choosing a machine learning algorithm is how interpretable it is. The ability to explain how an algorithm makes predictions is useful not only to you, but also to potential stakeholders. A very interpretable machine learning algorithm is a decision tree. You can think of it as a series of questions designed to assign a class or predict a continuous value, depending on the task.
The example image is a decision tree designed for classification. Say you have a flower with the following feature: a petal length of 4.5 centimeters. The way decision trees work is you start at the top of the tree and ask questions until you reach one of these green leaf nodes. You first ask the question: is 4.5 centimeters less than or equal to 2.45? This is false, so you go to this other node. Is 4.5 centimeters less than or equal to 4.95? This is true, so you end up at a leaf node. Leaf nodes are where predictions are assigned. In this leaf node, there were 38 versicolor and virginica. The prediction for this leaf node is versicolor, as it is the majority class.
In this video, I'll share with you how you can create an intuitive decision tree using scikit-learn. The first thing you have to do is import the libraries that you're going to use. In this case, you'll import Matplotlib, pandas, the Iris data set, as well as train_test_split and DecisionTreeClassifier. This next piece of code loads the Iris data set. From here, you can split your data into training and test sets. What this image shows is which variable the data from the DataFrame df goes to for a particular train-test split. This is a really important step, as decision trees often overfit the training set. A train-test split helps you catch that, because you evaluate the model on data it has not seen. It's also important to note that another benefit of decision trees is that you don't have to standardize your features. This is different from other algorithms like logistic regression and k-nearest neighbors.
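The loading and splitting steps described above might look roughly like the sketch below. The notebook's actual code is not shown in the transcript, so the column name target, the variable names, and random_state=0 are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris data set into a DataFrame. The name df follows the transcript;
# the 'target' column name is an assumption.
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Split into training and test sets. No feature standardization is applied,
# since decision trees do not need it. random_state=0 is an arbitrary choice
# that makes the split reproducible.
X_train, X_test, Y_train, Y_test = train_test_split(
    df[data.feature_names], df['target'], random_state=0
)

print(X_train.shape, X_test.shape)
```

With the default settings, train_test_split holds out a quarter of the rows for the test set.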
From here, you can create a decision tree model. The model was already imported earlier in the notebook, so the import is commented out here. The next step is to make an instance of your model. This is normally the place where you can tune the hyperparameters of the model. The code below constrains the model to have a depth of at most two. Tree depth is a measure of how many splits a tree makes before coming to a prediction. It's important to note that max_depth is not always equal to depth. max_depth is simply something that pre-prunes a tree so that it grows to at most a certain depth.
From here, you can train your model. You can also make predictions. You can also measure your model's performance. This notebook uses accuracy as the metric, which is the fraction of correct predictions.
This section shows how to tune max_depth. If you look at the graph, you'll see a couple of things. The first is that accuracy increases up to a certain max_depth and then stops improving. There could be a couple of reasons for this. One potential reason is that max_depth is not necessarily equal to depth. It's possible that trees with max_depth four and five have the same depth. It could also be that, after a certain depth, the model is not getting any more useful information.
So that's it. I encourage you to create a decision tree of your own.
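As a starting point, the modeling steps from this video can be sketched as follows. This is a minimal sketch rather than the notebook's own code: the variable names, the range of max_depth values, and random_state=0 are assumptions. It reports the actual depth of each fitted tree next to the requested max_depth, so you can see when the two differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load and split the data (random_state=0 is an arbitrary, assumed choice).
X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)

# Make an instance of the model, constrained to a depth of at most 2.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)

# Train the model, make predictions, and measure accuracy on the test set.
clf.fit(X_train, Y_train)
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, Y_test)
print(f"max_depth=2 accuracy={accuracy:.3f}")

# Tune max_depth: for each value, record the requested max_depth,
# the depth the tree actually grew to, and the test accuracy.
results = []
for max_depth in range(1, 6):
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, Y_train)
    results.append((max_depth, model.get_depth(), model.score(X_test, Y_test)))

for max_depth, depth, acc in results:
    print(f"max_depth={max_depth} depth={depth} accuracy={acc:.3f}")
```

The score method of a classifier returns mean accuracy, which matches the metric used in the video. Plotting the accuracy column against max_depth would give a graph like the one discussed above.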
