Decision Trees

A decision tree is a tree-like graph with nodes representing the place where we pick an attribute and ask a question; edges represent the answers to the question; and the leaves represent the actual output or class label. They are used in non-linear decision making with simple linear decision surfaces. Decision trees classify the examples by sorting them down the tree from the root to some leaf node, with the leaf node providing the classification to the example. Each node in the tree acts as a test case for some attribute, and each edge descending from that node corresponds to one of the possible answers to the test case. This process is recursive in nature and is repeated for every subtree rooted at the new nodes.

Making Predictions

After the decision tree model is fit, you can start at the root node. This asks for a variety of questions and moves to the left and right node until you reach the last node

Gini Impurity

A node calculates the gini impurity of an instance to see how pure it is. If it is pure then the gini score is 0

Gini

Estimating class probabilities: Decision trees can also estimate the probability that a particular instance belongs to a class ‘k’.


CART Training algorithm

Sklearn uses a Classification And Regression Tree (CART) to make the decision trees. The cost function CART uses is

The CART algorithm is greedy in nature

Gini - Impurity and Entropy

Decision Trees in sklearn uses gini impurity by default but you can also choose to use Gini - Entropy by setting the hyperparameter appropriate. Most of the time impurity and entropy does not make a difference. Gini impurity is slightly faster to compute. Impurity tends to isolate the most frequent class and its own branch of the tree, while entropy tends to produce slightly more balanced trees.

Regularization Parameters

If left unconstrained the tree structure will adapt to the training data, fitting it very closely and most likely to overfitting it. Such types of models are called non parametric model not because it does not have any parameters but because the parameters are not determined prior to training. A parametric model such as a linear model has a predefined number of parameters, thus limiting it’s degree of freedom, reducing its risk of overfitting. The DecisionTreeClassifier in sklearn has a few other parameters that similarly restrict the shape of the tree. Such as :