Interpreting the base learner in the Gradient Boosting Classifier

AI Maverick
2 min read · Mar 12, 2022


Gradient Boosting Classifier is a well-known ensemble model developed in different libraries such as Sklearn and XGBoost.


The base learner in the Sklearn model is a DecisionTreeRegressor, which has its own attributes. After training the ensemble, you can use the base learners' methods as well.
First, we have to import the model and train it over the dataset.

In the following, I used a predefined dataset from the Sklearn library.

import sklearn.datasets as dt

X, y = dt.load_iris(return_X_y=True)

To train and evaluate the model, one can use stratified sampling so that the class distribution is the same in both the training and test batches.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
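Note that the split above does not actually stratify: train_test_split only balances the class distribution when you pass the stratify argument. A minimal, self-contained sketch of a stratified split on the same iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in both batches.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Iris has 150 samples, 50 per class; a 70/30 stratified split gives
# 35 training and 15 test samples per class.
print(np.bincount(y_train))  # -> [35 35 35]
print(np.bincount(y_test))   # -> [15 15 15]
```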

Now, it is time to build the Gradient Boosting Model from the Sklearn library.

from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier()
model.fit(x_train, y_train)
model.score(x_test, y_test)

As the focus of this review is the base learner, not the model-training process, I used the default hyperparameters, trained simply on the training batch, and evaluated on the test batch.

Tree

So, where are the base learners and their attributes? For this case study, we have (100, 3) trees, which means 100 trees (one per boosting iteration) for each of the 3 class labels. So, how to check the trees? All of them are stored in the following attribute.

model.estimators_
print(model.estimators_.shape)
>>> (100, 3)

The first tree for the first class label is:

model.estimators_[0][0]
>>> DecisionTreeRegressor(criterion='friedman_mse', max_depth=3,
                          random_state=RandomState(MT19937) at 0x21628DFAA40)

The .tree_ attribute returns the trained tree:

model.estimators_[0][0].tree_
>>> <sklearn.tree._tree.Tree at 0x2166c7c2f80>

Draw the tree

You can also draw your tree with the following code:

from sklearn.tree import plot_tree

tree = model.estimators_.reshape(-1)[1]
plot_tree(tree)

The trained tree for the second class label (first boosting iteration) of the iris dataset — reshape(-1)[1] is estimators_[0][1]
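If you prefer a text dump over a plot, sklearn.tree.export_text prints the same structure to the console. A self-contained sketch (it refits a model rather than reusing the one above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

data = load_iris()
model = GradientBoostingClassifier().fit(data.data, data.target)

# Same tree as in the plot above: first iteration, second class label.
tree = model.estimators_.reshape(-1)[1]

# export_text prints one line per node, indented by depth.
print(export_text(tree, feature_names=data.feature_names))
```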

Leaves

Another question you might ask is: what about the number of leaves?
The answer is the following code.

tree.tree_.n_leaves
>>> 5

The summation of all leaves would be

print(sum(tree.tree_.n_leaves for tree in model.estimators_.reshape(-1)))
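Sklearn decision trees are strictly binary (every internal node has exactly two children), so for each tree node_count == 2 * n_leaves - 1. A self-contained sketch that sums the leaves over all base learners and checks that relation along the way:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
model = GradientBoostingClassifier().fit(X, y)

total_leaves = 0
for est in model.estimators_.reshape(-1):
    t = est.tree_
    # Strictly binary tree: internal nodes = leaves - 1.
    assert t.node_count == 2 * t.n_leaves - 1
    total_leaves += t.n_leaves

print(total_leaves)
```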

Tree’s Value

The per-node values are returned by the following code:

tree = model.estimators_[1][0]
tree.tree_.value
>>> array([[[-3.09980500e-05]],
       [[ 1.61886012e+00]],
       [[-3.06927659e-01]],
       [[-3.08667222e-01]],
       [[-9.63116480e-01]],
       [[-9.82423324e-01]],
       [[-3.05809369e-01]],
       [[-9.62883563e-01]],
       [[-9.59686381e-01]]])

Likewise, since the value array has nine entries, we know this tree has nine nodes; you can also count them directly:

tree.tree_.node_count

In the following, I stored the values, total node count, left and right children, and split features for the first tree of the first class (the root is always node 0).

tree = model.estimators_[0, 0].tree_
values = tree.value
nodes = tree.node_count
left_child = tree.children_left
right_child = tree.children_right
feature = tree.feature
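With these arrays you can walk the whole tree yourself: children_left[i] == -1 marks node i as a leaf, otherwise the node splits on feature[i] at threshold[i]. A self-contained sketch (in the spirit of sklearn's "understanding the decision tree structure" example) that recursively prints the structure:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
model = GradientBoostingClassifier().fit(X, y)

tree = model.estimators_[0, 0].tree_

def print_tree(node=0, depth=0):
    """Recursively print each node; leaves have children_left == -1."""
    indent = "  " * depth
    if tree.children_left[node] == -1:  # leaf node
        print(f"{indent}leaf: value={tree.value[node][0][0]:.4f}")
    else:
        print(f"{indent}node: feature {tree.feature[node]} "
              f"<= {tree.threshold[node]:.3f}")
        print_tree(tree.children_left[node], depth + 1)
        print_tree(tree.children_right[node], depth + 1)

print_tree()
```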

Support the Author

To support the Author, you can buy a coffee for him.

