# Why Decision Trees Must Be Thoroughly Inspected After Training

### Understanding the structural formulation of decision trees and how it can be problematic at times.

If we were to visualize the decision rules (the conditions evaluated at every node) of ANY decision tree, we would ALWAYS find them to be perpendicular to the feature axes, as depicted below:

In other words, every decision tree progressively segregates feature space based on such perpendicular boundaries to split the data.
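We can see this directly in code. In the following sketch (the dataset and model settings are my own, purely illustrative), we fit a tree with scikit-learn and print its split conditions; every internal node tests exactly one feature against a threshold, i.e., an axis-perpendicular boundary:

```python
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

# Illustrative 2D dataset; any dataset would show the same structure
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

t = tree.tree_
for node in range(t.node_count):
    if t.children_left[node] != -1:  # internal (splitting) node
        # Each condition is "feature <= threshold": perpendicular to one axis
        print(f"node {node}: X[{t.feature[node]}] <= {t.threshold[node]:.3f}")
```

No split ever combines two features; each condition involves a single feature only, which is exactly why every boundary the tree draws is perpendicular to some axis.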

Of course, this is not a “problem” per se.

In fact, this perpendicular splitting is precisely what gives a decision tree the power to perfectly overfit almost any dataset (read the overfitting experiment section here to learn more).

However, this also brings up a pretty interesting point that is often overlooked when fitting decision trees.

More specifically, what would happen if our dataset had a diagonal decision boundary, as depicted below:

It is easy to guess that in such a case, the decision boundary learned by a decision tree is expected to appear as follows:

In fact, if we plot this decision tree, we notice that it creates so many splits just to fit this easily separable dataset, which a model like logistic regression, support vector machine (SVM), or even a small neural network can easily handle:
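To make this concrete, here is a hedged sketch (synthetic data and parameter choices are my own): a dataset that is linearly separable along a diagonal forces the tree into a deep stair-step of splits, while logistic regression separates it with a single linear boundary:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic dataset with a diagonal decision boundary (x1 + x2 = 0)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
logreg = LogisticRegression().fit(X, y)

print("tree depth:   ", tree.get_depth())     # many stair-step splits
print("tree leaves:  ", tree.get_n_leaves())
print("logreg score: ", logreg.score(X, y))   # one linear boundary suffices
```

The tree needs many levels of axis-perpendicular splits to approximate one diagonal line, whereas the linear model captures it in a single weighted sum.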

It becomes more evident if we zoom into this decision tree and notice how close the thresholds of its split conditions are:

This is a bit concerning because it clearly shows that the decision tree is meticulously trying to mimic a diagonal decision boundary, which hints that it might not be the best model to proceed with.

**To double-check this, I often do the following:**

1. Take the training data `(X, y)`. Shape of `X`: `(n, m)`; shape of `y`: `(n, 1)`.
2. Run PCA on `X` to project the data into an orthogonal space of `m` dimensions. This will give `X_pca`, whose shape will also be `(n, m)`.
3. Fit a decision tree on `X_pca` and visualize it (*thankfully, decision trees are always visualizable*).
4. If the decision tree's depth is significantly smaller in this case, it validates that there is a diagonal separation.
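The steps above can be sketched as follows. This is a minimal illustration with assumptions of my own: the data is synthetic, elongated along the diagonal (so that PCA's components align with the diagonal boundary), and a small margin keeps the two classes cleanly separated:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

# Step 1: synthetic data elongated along the diagonal; the class boundary
# is the diagonal itself, with a small margin between the classes
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
X = X[np.abs(X[:, 1] - X[:, 0]) > 0.3]       # shape (n, m)
y = (X[:, 1] > X[:, 0]).astype(int)          # diagonal separation

# Step 2: project into an orthogonal space of the same m dimensions
X_pca = PCA(n_components=X.shape[1]).fit_transform(X)   # also (n, m)

# Steps 3-4: fit a tree on both versions and compare depths
depth_raw = DecisionTreeClassifier(random_state=0).fit(X, y).get_depth()
depth_pca = DecisionTreeClassifier(random_state=0).fit(X_pca, y).get_depth()
print("depth on X:    ", depth_raw)
print("depth on X_pca:", depth_pca)  # much smaller => diagonal separation
```

Because the boundary becomes (almost) perpendicular to one principal component, the tree on `X_pca` collapses to a handful of splits, while the tree on raw `X` needs a deep staircase.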

For instance, the PCA projections on the above dataset are shown below:

It is clear that the decision boundary on the PCA projections is **almost** perpendicular to the `X2` feature (the 2nd principal component).

Fitting a decision tree on this `X_pca` drastically reduces its depth, as depicted below:

This lets us determine that we might be better off using some other algorithm instead.

Or, we can spend some time engineering better features that the decision tree model can easily work with using its perpendicular data splits.
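As one hedged example of such feature engineering (the feature choice is my own): adding the difference `X2 - X1` as a new column turns the diagonal boundary into one that is perpendicular to the new feature axis, which the tree captures in a single split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with the diagonal boundary x2 = x1
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 1] > X[:, 0]).astype(int)

# Engineered feature: the class is fully determined by the sign of x2 - x1,
# so one perpendicular split on this column separates the data perfectly
X_eng = np.column_stack([X, X[:, 1] - X[:, 0]])

depth_raw = DecisionTreeClassifier(random_state=0).fit(X, y).get_depth()
depth_eng = DecisionTreeClassifier(random_state=0).fit(X_eng, y).get_depth()
print("depth without engineered feature:", depth_raw)
print("depth with engineered feature:   ", depth_eng)
```

Unlike the PCA trick, this keeps the original features intact, so the resulting tree remains interpretable.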

At this point, you might be wondering: why can't we simply use the decision tree trained on `X_pca`?

While nothing stops us from doing that, do note that PCA components are not interpretable, and maintaining feature interpretability can be important at times.

Thus, whenever you train your next decision tree model, consider spending some time inspecting what it’s doing.

Before I end…

Through this post, I don’t intend to discourage the use of decision trees. They are the building blocks of some of the most powerful ensemble models we use today.

My point is to bring forward the structural formulation of decision trees and why/when they might not be an ideal algorithm to work with.

That said, did you know that decision tree models can be supercharged with tensor computations for up to **40x faster inference**? Check out this article to learn more: **Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations**.

Also, if you want to learn how PCA works end-to-end, along with its entire mathematical details, check out this article: **Formulating the Principal Component Analysis (PCA) Algorithm From Scratch**.

👉 Over to you: What other ways might you use to handle diagonal datasets with decision tree models?

**👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.**

**The button is located towards the bottom of this email.**

Thanks for reading!

**Latest full articles**

If you’re not a full subscriber, here’s what you missed last month:

Don’t Stop at Pandas and Sklearn! Get Started with Spark DataFrames and Big Data ML using PySpark.

DBSCAN++: The Faster and Scalable Alternative to DBSCAN Clustering

Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning

You Cannot Build Large Data Projects Until You Learn Data Version Control!

Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

**👉 Tell the world what makes this newsletter special for you by leaving a review here :)**

👉 If you love reading this newsletter, feel free to share it with friends!