Daily Dose of Data Science

A Reliable and Efficient Technique To Measure Feature Importance


Measure feature importance through chaos.

Avi Chawla
Jun 6, 2023
Here's a neat technique to quickly and reliably measure feature importance in any ML model.

Permutation feature importance observes how randomly shuffling a feature influences model performance.

Essentially, after training a model, we do the following:

  • Measure model performance (A1) on the given dataset (test/validation/train).

  • Shuffle one feature randomly.

  • Measure performance (A2) again.

  • Feature importance = (A1-A2).

  • Repeat for all features.
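The steps above can be sketched in a few lines. The dataset and model here (scikit-learn's diabetes data and a random forest) are illustrative stand-ins; any fitted model with a scoring method works:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative data and model; substitute your own.
X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

baseline = model.score(X_val, y_val)  # A1: performance on the intact data
rng = np.random.default_rng(0)

importances = []
for j in range(X_val.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one feature
    importances.append(baseline - model.score(X_perm, y_val))  # A1 - A2
```

A large drop means the model leaned heavily on that feature; a near-zero (or negative) value means shuffling it barely mattered.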

To eliminate any potential effects of randomness during shuffling, it is also recommended to shuffle the same feature multiple times and average the resulting importance scores.
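scikit-learn ships this technique as `permutation_importance`, whose `n_repeats` parameter handles the repeated shuffling and averaging for you. A minimal sketch on an illustrative model:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data and model; substitute your own.
X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Each feature is shuffled n_repeats times; the reported
# importance is the mean performance drop across repeats.
result = permutation_importance(model, X_val, y_val,
                                n_repeats=10, random_state=0)
mean_drop = result.importances_mean  # average A1 - A2 per feature
spread = result.importances_std      # variability across the 10 shuffles
```

A small `importances_std` relative to `importances_mean` tells you the score is stable and not an artifact of one particular shuffle.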

Benefits of permutation feature importance:

  • No repeated model training: the model is fitted only once.

  • The technique is reliable: the scores are tied directly to the evaluation metric you care about.

  • It is model-agnostic: it applies to any model whose performance you can evaluate.

  • It is efficient: each iteration only requires shuffling one column and re-scoring.

Of course, there is one caveat to this approach.

Say two features are highly correlated and one of them is permuted/shuffled. The model can still access the shuffled feature's information through its correlated counterpart, so performance barely drops.

As a result, both features receive a deceptively low importance value.

One way to handle this is to cluster features that are highly correlated and only keep one feature from each cluster.
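One way to sketch that clustering step is hierarchical clustering on a correlation-based distance, keeping one feature per cluster. The 0.5 distance threshold below is an arbitrary illustrative choice, not a recommendation from the post:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_diabetes

X, _ = load_diabetes(return_X_y=True)  # illustrative feature matrix

# Correlation-based distance: highly correlated features are "close".
corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)

# Condense the symmetric distance matrix and cluster hierarchically.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")  # hypothetical cutoff

# Keep only the first feature from each cluster.
keep = [int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels)]
```

You would then run permutation importance on `X[:, keep]` so that no shuffled feature has a correlated stand-in left in the data.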

Here’s one of my previous guides on making this task easier: The Limitations Of Heatmap That Are Slowing Down Your Data Analysis.

👉 Over to you: What other reliable feature importance techniques do you use frequently?

Thanks for reading!

Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new and insightful about Python and Data Science every day. Also, get a Free Data Science PDF (350+ pages) with 250+ tips.

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights. The button is located towards the bottom of this email.


Latest full articles

If you’re not a full subscriber, here’s what you missed last month:

  • Model Compression: A Critical Step Towards Efficient Machine Learning.

  • Generalized Linear Models (GLMs): The Supercharged Linear Regression.

  • Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans.

  • Bayesian Optimization for Hyperparameter Tuning.

  • Formulating the PCA Algorithm From Scratch.

  • Where Did The Assumptions of Linear Regression Originate From?

To receive all full articles and support the Daily Dose of Data Science, consider subscribing.



👉 Tell the world what makes this newsletter special for you by leaving a review here :)


👉 If you love reading this newsletter, feel free to share it with friends!
