Discover more from Daily Dose of Data Science
Feature Scaling is NOT Always Necessary
Here's when it is not needed.
Feature scaling is commonly used to improve the performance and stability of ML models.
This is because it brings the features onto a comparable range. This prevents a feature with large numeric values from dominating the model’s output simply because of the units it is measured in.
For instance, in the image above, the scale of Income could massively impact the overall prediction. Scaling both features to the same range can mitigate this and improve the model’s performance.
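To make the effect concrete, here is a minimal sketch in plain Python. The records, the Age feature, and the helper names are all made up for illustration (only Income comes from the post). It finds the nearest neighbor of the first person before and after min-max scaling:

```python
import math

# Hypothetical toy records: (age in years, income in dollars).
people = [(25, 50_000), (60, 51_000), (27, 54_000), (40, 90_000)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_to_first(rows):
    """Index of the row closest to rows[0]."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

raw_neighbor = nearest_to_first(people)
scaled_neighbor = nearest_to_first(min_max_scale(people))

print("Nearest neighbor on raw data:   ", people[raw_neighbor])
print("Nearest neighbor on scaled data:", people[scaled_neighbor])
```

On the raw data, the 60-year-old counts as the closest match to the 25-year-old because their incomes differ by only $1,000, and the 35-year age gap barely registers. After scaling, the 27-year-old becomes the nearest neighbor, since both features now contribute on the same footing.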
But is it always necessary?
While feature scaling is often crucial, knowing when to apply it is equally important.
Note that many ML algorithms are unaffected by scale. This is evident from the image below.
As shown above:
Logistic regression, SVM classifier, MLP, and kNN do better with feature scaling.
Decision trees, random forests, Naive Bayes, and gradient boosting are unaffected.
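The contrast between the two groups can be sketched with one algorithm from each. The snippet below is a toy illustration, not the experiment behind the image: the data, the class labels, and the helper functions are assumptions, and the "rescaling" is just a change of units (income in thousands of dollars versus dollars). It compares a 1-nearest-neighbor classifier with a hand-written Gaussian Naive Bayes:

```python
import math

def predict_1nn(train, labels, query):
    """Label of the training point closest to `query` (Euclidean distance)."""
    nearest = min(range(len(train)), key=lambda i: math.dist(train[i], query))
    return labels[nearest]

def predict_gaussian_nb(train, labels, query):
    """Gaussian Naive Bayes: pick the class with the highest log-posterior."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(labels)):
        rows = [x for x, y in zip(train, labels) if y == label]
        score = math.log(len(rows) / len(train))  # log prior
        for j, value in enumerate(query):
            column = [r[j] for r in rows]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            # Log of the Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def income_in_dollars(row):
    """Convert (age, income in thousands) to (age, income in dollars)."""
    age, income = row
    return (age, income * 1000)

# Hypothetical data: (age in years, income in thousands of dollars).
train = [(25, 50), (30, 52), (28, 55), (55, 51), (60, 54), (58, 56)]
labels = ["A", "A", "A", "B", "B", "B"]
query = (29, 53.5)

train_usd = [income_in_dollars(r) for r in train]
query_usd = income_in_dollars(query)

knn_thousands = predict_1nn(train, labels, query)
knn_dollars = predict_1nn(train_usd, labels, query_usd)
nb_thousands = predict_gaussian_nb(train, labels, query)
nb_dollars = predict_gaussian_nb(train_usd, labels, query_usd)

print("1-NN:", knn_thousands, "->", knn_dollars)
print("Naive Bayes:", nb_thousands, "->", nb_dollars)
```

With these made-up numbers, the 1-NN prediction flips from A to B once income is expressed in dollars, because the distance is then dominated by income. The Naive Bayes prediction stays at A: multiplying a feature by a constant shifts every class's log-likelihood by the same amount, so the winning class does not change.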
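Consider a decision tree, for instance. It splits the data on thresholds over individual feature values, so only the ordering of the values within each feature matters. Rescaling a feature moves the threshold but leaves the resulting split of the samples unchanged.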
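A single-split "stump" is enough to see this. The following is a minimal sketch, assuming Gini impurity as the split criterion and a made-up Income column with a hypothetical binary label. It is not how any particular library implements trees:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, left_mask) for the best single-feature split."""
    ordered = sorted(set(values))
    best = None
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2  # midpoint between adjacent values
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        # Weighted impurity of the two children (lower is better).
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    threshold = best[1]
    return threshold, [v <= threshold for v in values]

# Hypothetical data: income in dollars and a binary label.
income = [32_000, 45_000, 51_000, 78_000, 95_000, 120_000]
approved = [0, 0, 0, 1, 1, 1]

# Min-max scale the same feature to [0, 1].
lo, hi = min(income), max(income)
income_scaled = [(v - lo) / (hi - lo) for v in income]

raw_threshold, raw_left = best_split(income, approved)
scaled_threshold, scaled_left = best_split(income_scaled, approved)

print("Raw threshold:   ", raw_threshold)
print("Scaled threshold:", scaled_threshold)
print("Same partition:  ", raw_left == scaled_left)
```

The threshold value differs between the raw and scaled feature, but the same samples land on each side of the split, so the tree's predictions are identical.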
Thus, it’s important to understand the nature of your data and the algorithm you intend to use.
You may not need feature scaling at all if the algorithm is insensitive to the scale of the data.
👉 Over to you: What other algorithms typically work well without scaling data? Let me know :)