Daily Dose of Data Science

This Small Tweak Can Significantly Boost The Run-time of KMeans

KMeans++: KMeans with a smarter centroid initialization approach.

Avi Chawla
Mar 9, 2023

KMeans is a popular clustering algorithm, but it can be slow to run. Here's how a small tweak can significantly improve its run time.

KMeans selects the initial centroids randomly. As a result, it can converge slowly or settle on a poor clustering. This requires us to repeat clustering several times with different initializations.
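To make the cost of random initialization concrete, here is a minimal pure-Python sketch of plain KMeans with random restarts. It is an illustration under stated assumptions, not the implementation of any particular library: the function names (`sq_dist`, `lloyd`, `kmeans_random_restarts`) and the `n_restarts` parameter are made up for this example, and points are assumed to be tuples of floats.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centroids, max_iter=100):
    """Run plain KMeans from the given starting centroids.

    Returns (centroids, inertia, iterations used).
    """
    for iteration in range(1, max_iter + 1):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster simply keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the run has converged
        centroids = new_centroids
    # Inertia: sum of squared distances to the nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia, iteration

def kmeans_random_restarts(points, k, n_restarts=10, seed=0):
    """Repeat KMeans from random starts and keep the lowest-inertia run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        start = rng.sample(points, k)  # k distinct data points as centroids
        result = lloyd(points, start)
        if best is None or result[1] < best[1]:
            best = result
    return best
```

Every restart pays for a full clustering run, which is why a better starting point can save a lot of time.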

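Instead, KMeans++ takes a smarter approach to initialize centroids. The first centroid is selected randomly. But each subsequent centroid is chosen based on its distance from the centroids already selected: a point is picked with probability proportional to its squared distance from the nearest one.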

In other words, a point that is far away from the existing centroids is more likely to be selected as the next initial centroid. This way, the initial centroids are likely to lie in different clusters already, and the algorithm may converge faster.
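The seeding idea described above can be sketched in a few lines of pure Python. This is a minimal illustration, assuming the standard KMeans++ rule of sampling each new centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The names `sq_dist` and `kmeans_pp_init` are hypothetical, and the sketch assumes the data contains at least `k` distinct points.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids from `points` using KMeans++-style seeding."""
    rng = random.Random(seed)
    # The first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest centroid.
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        # Far-away points carry larger weights, so they are more likely
        # to be drawn. Already-chosen points have weight 0.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centroids.append(points[idx])
        # Refresh nearest-centroid distances with the new centroid.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centroids
```

The returned centroids would then be handed to the usual KMeans iterations in place of a purely random start.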

The illustration below shows the centroid initialization of KMeans++:

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.
