A (Highly) Important Point to Consider Before…

Avi Chawla

Mar 31, 2023

Estimating the success rate of KMeans.

Read →

5 Comments

Francois Ascani

Mar 31, 2023Liked by Avi Chawla

Very nice short empirical analysis. Questions:

1. Here, I assume you assume you knew the actual number of clusters. Does kmeans++ still succeed to converge most of the time if the number of clusters used is over or under the actual number?

2. In your example, all the clusters are equally distributed in space which makes it perfectly adequate for the method to initialize the centroids as those will also be well distributed in space. What happens when the clusters are not well distributed in spaces, for instance if you have a group of clusters in one side of the space well separated from another group of clusters. My intuition is that even kmeans++ will not able to converge most of the time in that case as well. Have you tried this?

Thanks for your newsletter. I enjoy reading it!

Expand full comment

Reply (1)

Daily Dose of Data Science

A (Highly) Important Point to Consider Before…