Daily Dose of Data Science

The Limitation Of Silhouette Score Which Is Often Ignored By Many

www.blog.dailydoseofds.com

Not all clustering results are convex.

Avi Chawla
Jul 8, 2023

The Silhouette score is commonly used for evaluating clustering results.

At times, it is also preferred over the elbow curve for determining the optimal number of clusters. (I have covered this before, if you wish to recap or learn more.)
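As a quick refresher, here is a minimal sketch of picking the number of clusters with the Silhouette score. The toy dataset (four blobs at hand-picked centers), the range of `k` values, and the variable names are my own assumptions for illustration, not something from a specific project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed toy data: four well-separated, roughly spherical blobs.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)

# Fit KMeans for each candidate k and record the mean Silhouette score.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is taken as the number of clusters.
best_k = max(scores, key=scores.get)
print(f"Best k by Silhouette score: {best_k}")
```

This works well here because the blobs are convex, which is exactly the setting the Silhouette score favors.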

However, while using the Silhouette score, it is also important to be aware of one of its major shortcomings.

The Silhouette score is typically higher for convex (or somewhat spherical) clusters.

However, using it to evaluate arbitrarily shaped clusters can produce misleading results.

This is also evident from the following image:

While the clustering output of KMeans is worse, its Silhouette score is still higher than that of density-based clustering.

DBCV (density-based clustering validation) is a better metric in such cases.
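You can reproduce this effect with a small experiment. The sketch below is an assumption on my part about how such a comparison could be set up (the image may have used different data): it uses scikit-learn's two-moons toy dataset, KMeans, and DBSCAN with hand-picked `eps` and `min_samples`. The adjusted Rand index is included only because this toy dataset happens to ship with true labels.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Two interleaving half-moons: a standard non-convex toy dataset.
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=42)

# KMeans assumes convex clusters; DBSCAN follows dense regions instead.
# eps and min_samples are assumed values chosen for this toy data.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Silhouette score: needs no ground truth, but favors convex clusters.
sil_kmeans = silhouette_score(X, kmeans_labels)
sil_dbscan = silhouette_score(X, dbscan_labels)

# Adjusted Rand index: agreement with the true moon membership.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)

print(f"KMeans: silhouette={sil_kmeans:.3f}, ARI={ari_kmeans:.3f}")
print(f"DBSCAN: silhouette={sil_dbscan:.3f}, ARI={ari_dbscan:.3f}")
```

On this data, KMeans should get the higher Silhouette score even though DBSCAN recovers the two moons far more faithfully.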

As the name suggests, it is specifically meant to evaluate density-based clustering.

Simply put, DBCV computes two values:

  • The density within a cluster

  • The density between clusters

A high density within a cluster and a low density between clusters indicate good clustering results.
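To make the "within vs. between" idea concrete, here is a deliberately simplified sketch. It is not the actual DBCV algorithm (which, among other things, works on mutual reachability distances): the function `density_score_sketch` is a hypothetical helper of mine that uses plain Euclidean distances. For each cluster, it takes the longest edge of the cluster's minimum spanning tree as a proxy for how sparse the cluster is, and the smallest gap to any other cluster as a proxy for separation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons


def density_score_sketch(X, labels):
    """Toy within-vs-between score in [-1, 1]. NOT the real DBCV."""
    # Points labeled -1 (DBSCAN noise) are left out of every cluster.
    clusters = [X[labels == c] for c in np.unique(labels) if c != -1]
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, pts in enumerate(clusters):
        # Sparseness: longest edge of the cluster's minimum spanning tree.
        mst = minimum_spanning_tree(cdist(pts, pts))
        sparseness = mst.data.max() if mst.nnz else 0.0
        # Separation: smallest gap to any point of any other cluster.
        separation = min(
            cdist(pts, other).min()
            for j, other in enumerate(clusters)
            if j != i
        )
        # Positive when the cluster is tighter than its gap to the others.
        validity = (separation - sparseness) / max(separation, sparseness)
        total += len(pts) * validity
    # Size-weighted average over all points (noise counts in the total).
    return total / len(X)


# Assumed toy data and parameters, chosen only for illustration.
X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

score_kmeans = density_score_sketch(X, kmeans_labels)
score_dbscan = density_score_sketch(X, dbscan_labels)
print(f"KMeans: {score_kmeans:.3f}, DBSCAN: {score_dbscan:.3f}")
```

Because KMeans cuts straight through both moons, its clusters sit right next to each other while each one contains a large internal gap, so this toy score should penalize it. For real evaluations, use a proper DBCV implementation rather than this sketch.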

DBCV can also be used when you don’t have ground truth labels.

This adds another metric to my recently proposed methods: Evaluate Clustering Performance Without Ground Truth Labels.

The effectiveness of DBCV is also evident from the image below:

This time, the score for the clustering output of KMeans is worse, and that of density-based clustering is higher.

Get started with DBCV here: GitHub.

👉 Over to you: What are some other ways to evaluate clustering where traditional metrics may not work?


Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.

3 Comments
Joe Corliss
Jul 9 · Liked by Avi Chawla

Well, of course density-based clustering validation gives a higher score to a density-based clustering algorithm. For this example, we only know that DBCV gives a better clustering because it's obvious from plotting the data. In ten dimensions, how do you know which algorithm and which metric will give the best result?

Noberto Maciel
Sep 5

Hi Jean, where did you find out about this disadvantage of the Silhouette, in what article? I need to cite this in my dissertation.


© 2023 Avi Chawla