Daily Dose of Data Science

The Limitations of DBSCAN Clustering Which Many Often Overlook

...And here's a better alternative to work with.

Avi Chawla
Oct 4, 2023

DBSCAN is a density-based clustering algorithm: it forms clusters from contiguous regions where data points are densely packed.

This makes it more robust than algorithms like KMeans because:

  • Being “density-based”, it can identify clusters of arbitrary shapes.

  • KMeans, in contrast, can only create globular clusters.

A comparison between DBSCAN and KMeans is shown below:

KMeans vs. DBSCAN on non-globular clustering datasets

  • KMeans attempts to form globular clusters. Hence, it fails to identify the correct clusters.

  • DBSCAN relies on the concept of “density”, making it more robust.

Yet, it is important to note that DBSCAN also has some limitations, which many often overlook.

Let’s understand these today.

To begin, DBSCAN assumes that the local density of data points is (somewhat) globally uniform. This is governed by its eps parameter, which fixes a single neighborhood radius for the entire dataset.

Thus, it may struggle to identify clusters with varying densities.

As a result, getting promising results may require several rounds of hyperparameter tuning.
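To see this trade-off concretely, here is a minimal sketch using scikit-learn. The synthetic dataset and the parameter values are illustrative assumptions, not from the original post: one tight cluster and one sparse cluster share a single global eps, so no single value serves both densities well.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Illustrative data: one tight cluster and one sparse cluster,
# i.e., two very different local densities.
X, _ = make_blobs(
    n_samples=[200, 200],
    centers=[(0, 0), (10, 10)],
    cluster_std=[0.3, 3.0],
    random_state=42,
)

# A single global eps must trade off both densities: a small eps
# labels most of the sparse cluster as noise, while a large eps
# risks merging nearby structure.
for eps in (0.3, 1.0, 3.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int((labels == -1).sum())
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

With a small eps, the sparse cluster dissolves into noise; raising eps recovers it, but only by loosening the density criterion everywhere at once.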

HDBSCAN can be a better choice for density-based clustering.

It relaxes the assumption of local uniform density, which makes it more robust to clusters of varying densities by exploring many different density scales.

Its effectiveness is evident from the image below:

On a dataset with three clusters, each with a different density:

  • DBSCAN struggles to identify correct clusters.

  • HDBSCAN is found to be more robust.

What's more:

  • DBSCAN is a scale-variant algorithm. Thus, clustering results for data X, 2X, 3X, etc., can be entirely different.

  • On the other hand, HDBSCAN is scale-invariant. So, clustering results remain the same across different scales of data.

We can also verify this experimentally:

Clustering results differ with the scale of data for DBSCAN

  • The results vary with the data scale for DBSCAN.

  • The results remain unaltered with the scale for HDBSCAN.

👉 Over to you: Can you explain why HDBSCAN is scale-invariant?

Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new and insightful about Python and Data Science every day. Also, get a Free Data Science PDF (550+ pages) with 320+ tips.

👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights. The button is located towards the bottom of this email.

Thanks for reading!


In case you missed it

Yesterday, I introduced The Daily Dose of Data Science Lab, a cohort-based platform for you to:

  • Attend weekly live sessions (office hours) hosted by me and invited guests.

  • Enroll in self-paced and live courses.

  • Get private mentoring.

  • Join query discussions.

  • Find answers to your data-related problems.

  • Refer to the internal data science resources, and more.

To ensure an optimal and engaging experience, The Lab will always operate at a capacity of 120 active participants.

So, if you are interested in receiving further updates about The Lab, please fill out this form: The Lab interest form.

Note: Filling out the form DOES NOT mean you must join The Lab. This is just an interest form to indicate that you are interested in learning more before making a decision.

I will be sharing more details with the respondents soon.

Thank you :)


Latest full articles

If you’re not a full subscriber, here’s what you missed last month:

  • Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit

  • Model Compression: A Critical Step Towards Efficient Machine Learning

  • Generalized Linear Models (GLMs): The Supercharged Linear Regression

  • Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans

  • Bayesian Optimization for Hyperparameter Tuning

  • Formulating the PCA Algorithm From Scratch

To receive all full articles and support the Daily Dose of Data Science, consider subscribing:

I want to read full articles.


👉 Tell the world what makes this newsletter special for you by leaving a review here :)


👉 If you love reading this newsletter, feel free to share it with friends!

