Discover more from Daily Dose of Data Science
Make Sklearn KMeans 20x times faster
The KMeans algorithm is commonly used to cluster unlabeled data. But with large datasets, scikit-learn takes plenty of time to train and predict.
To speed-up KMeans, use Faiss by Facebook AI Research. It provides faster nearest-neighbor search and clustering.
Faiss uses "Inverted Index", an optimized data structure to store and index the data points. This makes performing clustering extremely efficient.
Additionally, Faiss provides parallelization and GPU support, which further improves the performance of its clustering algorithms.
Read more: GitHub.
Share this post on LinkedIn: Post Link.
Thanks for reading Daily Dose of Data Science! Subscribe for free to receive new posts and support my work.
Find the code for my tips here: GitHub.