Parallelize Pandas Apply() With Swifter
The Pandas library has no built-in support for parallelizing its operations. As a result, it typically runs on a single core, even when other cores are idle.
Things get even worse when we use apply(). In Pandas, apply() is nothing but a glorified for-loop. As a result, it cannot even take advantage of vectorization.
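To make the difference concrete, here is a minimal sketch, assuming a small made-up DataFrame with columns a and b (both hypothetical). It computes the same row-wise sum twice: once with apply(), which calls a Python function for every row, and once with a vectorized column expression.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(10_000), "b": rng.random(10_000)})

# Row-wise apply(): the lambda is called once per row, in Python.
looped = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized equivalent: a single operation over whole columns.
vectorized = df["a"] + df["b"]

# Both approaches produce the same values.
assert np.allclose(looped, vectorized)
```

Timing the two lines (for example with %timeit in a notebook) should show the gap on your own machine; the exact numbers depend on the data and hardware.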
A quick solution to parallelize apply() is to use swifter instead.
Swifter allows you to apply any function to a Pandas DataFrame in a parallelized manner. As a result, it can provide considerable performance gains, especially on large DataFrames, while preserving the familiar syntax. All you have to do is use df.swifter.apply instead of df.apply.
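As a rough sketch of the swap, assuming swifter is installed (pip install swifter): the DataFrame and the row_score function below are hypothetical, and the example falls back to plain apply() when swifter is missing, so that it runs either way.

```python
import numpy as np
import pandas as pd

try:
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    HAS_SWIFTER = True
except ImportError:
    HAS_SWIFTER = False


def row_score(row):
    # Hypothetical row-wise function, used only for illustration.
    return row["a"] * 2 + row["b"]


# Hypothetical data: a = 0, 1, 2, ... and b = 1 everywhere.
df = pd.DataFrame({"a": np.arange(1_000, dtype=float), "b": np.ones(1_000)})

# Plain Pandas: one Python call per row.
baseline = df.apply(row_score, axis=1)

# Swifter: same call, routed through the .swifter accessor.
if HAS_SWIFTER:
    result = df.swifter.apply(row_score, axis=1)
else:
    result = baseline  # fallback so the sketch still runs without swifter

# The results should match the plain apply() output.
assert np.allclose(result, baseline)
```

On a DataFrame this small, the overhead of setting up parallel work may outweigh any gain, so it is worth timing both versions on your real data.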
Read more here: Swifter Docs.
I like to explore, experiment and write about data science concepts and tools. You could connect with me on LinkedIn.