Parallelize Pandas Apply() With Swifter
The Pandas library has no built-in support for parallelizing its operations. By default, most of its computations run on a single core, even when other cores are idle.
Things get even worse when we use apply(). When used row by row (axis=1), apply() is little more than a glorified for-loop: it calls your Python function once per row. As a result, it cannot take advantage of vectorization either.
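To make the "glorified for-loop" point concrete, here is a minimal sketch. The DataFrame, its column names, and the weighted_sum function are made up for illustration. It computes the same result twice: once with a row-wise apply() and once with plain column arithmetic, which Pandas can run in vectorized form.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"a": np.arange(1, 6), "b": np.arange(10, 60, 10)})


def weighted_sum(row):
    # Called once per row when used with apply(axis=1).
    return row["a"] * 2 + row["b"]


# Row-wise apply: a Python-level loop over the rows.
via_apply = df.apply(weighted_sum, axis=1)

# Vectorized equivalent: one operation over whole columns.
via_vectorized = df["a"] * 2 + df["b"]

print(via_apply.equals(via_vectorized))
```

Both give the same values. On large DataFrames the vectorized form is typically much faster, so it is worth reaching for first whenever the logic can be written as column operations.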
A quick solution to parallelize apply() is to use swifter instead.
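Swifter allows you to apply any function to a Pandas DataFrame or Series in whichever way it estimates to be fastest. It first tries to run the function in vectorized form and, if that is not possible, chooses between parallel processing (via Dask) and a plain Pandas apply. As a result, it can provide considerable performance gains on large DataFrames while preserving the old syntax. All you have to do is import swifter and use df.swifter.apply instead of df.apply.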
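Here is a small sketch of that switch. The data and the weighted_sum function are hypothetical, and the try/except fallback is only there so the snippet still runs if swifter is not installed. On a DataFrame this tiny you should not expect any speedup; the point is just the syntax.

```python
import pandas as pd

try:
    import swifter  # noqa: F401 (importing registers the .swifter accessor)
    HAVE_SWIFTER = True
except ImportError:
    # Assumption for this sketch: fall back to plain apply() if swifter is missing.
    HAVE_SWIFTER = False

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"a": range(1, 6), "b": range(10, 60, 10)})


def weighted_sum(row):
    return row["a"] * 2 + row["b"]


# The usual Pandas call.
baseline = df.apply(weighted_sum, axis=1)

if HAVE_SWIFTER:
    # Same call, routed through the swifter accessor (progress bar turned off).
    result = df.swifter.progress_bar(False).apply(weighted_sum, axis=1)
else:
    result = baseline

print(result.tolist())
```

The function and its arguments stay the same; only the accessor changes. That makes it easy to try swifter on an existing apply() call and time both versions on your own data.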
Read more here: Swifter Docs.