Parallelize Pandas Apply() With Swifter

Nov 01, 2022

Pandas has no built-in support for parallelizing its operations. Most of them therefore run on a single core, even when other cores sit idle.

Things get even worse when we use apply(). Applied row by row (axis=1), apply() is little more than a glorified for-loop: it calls a Python function once per row. As a result, it cannot take advantage of vectorization.
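To make the for-loop point concrete, here is a minimal sketch comparing a row-wise apply() with its vectorized equivalent. The DataFrame, its column names (a, b) and the sizes are made up for illustration; they are not from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns, chosen only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(10_000), "b": rng.random(10_000)})

# Row-wise apply: the lambda is called once per row, in Python.
via_apply = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized equivalent: a single operation over whole columns.
via_vectorized = df["a"] + df["b"]
```

Both produce the same Series. Timing the two lines (for example with %timeit in a notebook) should show the vectorized version running far faster, though the exact gap depends on the machine and the data size.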

A quick way to parallelize apply() is to use swifter instead.

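Swifter lets you apply any function to a Pandas DataFrame in a parallelized manner. It first tries to run the function in vectorized form; if that is not possible, it estimates whether parallel processing with Dask or a plain Pandas apply would be faster and picks accordingly. On large DataFrames, this can give considerable performance gains while preserving the familiar syntax. All you have to do is use df.swifter.apply instead of df.apply.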
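The swap looks roughly like this. It is a sketch, assuming swifter is installed (pip install swifter); the function row_score and the columns a and b are hypothetical. The fallback to plain apply() is there only so the snippet still runs without swifter.

```python
import numpy as np
import pandas as pd

try:
    import swifter  # noqa: F401  (importing registers the df.swifter accessor)
    HAVE_SWIFTER = True
except ImportError:
    # Fallback for this sketch only, so it runs without swifter installed.
    HAVE_SWIFTER = False


def row_score(row):
    # Hypothetical per-row function, used purely for illustration.
    return row["a"] * 2 + row["b"]


# Hypothetical data, not from any real dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame({"a": rng.random(5_000), "b": rng.random(5_000)})

if HAVE_SWIFTER:
    # Same call signature as df.apply, routed through swifter.
    result = df.swifter.apply(row_score, axis=1)
else:
    result = df.apply(row_score, axis=1)
```

On a small DataFrame like this one, the parallel overhead may outweigh any benefit, so the gains are better measured on data closer to your real workload.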

Daily Dose of Data Science
