Daily Dose of Data Science

Avoid Using Pandas' Apply() Method At All Times

Clearing a common misconception about a popular method.

Avi Chawla
Jun 24, 2023
The apply() method in Pandas is the most common way to apply a function along an axis of a DataFrame or to the values of a Series.

But contrary to common belief, Pandas' apply() method:

  • is NOT vectorized

  • instead, it's a glorified for-loop

Thus, it offers no inherent optimization: the supplied function is typically called once per row, column, or element, so the code runs at native Python speed.
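To make the "glorified for-loop" point concrete, here is a minimal sketch comparing a row-wise apply() with an explicit Python loop over the same rows. The DataFrame, its column names, and the row_total function are made up for illustration, and the loop only mirrors the behavior conceptually; it is not a copy of how Pandas implements apply() internally.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"price": [2.5, 4.0, 1.25], "quantity": [2, 3, 4]})


def row_total(row):
    """Plain Python function that works on one row at a time."""
    return row["price"] * row["quantity"]


# Row-wise apply(): pandas passes each row to row_total in turn.
via_apply = df.apply(row_total, axis=1)

# An explicit loop over the rows produces the same values.
via_loop = pd.Series(
    [row_total(row) for _, row in df.iterrows()],
    index=df.index,
)

# Compare the two results element by element.
print(via_apply.equals(via_loop))
```

Both versions call row_total once per row, which is why apply() brings no speed-up over the loop by itself.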

One solution is to eliminate the apply() method by using a vectorized approach.

Admittedly, coming up with a vectorized approach can be difficult at times. (Here’s one of my previous guides on this: If You Are Not Able To Code A Vectorized Approach, Try This)
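For simple cases, a vectorized replacement can look like the sketch below. The data and column names are hypothetical, and np.where is just one of several ways to express conditional logic without apply().

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"price": [2.5, 4.0, 1.25], "quantity": [2, 3, 4]})

# Row-wise apply(): the lambda runs once per row.
total_apply = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Vectorized: a single operation over whole columns.
total_vec = df["price"] * df["quantity"]

# Element-wise apply() with conditional logic...
label_apply = df["quantity"].apply(lambda q: "bulk" if q >= 3 else "single")

# ...and a vectorized equivalent using np.where.
label_vec = pd.Series(
    np.where(df["quantity"] >= 3, "bulk", "single"),
    index=df.index,
)
```

The vectorized versions hand the whole column to compiled NumPy routines instead of calling a Python function for every value.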

Another solution is to parallelize the apply() method by using external libraries.

The image above compares the run-time of alternatives that support parallelization.

It is evident that Pandas’ apply() is not the optimal way to apply a function.

Get started with these libraries here:

  • Swifter: https://github.com/jmcarpenter2/swifter

  • Pandarallel: https://github.com/nalepae/pandarallel

  • Parallel Pandas: https://pypi.org/project/parallel-pandas/

  • Mapply: https://pypi.org/project/mapply/
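To give a rough idea of what chunk-based parallelization of apply() involves, here is a minimal sketch that uses only the Python standard library and Pandas. It is not how any of the libraries above is actually implemented; parallel_apply, _apply_chunk, and square are hypothetical names introduced for this illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import repeat

import pandas as pd


def _apply_chunk(chunk, func):
    """Run a regular Series.apply() on one chunk (executed inside a worker)."""
    return chunk.apply(func)


def parallel_apply(series, func, n_workers=4, use_threads=False):
    """Apply func to every element of series, one chunk per task.

    Hypothetical helper for illustration. With the default process pool,
    func must be picklable (a module-level function, not a lambda).
    use_threads=True swaps in a thread pool, which is handy for quick
    checks but is unlikely to speed up CPU-bound pure-Python functions.
    """
    if len(series) == 0:
        return series.apply(func)

    # Split the Series into at most n_workers contiguous chunks.
    chunk_size = math.ceil(len(series) / n_workers)
    chunks = [
        series.iloc[start:start + chunk_size]
        for start in range(0, len(series), chunk_size)
    ]

    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=n_workers) as executor:
        results = list(executor.map(_apply_chunk, chunks, repeat(func)))

    # Each chunk keeps its original index, so concat restores the order.
    return pd.concat(results)


def square(x):
    """Example element-wise function; defined at module level so it pickles."""
    return x * x
```

A call such as parallel_apply(pd.Series(range(1_000_000)), square) would then spread the work over four processes. On platforms that start worker processes with spawn, that call should sit under an `if __name__ == "__main__":` guard. Starting processes and copying data has a cost, so this kind of approach tends to pay off only for large inputs or slow functions.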

👉 Over to you: What are some other techniques you commonly use to optimize Pandas’ operations?

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.

1 Comment
Joe Corliss
Jul 10

The series.transform method can handle lots of the operations you might be in the habit of using .apply for, and it's faster

© 2023 Avi Chawla