The Most Common Misconception About Inplace Operations in Pandas
...and here's what happens in reality.
Pandas users often modify a DataFrame inplace expecting better performance. Yet an inplace operation is not always the more efficient choice. Here's why.
The image compares the run-time of inplace and non-inplace operations. In most cases, the inplace operations are slower.
Contrary to common belief, most inplace operations DO NOT prevent the creation of a new copy. It is just that inplace assigns the copy back to the same address.
But during this assignment, Pandas performs some extra checks (SettingWithCopy) to ensure that the DataFrame is being modified correctly. This, at times, can be an expensive operation.
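Here is a small sketch of what "the same address" looks like from the caller's side. The DataFrame, its column names, and its values are made up for illustration. It only shows the visible difference between the two styles; it does not by itself reveal the copy Pandas makes internally.

```python
import pandas as pd

# Illustrative data; not taken from the post's benchmark.
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})
original = df  # a second reference to the same object

# inplace=True: the name `df` still refers to the same Python object,
# whose contents have been replaced by the sorted result.
df.sort_values("a", inplace=True)
same_object = df is original

# Default (inplace=False): the result is a separate DataFrame object,
# and the caller decides which name to bind it to.
sorted_df = original.sort_values("b")
distinct_object = sorted_df is not original

print(same_object, distinct_object)
```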
So, in general, there is no guarantee that an inplace operation is faster.
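One way to check this on your own data is to time both styles with the standard-library timeit module. The sketch below assumes a synthetic 100,000-row DataFrame and uses sort_values as the example operation. The helper names sort_inplace and sort_regular are hypothetical, and the numbers will vary with the Pandas version, the operation, and the data size.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption for this sketch, not the post's dataset).
rng = np.random.default_rng(0)
base = pd.DataFrame({"a": rng.random(100_000), "b": rng.random(100_000)})


def sort_inplace():
    df = base.copy()  # fresh copy so every run starts from unsorted data
    df.sort_values("a", inplace=True)
    return df


def sort_regular():
    df = base.copy()  # same copy cost, so the comparison stays fair
    df = df.sort_values("a")
    return df


# Total seconds for 20 runs of each style.
t_inplace = timeit.timeit(sort_inplace, number=20)
t_regular = timeit.timeit(sort_regular, number=20)

print(f"inplace=True : {t_inplace:.3f} s")
print(f"inplace=False: {t_regular:.3f} s")
```

Both helpers pay for the same base.copy() call, so any gap between the two timings comes from the sort step itself.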
What’s more, inplace operations do not allow chaining multiple operations in a single expression.
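To make the contrast concrete, here is a minimal sketch with a made-up DataFrame (the column names and values are illustrative). The chained version works because each method returns a new DataFrame that the next call can act on; the inplace version needs one statement per step.

```python
import pandas as pd

# Illustrative data with one missing score.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [3.0, None, 1.0, 2.0]}
)

# Chained: each call returns a new DataFrame that feeds the next call.
chained = (
    df.dropna(subset=["score"])
      .sort_values("score")
      .reset_index(drop=True)
)

# inplace: the same three steps, written as separate statements
# on a working copy so the original `df` is left untouched.
stepwise = df.copy()
stepwise.dropna(subset=["score"], inplace=True)
stepwise.sort_values("score", inplace=True)
stepwise.reset_index(drop=True, inplace=True)

print(chained)
```

Wrapping the chain in parentheses lets each step sit on its own line, which keeps it about as easy to scan as the step-by-step version.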
Big fan of method chaining. To achieve the same level of readability as inplace=True, I usually nest the chain in parentheses.