Discover more from Daily Dose of Data Science
Why Are We Typically Advised To Never Iterate Over A DataFrame?
The core reason which very few told you about.
From time to time, we are advised to avoid iterating on a Pandas DataFrame. But what is the exact reason behind this? Let me explain.
A DataFrame is a column-major data structure. Thus, consecutive elements in a column are stored next to each other in memory.
As processors are efficient with contiguous blocks of memory, retrieving a column is much faster than a row.
But while iterating, as each row is retrieved by accessing non-contiguous blocks of memory, the run-time increases drastically.
In the image above, retrieving over 32M elements of a column was still over 20x faster than fetching just nine elements stored in a row.
Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new about Python and Data Science every day.
👉 See what others are saying about this post on LinkedIn: Post Link.
👉 If you love reading this newsletter, feel free to share it with friends!
Find the code for my tips here: GitHub.