Speed Up Parquet I/O in Pandas by 5x
DataFrames are often stored in Parquet files and read using Pandas' read_parquet() function.
Rather than relying on Pandas, which uses a single core, use fastparquet. It can speed up reading and writing Parquet files considerably by processing them in parallel.
Find more info in the fastparquet docs.
Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new about Python and Data Science every day.
Find the code for my tips here: GitHub.
I like to explore, experiment with, and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.