Speed-up Parquet I/O of Pandas by 5x
Dataframes are often stored in parquet files and read using Pandas' 𝐫𝐞𝐚𝐝_𝐩𝐚𝐫𝐪𝐮𝐞𝐭() method.
Rather than using Pandas, which relies on a single-core, use fastparquet. It offers immense speedups for I/O on parquet files using parallel processing.
Find more info here: Docs.
Share this post on LinkedIn: Post Link.
Thanks for reading Daily Dose of Data Science! Subscribe for free to learn something new about Python and Data Science every day.
Find the code for my tips here: GitHub.