How to Read Multiple CSV Files Efficiently
In many situations, data arrives split across multiple CSV files that are then handed over to the DS/ML team.
Pandas' `read_csv` parses a file on a single core with its default parser, so the usual approach is to iterate over the list of files and read them one by one before any further processing.
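For reference, the sequential Pandas pattern can be sketched as follows. The helper name `read_csvs_sequentially` and the glob pattern are hypothetical; only `glob.glob`, `pd.read_csv` and `pd.concat` are real library calls.

```python
import glob

import pandas as pd


def read_csvs_sequentially(pattern: str) -> pd.DataFrame:
    """Hypothetical helper: read every CSV matching `pattern`, one at a time."""
    # Sort so the rows come out in a predictable (filename) order.
    paths = sorted(glob.glob(pattern))
    frames = []
    for path in paths:
        # Each call parses one whole file before the loop moves on.
        frames.append(pd.read_csv(path))
    # Stack the per-file frames into one table with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

For example, `read_csvs_sequentially("data/part_*.csv")` (a made-up path) returns a single DataFrame with the rows of every matching file, in filename order.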
"Datatable" offers a quick fix. Instead of reading the files iteratively with Pandas, you can use Datatable to read the whole batch. Because its CSV reader is multithreaded, it can deliver a significant speedup over Pandas.
The performance gain is not limited to I/O; it also shows up in many other tabular operations.
Read more in the DataTable docs.
Thanks for reading Daily Dose of Data Science!
Read this post on LinkedIn: Post Link.
I like to explore, experiment with, and write about data science concepts and tools. You can connect with me on LinkedIn.