Daily Dose of Data Science

Why You Should Not Read CSVs with Pandas

Avi Chawla
Oct 6, 2022

By default, Pandas performs most of its computation, including CSV parsing, on a single core, which makes its operations slow on large datasets.

The "datatable" library in Python is an excellent alternative with a Pandas-like API. Its multi-threaded data processing, including its CSV reader, typically makes it faster than Pandas on large files.

The run-time comparison here is between two ways of creating a Pandas DataFrame from a CSV: reading the file directly with Pandas, or reading it with Datatable and converting the result to Pandas.
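The comparison described above can be sketched roughly as follows. This is a minimal sketch, not the original benchmark: the helper names (`write_sample_csv`, `time_call`, `available_readers`), the file name `sample.csv`, and the synthetic data size are assumptions made for illustration. It relies on `pandas.read_csv` and on `datatable.fread(...).to_pandas()`, and it skips any library that is not installed.

```python
import csv
import os
import random
import tempfile
import time


def write_sample_csv(path, n_rows=100_000, n_cols=5, seed=0):
    """Write a CSV of random floats (hypothetical benchmark input)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"col_{i}" for i in range(n_cols)])
        for _ in range(n_rows):
            writer.writerow([rng.random() for _ in range(n_cols)])


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds of fn(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def available_readers():
    """Map a label to a function that turns a CSV path into a Pandas DataFrame."""
    readers = {}
    try:
        import pandas as pd
    except ImportError:
        return readers  # both approaches need Pandas for the final DataFrame
    readers["pandas.read_csv"] = pd.read_csv
    try:
        import datatable as dt
        # Read with Datatable's multi-threaded reader, then convert to Pandas.
        readers["datatable.fread + to_pandas"] = lambda path: dt.fread(path).to_pandas()
    except ImportError:
        pass
    return readers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "sample.csv")
        write_sample_csv(csv_path)
        for label, reader in available_readers().items():
            print(f"{label}: {time_call(reader, csv_path):.3f} s")
```

The timings depend heavily on file size, column types, and the number of cores available, so the gap is best measured on a file that resembles your own data.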

3 Comments
AJawake
Dec 5, 2022

Does this library work similarly to the 'data.table' library from R?

AndresVeraF
Dec 3, 2022

Thanks, very useful

© 2023 Avi Chawla