Daily Dose of Data Science

Feature Tracking Made Simple In Sklearn Transformers


Avi Chawla
Nov 7, 2022

Recently, scikit-learn announced one of its most awaited improvements: sklearn transformers can now be configured to output Pandas DataFrames.

Until now, Sklearn's transformers accepted a Pandas DataFrame as input but returned a NumPy array as output. As a result, the output had to be converted back to a Pandas DataFrame by hand, which at times made it difficult to track and assign names to the features.

For instance, consider the snippet above.

In numpy_output.py, it is hard to tell which feature (or computation) each column represents just by looking at the NumPy array.

However, in the upcoming release, the transformer can return a Pandas DataFrame instead (pandas_output.py). Because the output columns carry names, tracking features becomes much simpler.

P.S. The feature is in dev and will be rolled out soon!

Read more: Release page.

I like to explore, experiment and write about data science concepts and tools. You could connect with me on LinkedIn.

© 2023 Avi Chawla