A Common Misconception About Model Reproducibility

...And here's what reproducibility truly means.

Avi Chawla

Aug 04, 2023

Today I want to discuss something extremely important about ML model reproducibility.

Imagine you trained an ML model, say a neural network.

It gave a training accuracy of 95% and a test accuracy of 92%.

You trained the model again and got the same performance.

Will you call this a reproducible experiment?

Think for a second before you read further.

Well, contrary to common belief, this is not what reproducibility means.

To understand better, consider this illustration:

Here, we feed the input data to neural networks with the same architecture but different randomizations. Next, we visualize the transformation using a 2D dummy layer, as I depicted in one of my previous posts below:

Data transformation in a neural network (**Post Link**)

All models separate the data pretty well and give 100% accuracy, don’t they?

Yet, if you notice closely, each model generates varying data transformations (or decision boundaries).

Now will you call this reproducible?

No, right?

It is important to remember that reproducibility is NEVER measured in terms of performance metrics.

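Instead, an experiment is reproducible when every source of randomness in it (weight initialization, data shuffling, dropout, and so on) is controlled, so that rerunning it yields the same model.

This is because two models with the same architecture but different randomization can still perform equally well.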

Different randomization may still lead to the same accuracy

But that does not make your experiment reproducible.

It becomes reproducible only once those sources of randomness are fixed.
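To make this concrete, here is a minimal sketch using a toy perceptron in place of a neural network. The dataset, the `train_perceptron` and `accuracy` helpers, and the seed values are all made up for illustration; the only randomness is the weight initialization.

```python
import random

# Toy, linearly separable data: (x1, x2, label) with labels in {-1, +1}.
DATA = [
    (-2.0, -1.0, -1), (-1.0, -2.0, -1), (-2.0, -2.0, -1), (-1.0, -1.5, -1),
    (2.0, 1.0, 1), (1.0, 2.0, 1), (2.0, 2.0, 1), (1.0, 1.5, 1),
]


def train_perceptron(seed, lr=0.1, max_epochs=1000):
    """Train a perceptron whose only randomness is its weight initialization."""
    rng = random.Random(seed)
    w1, w2, b = (rng.uniform(-1.0, 1.0) for _ in range(3))
    for _ in range(max_epochs):
        mistakes = 0
        for x1, x2, y in DATA:
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # misclassified point
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side
            break
    return (w1, w2, b)


def accuracy(weights):
    """Fraction of DATA points classified correctly by the given weights."""
    w1, w2, b = weights
    correct = sum(
        1 for x1, x2, y in DATA
        if (1 if w1 * x1 + w2 * x2 + b > 0 else -1) == y
    )
    return correct / len(DATA)


model_a = train_perceptron(seed=0)
model_b = train_perceptron(seed=1)
model_a_again = train_perceptron(seed=0)

print(accuracy(model_a), accuracy(model_b))  # same metric for both seeds
print(model_a == model_b)        # different seeds: learned weights differ
print(model_a == model_a_again)  # same seed: weights match exactly
```

The two seeds agree on accuracy yet disagree on the learned weights. Only the rerun with the same seed gives back the same model.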

And that is why it is also recommended to set seeds for random number generators.

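Once we do that, rerunning the same code in the same environment should produce the same model, not just the same accuracy. (Some GPU operations are nondeterministic even with fixed seeds, so frameworks may need extra determinism settings.)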
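In practice, seeding is often wrapped in a small helper that is called once at the start of a script. The sketch below is one possible version, not a standard API: `seed_everything` is a hypothetical name, and the NumPy and PyTorch calls only run if those libraries happen to be installed.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the random generators this process is likely to use (hypothetical helper)."""
    random.seed(seed)  # Python's built-in RNG

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's default generators
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(42)
first_draw = random.random()

seed_everything(42)
second_draw = random.random()

print(first_draw == second_draw)  # same seed, same draw
```

Any library with its own random generator (for example, a `random_state` argument in scikit-learn estimators) needs the seed passed to it separately. For GPU nondeterminism, PyTorch additionally offers `torch.use_deterministic_algorithms`.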

But do you know that besides building a reproducible pipeline, there’s another important yet overlooked aspect, especially in data science projects?

It’s testing the pipeline.

One of the biggest hurdles data science teams face is transitioning their data-driven pipeline from Jupyter Notebooks to an executable, reproducible, error-free, and organized pipeline.

And this is not something data scientists are particularly fond of doing.

Yet, this is an immensely critical skill that many overlook.

To help you develop that critical skill, this is precisely what we are discussing in today’s member-only blog.

**Blog on testing a data science pipeline using Pytest.**

Testing is already a job that data scientists don’t look forward to with much interest.

Fortunately, Pytest makes it easy to write test suites, which in turn helps immensely in developing reliable data science projects.

You will learn the following:

Why are automation frameworks important?
What is Pytest?
How does it simplify pipeline testing?
How to write and execute tests with Pytest?
How to customize Pytest’s test search?
How to create an organized testing suite using Pytest markers?
How to use fixtures to make your testing suite concise and reliable?
and more.

All in all, building test suites is one of the best skills you can develop to build large and reliable data science pipelines.
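As a small taste of how this ties back to reproducibility, here is a hedged sketch of a Pytest file that checks a seeded training step. It is not taken from the linked blog: `train_model` is a hypothetical stand-in for a real training function, and the seed values are arbitrary.

```python
import random

import pytest


def train_model(seed: int) -> list[float]:
    """Hypothetical stand-in for a training run: 'weights' drawn from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(3)]


@pytest.fixture
def seed() -> int:
    """Fixture that supplies the reference seed to every test requesting it."""
    return 42


def test_same_seed_gives_identical_weights(seed):
    # Reproducibility check: compare the model itself, not a metric.
    assert train_model(seed) == train_model(seed)


@pytest.mark.parametrize("other_seed", [0, 1, 7])
def test_different_seed_changes_weights(seed, other_seed):
    # Sanity check that the seed really drives the randomness.
    assert train_model(seed) != train_model(other_seed)
```

Saved as, say, `test_reproducibility.py`, running `pytest` in that directory would collect and run these tests. The first test compares weights rather than accuracy, which is exactly the distinction made above.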

👉 Interested folks can read it here: Develop an Elegant Testing Framework For Python Using Pytest.


Thanks for reading!

Daily Dose of Data Science
