How to Interpret Reconstruction Loss While Detecting Multivariate Covariate Shift?
Solving critical challenges of real-world model deployment.
Today’s post is the last one in our three-part multivariate covariate shift detection series.
Read the first two posts here if you missed them:
Yesterday, we discussed an interesting technique to detect multivariate covariate shift in ML models.
To reiterate, the idea was based on data reconstruction, wherein we learn a mapping that projects the training data to low dimensions and then reconstructs the original data from those low dimensions.
Autoencoders can be used here:
They are neural networks that learn to encode data in a lower-dimensional space, and then decode the original data back.
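A minimal autoencoder sketch in PyTorch is shown below. The input size of 10 features and the 3-dimensional bottleneck are illustrative assumptions, not values from this post:

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch. The feature count (10) and latent size (3)
# are illustrative assumptions.
class Autoencoder(nn.Module):
    def __init__(self, n_features: int = 10, n_latent: int = 3):
        super().__init__()
        # Encoder: project the data into a lower-dimensional space
        self.encoder = nn.Sequential(
            nn.Linear(n_features, n_latent),
            nn.ReLU(),
        )
        # Decoder: reconstruct the original data from the low dimensions
        self.decoder = nn.Linear(n_latent, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(5, 10)   # a batch of 5 samples
x_hat = model(x)         # reconstruction has the same shape as the input
```

Training then minimizes a reconstruction loss (e.g., mean squared error) between `x` and `x_hat`.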
How do we use this model to detect covariate shift?
Consider we have deployed our model, and it is generating predictions on new data.
First, we can train an Autoencoder (or other suitable data reconstruction models) on the training data.
This way, the Autoencoder will capture the essence of our training data and its distribution.
Next, as new data is coming in and the model is making predictions, we can periodically determine the change in data distribution by finding the reconstruction loss on this new data using the Autoencoder model:
If the reconstruction loss is high, it indicates that the distribution has changed.
If the reconstruction loss is low, it indicates that the distribution is almost the same.
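The idea can be sketched end to end with a simple reconstruction model. Here I use PCA as a stand-in (the post allows "other suitable data reconstruction models"); the synthetic data, component count, and mean shift are all illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in training data; PCA plays the role of the reconstruction model.
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 10))
pca = PCA(n_components=3).fit(X_train)

def reconstruction_loss(model, X):
    """Mean squared error between X and its reconstruction."""
    X_hat = model.inverse_transform(model.transform(X))
    return float(np.mean((X - X_hat) ** 2))

# New data from the same distribution -> loss stays near the training loss.
X_same = rng.normal(loc=0.0, scale=1.0, size=(500, 10))
# Shifted data (mean has moved) -> reconstruction degrades, loss rises.
X_shifted = rng.normal(loc=3.0, scale=1.0, size=(500, 10))

print(reconstruction_loss(pca, X_same))
print(reconstruction_loss(pca, X_shifted))  # noticeably higher
```

Because the model only learned the training distribution, shifted inputs land outside the learned low-dimensional subspace and reconstruct poorly.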
But the question is “How do we interpret reconstruction loss?”
For instance, if the reconstruction loss is, say, 0.4, how do we determine whether this value is significant?
We need more context.
Let’s understand how we can solve this problem.
Contrary to what many think, covariate shift happens gradually in ML models.
In other words, it is unlikely that our model performs well one day and starts underperforming the next day due to covariate shift.
Of course, abrupt degradation is possible but it is rarely due to covariate shift. Most of the time, it happens due to other issues like:
Maybe, due to a compliance issue, the team has stopped collecting a feature. As a result, your model no longer has access to a feature you trained it with.
Or maybe there were some updates to the inference logic that were not thoroughly tested.
Under covariate shift, however, performance degrades gradually.
And in that phase of performance degradation, we must decide to update the model at some point to adjust for distributional changes.
While data reconstruction loss is helpful in deciding model updates, it conveys very little meaning alone.
Simply put, we need a baseline reconstruction loss value to compare our future reconstruction losses.
So here’s what we can do.
Let’s say we decided to review our model every week.
After training the model on the gathered data:
We determine our baseline reconstruction score using the new data gathered over the first week in the post-training phase.
We use this baseline score to compare future losses of every subsequent week.
A large difference between a week's reconstruction loss and the baseline score indicates a covariate shift.
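The weekly scheme above can be sketched as follows. The reconstruction model (PCA as a stand-in), the 50% relative threshold, and the synthetic weekly batches are all illustrative assumptions, not prescriptions from the post:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Train the reconstruction model on the gathered training data.
X_train = rng.normal(size=(1000, 10))
pca = PCA(n_components=3).fit(X_train)

def reconstruction_loss(model, X):
    X_hat = model.inverse_transform(model.transform(X))
    return float(np.mean((X - X_hat) ** 2))

# Week 1 after training: establish the baseline reconstruction score.
baseline = reconstruction_loss(pca, rng.normal(size=(500, 10)))

def shift_detected(model, X_week, baseline, rel_threshold=0.5):
    """Flag covariate shift when the weekly loss exceeds the baseline
    by more than `rel_threshold` (relative difference, an assumption)."""
    loss = reconstruction_loss(model, X_week)
    return (loss - baseline) / baseline > rel_threshold

# Week 2: same distribution -> no alert.
print(shift_detected(pca, rng.normal(size=(500, 10)), baseline))
# Week 3: shifted distribution -> alert.
print(shift_detected(pca, rng.normal(loc=3.0, size=(500, 10)), baseline))
```

In practice, the threshold would be tuned to the natural week-to-week variation of the loss rather than hard-coded.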
Simple and intuitive, right?
The above reconstruction metric, coupled with the model's performance, becomes a reliable way to determine whether covariate shift has crept in.
Of course, as discussed yesterday, true labels might not be immediately available at times.
Thus, determining performance in production can be difficult.
To counter this, ML teams consistently try to gather feedback from end users to learn whether the model did well.
This is especially seen in recommendation systems:
Was this advertisement relevant to you?
Was this new video relevant to you?
These things are a part of the model logging phase, where we track model performance in production.
This helps us decide whether we should update our model or not.
However, model updates are not done like: “Train a new model today and delete the previous one.”
Instead, all deployments are driven by proper version control.
I have written multiple deep dives on deployment version control, which you can read next to learn this critical real-world ML development skill:
Over to you: What are some other ways to detect covariate shift in ML models? Let me know :)
Thanks for reading!