Here's an explanation you always wanted with an intuitive analogy.
In a time series context, where we learn from the past to predict the future, we may have to make do with a single test set consisting of the most recent x% of data. I wouldn't want to throw away a chunk of data every time I evaluate a model. But we would want to evaluate on the test set only when absolutely necessary; for example, to validate the final model before deployment.
So when you merge the test set with training and validation sets then create a entirely new split (yielding a new test set) I can then use this test set the same way even though the model was technically exposed to the entire sets (training, Val, test)?