Overfitting to Validation Data

I was thinking about something that could increase test performance a little and was wondering if any of you do this:

  • Train models on training set
  • Compare models on validation set
  • Pick best model
  • Retrain model on training + validation set for extra performance
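
The four steps above can be sketched roughly as follows. This is only a toy illustration under stated assumptions: the data is synthetic, and the "models" are 1-D ridge regressions that differ only in a hypothetical penalty `alpha`. None of it comes from the Numerai tooling.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge fit without intercept: w = sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def make_data(n):
    # Synthetic stand-in for real data (assumption): y = 0.5 * x + noise.
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = make_data(200)
val_x, val_y = make_data(50)

alphas = [0.0, 1.0, 10.0, 100.0]  # hypothetical candidate models

# Step 1: train every candidate on the training set only.
weights = {a: fit_ridge_1d(train_x, train_y, a) for a in alphas}

# Step 2: compare the candidates on the validation set.
val_scores = {a: mse(weights[a], val_x, val_y) for a in alphas}

# Step 3: pick the candidate with the lowest validation error.
best_alpha = min(val_scores, key=val_scores.get)

# Step 4: retrain the chosen configuration on training + validation.
final_w = fit_ridge_1d(train_x + val_x, train_y + val_y, best_alpha)
```

After step 4 there is no held-out data left in this sketch, so `final_w` cannot be scored honestly on `val_x` and `val_y` any more.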

Does anyone do this? Why or why not? I don’t, because it confuses me when the validation scores on submission receipts come back so high. But it could possibly improve test performance. Let me know your thoughts and opinions.

When you submit, Numerai reports validation scores on the same validation set that you are given. So if you train on validation, just ignore those validation scores, because they are not meaningful.
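
To see why a score on data the model was trained on is not meaningful, here is a small sketch. It assumes synthetic data and uses a 1-nearest-neighbour predictor, chosen only because it memorises its training points and so makes the effect extreme. It is not how Numerai scores submissions.

```python
import random

def predict_1nn(fit_x, fit_y, x):
    """Return the target of the fitted point closest to x."""
    nearest = min(range(len(fit_x)), key=lambda j: abs(fit_x[j] - x))
    return fit_y[nearest]

def mse_1nn(fit_x, fit_y, eval_x, eval_y):
    """Mean squared error of 1-NN predictions on the evaluation points."""
    errors = [(predict_1nn(fit_x, fit_y, x) - y) ** 2
              for x, y in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)

rng = random.Random(0)

def make_data(n):
    # Synthetic data (assumption): y = x + noise.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

train_x, train_y = make_data(200)
val_x, val_y = make_data(50)

# Fitted on training only: the validation error is an honest estimate.
honest = mse_1nn(train_x, train_y, val_x, val_y)

# Fitted on training + validation: every validation point is its own
# nearest neighbour, so the reported error no longer says anything.
leaked = mse_1nn(train_x + val_x, train_y + val_y, val_x, val_y)

print(f"validation MSE, trained on train only:  {honest:.3f}")
print(f"validation MSE, trained on train + val: {leaked:.3f}")
```

Real models rarely memorise this completely, but the direction of the effect is the same: the validation number looks better without the model being any better on unseen data.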

As for whether people train on the validation data: some do and some don’t!


I’m against picking the best model using the validation set (if I got your idea right): it’s just a small piece of information compared to the performance analysis you can get during training. It’s similar to the situation on Kaggle when people choose a model by Public Leaderboard score instead of local CV. Here is just one example of what usually happens in that case: medium article (and much more can be found by googling “shake up kaggle”). It’s something I learned the hard way during my first Kaggle competition :)
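
As a rough illustration of choosing by local CV rather than by a single held-out score, the sketch below averages the error over k folds of the training data. Everything in it is an assumption made for the example: the synthetic data, the 1-D ridge candidates, and the plain interleaved k-fold split (real Numerai data may call for a different split).

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge fit without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, alpha, k=5):
    """Average held-out MSE over k interleaved folds."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th row goes to this fold
        fit_x = [xs[i] for i in range(n) if i not in held_out]
        fit_y = [ys[i] for i in range(n) if i not in held_out]
        eval_x = [xs[i] for i in sorted(held_out)]
        eval_y = [ys[i] for i in sorted(held_out)]
        w = fit_ridge_1d(fit_x, fit_y, alpha)
        fold_scores.append(mse(w, eval_x, eval_y))
    return sum(fold_scores) / k

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]    # synthetic features
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]  # synthetic targets

alphas = [0.0, 1.0, 10.0, 100.0]  # hypothetical candidates
cv_scores = {a: cv_score(xs, ys, a) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)
```

Because every row is held out exactly once, the selection rests on all of the training data instead of one small slice.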


Hi nenco,
This is actually what I do, and I believe it is a perfectly reasonable approach (including for these models: Sirbradflies, Flabridrises, Fbaldisserri).

Regarding step 4, retraining on training + validation, I don’t have a clear answer yet. Currently I am using 8 model slots:

  • 4 for models trained only on training data
  • 4 with exactly the same models but trained on training and validation data

I will post the comparison between these 2 groups when I have a history of at least 20 rounds to compare.
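
One possible way to run such a comparison, once the round scores exist, is a paired summary per model pair. The sketch below is only an assumption about how it could be done; the scores in it are randomly generated placeholders, not results from any of the models in this thread.

```python
import math
import random
import statistics

def paired_summary(scores_a, scores_b):
    """Compare two models round by round (b minus a).

    Returns the mean difference, a paired t-statistic, and the
    number of rounds in which b scored higher than a.
    """
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    wins = sum(d > 0 for d in diffs)
    return mean_diff, mean_diff / std_err, wins

# Placeholder data only (assumption): 20 rounds of made-up scores.
rng = random.Random(1)
train_only = [rng.gauss(0.02, 0.03) for _ in range(20)]
train_plus_val = [s + rng.gauss(0.0, 0.01) for s in train_only]

mean_diff, t_stat, wins = paired_summary(train_only, train_plus_val)
print(f"mean difference: {mean_diff:+.4f}, t: {t_stat:+.2f}, wins: {wins}/20")
```

Pairing by round matters because both models in a pair face the same round, so most of the round-to-round variation cancels out in the differences.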



I am very interested in this comparison. Can you send profile links of the 4 models trained only on training, and the same models trained on training and validation? Which one has higher performance so far?

Could you share the models?

I wanted to wait until all models have at least 20 weeks of history but sure, no problem.

Here are all the model pairs (training / training+validation):

  • sirbradflies / bradfliessir
  • firlasersbid / lasersbidfir
  • fbaldisserri / baldisserrif
  • flabridrises / ridrisesflab

Don’t make fun of my model names and please share any analysis you may do, thanks!