Relevance of historical training_data to new market regime

Maybe a naive question here 🙂 It appears that the training_data is identical for every round. How can a model be trained to cope with new market patterns or regimes, given that historical data gradually loses its relevance and makes validation and backtesting less effective? The COVID-19 scenario is one example: the training set most likely does not cover that era.
Why doesn't Numerai update the training set regularly?


I was thinking about the same thing. It would make sense to have all eras before the current round available for training/validation. What's the reason not to expand the train/validation data each round?

This would allow building models that handle regime changes better, and performing rolling (aka sliding-window) cross-validation and model training.
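A minimal sketch of what rolling (sliding-window) cross-validation over eras could look like, assuming eras are represented as sortable integers. The function name `sliding_era_splits` and its parameters are hypothetical illustrations, not part of any Numerai API. The optional `gap` skips eras between the training and validation windows, a common precaution when targets overlap neighboring eras.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_era_splits(
    eras: Sequence[int],
    train_size: int,
    val_size: int,
    step: int = 1,
    gap: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_eras, val_eras) pairs from a fixed-width window moving forward in time.

    Hypothetical helper: assumes eras are sortable in chronological order.
    """
    ordered = sorted(set(eras))  # deduplicate and put eras in time order
    window = train_size + gap + val_size  # total eras spanned by one fold
    for start in range(0, len(ordered) - window + 1, step):
        train = ordered[start : start + train_size]
        val_start = start + train_size + gap  # skip `gap` eras after training
        val = ordered[val_start : val_start + val_size]
        yield train, val
```

With eras 1 to 10, `train_size=4`, `val_size=2` and `step=2`, this yields three folds: train on eras 1 to 4 and validate on 5 and 6, then train on 3 to 6 and validate on 7 and 8, then train on 5 to 8 and validate on 9 and 10. The training window has a fixed width, so older eras drop out as newer ones arrive.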

Tournament data is actually updated regularly, but the new eras arrive as test eras, i.e. without targets. If targets were provided, Numerai would not be able to use the test eras for out-of-sample validation of the submitted predictions.

Fine, but you could hold out the last N eras for out-of-sample (OOS) validation and still expand the training data each round, right?
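A minimal sketch of the scheme suggested above, assuming eras are sortable integers: the last `holdout` eras are reserved as a final OOS set, and the remaining eras are split with an expanding training window that always starts at the first era. The function `expanding_splits_with_holdout` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def expanding_splits_with_holdout(
    eras: Sequence[int],
    holdout: int,
    min_train: int,
    val_size: int,
) -> Tuple[List[Split], List[int]]:
    """Return expanding-window (train, val) folds plus a final OOS holdout.

    Hypothetical helper: assumes eras are sortable in chronological order.
    """
    ordered = sorted(set(eras))  # deduplicate and put eras in time order
    if holdout < 1 or holdout >= len(ordered):
        raise ValueError("holdout must be at least 1 and leave eras for training")
    development, oos = ordered[:-holdout], ordered[-holdout:]
    splits: List[Split] = []
    # The training window always starts at the first era and grows by val_size per fold.
    for end in range(min_train, len(development) - val_size + 1, val_size):
        splits.append((development[:end], development[end : end + val_size]))
    return splits, oos
```

With eras 1 to 12, `holdout=2`, `min_train=4` and `val_size=2`, eras 11 and 12 are reserved as the final OOS set, and the folds train on eras 1 to 4, 1 to 6 and 1 to 8 while validating on 5 and 6, 7 and 8, and 9 and 10 respectively. Unlike a sliding window, no old eras are ever dropped; each new round simply adds eras to the end.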

Numerai founder and CEO Richard Craib stated that the plan is to release updated validation data every 6 months.

Source: Fireside Chat 2020 Q4


Richard said recently that they will probably be releasing new validation data every 6 months, so that’s something.