Train/validation dates

quantized · May 18, 2021, 1:38pm

I’m new to Signals and am developing a model. I notice the train data is all quite old, up to about 2012, with validation being data since 2012. Is there a reason for this? Ideally I’d like to train on more recent data due to the type of signal I’m developing. Maybe I’m missing something here?

minou · May 19, 2021, 12:33pm

Unlike the main competition, you don’t have to use the data in the historical file. The value you submit (continuous, centered around 0.5) is based on the expected return between day 2 and day 6 after the Friday date, and you can use any data you like to come up with a signal. If you plan to derive a signal from traditional OHLC/OHLCV market data, you might use a data source such as yfinance, and train/validate over any segments of the data you wish. Submitting values derived directly from some data without involving a model could also be effective, e.g. if you had a source of short-term sentiment data, simply centering and scaling that might suffice.
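To make the "centering and scaling" idea concrete, here is a minimal sketch in plain Python. The tickers, the raw sentiment numbers, the function name `center_and_scale` and the `half_width` parameter are all made up for illustration. It is one possible way to map raw scores onto values centered at 0.5, not an official recipe. The output is kept strictly inside (0, 1) as a precaution; check the Signals docs for the accepted range.

```python
from statistics import mean


def center_and_scale(raw_scores, half_width=0.49):
    """Map raw per-ticker scores to values centered at 0.5.

    raw_scores: dict of ticker -> raw score (hypothetical input format).
    half_width: largest allowed distance from 0.5 (assumed default).
    """
    if not raw_scores:
        return {}
    mu = mean(raw_scores.values())
    # Centering: subtract the cross-sectional mean.
    deviations = {t: s - mu for t, s in raw_scores.items()}
    widest = max(abs(d) for d in deviations.values())
    if widest == 0:
        # All scores identical: no information, so everything is neutral.
        return {t: 0.5 for t in raw_scores}
    # Scaling: the most extreme score lands half_width away from 0.5.
    return {t: 0.5 + half_width * d / widest for t, d in deviations.items()}


# Hypothetical sentiment readings for one Friday.
sentiment = {"AAA": -2.0, "BBB": 0.0, "CCC": 1.0, "DDD": 3.0}
signal = center_and_scale(sentiment)
for ticker, value in signal.items():
    print(ticker, round(value, 3))
```

Because this is a linear transform, it preserves both the ordering and the relative spacing of the raw scores. If the raw data has heavy outliers, ranking the scores first may be a better fit.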

quantized · May 19, 2021, 12:59pm

Thanks for that. I guess if I don’t use the published validation data I won’t get any metrics from Numerai, but that doesn’t matter if I’ve done my own train/test splitting from data I’ve got from yfinance or elsewhere. Or would I get some metrics?
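For anyone following along, a do-it-yourself split by date could look roughly like the sketch below. The row layout (`date`, `ticker`, `feature`), the sample values and the cutoff date are all hypothetical. Real rows would come from whatever source you pull, yfinance or otherwise.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Rows dated before cutoff go to train, the rest to validation."""
    train = [r for r in rows if r["date"] < cutoff]
    validation = [r for r in rows if r["date"] >= cutoff]
    return train, validation


# Hypothetical weekly rows (all Fridays); values are made up.
rows = [
    {"date": date(2019, 6, 7), "ticker": "AAA", "feature": 0.2},
    {"date": date(2019, 12, 27), "ticker": "AAA", "feature": 0.4},
    {"date": date(2020, 1, 3), "ticker": "AAA", "feature": 0.1},
    {"date": date(2021, 5, 14), "ticker": "AAA", "feature": 0.7},
]

# Train on everything before 2020, validate on 2020 onwards.
train, validation = split_by_date(rows, date(2020, 1, 1))
print(len(train), "train rows,", len(validation), "validation rows")
```

Splitting on a date rather than at random keeps future rows out of the training set, which matters for market data.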

minou · May 19, 2021, 1:12pm

You only get diagnostics if you submit signals based on the validation data. Depending on preference, that info could be useful to have as a comparison of models or merely an unnecessary complication. Unlike the main competition, validation diagnostics take a while to be produced for Signals (up to 15 mins mentioned in the docs IIRC), so that might weigh against using them. Purely personal preference though.
