Train/validation dates

quantized · May 18, 2021, 1:38pm

I’m new to Signals and am developing a model. I notice the train data is all quite old, up to about 2012, with validation being data since 2012. Is there a reason for this? Ideally I’d like to train on more recent data due to the type of signal I’m developing. Maybe I’m missing something here?

minou · May 19, 2021, 12:33pm

Unlike the main competition, you don’t have to use the data in the historical file. The signal you submit (a continuous value centered around 0.5) is based on the expected return between day 2 and day 6 after the Friday date, and you can use any data you like to come up with it. If you plan to derive a signal from traditional OHLC/OHLCV market data, you might use a data source such as yfinance, and train/validate over any segments of the data you wish. Submitting values derived directly from some data without involving a model could also be effective, e.g. if you had a source of short-term sentiment data, simply centering and scaling that might suffice.
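To make the "centering and scaling" idea concrete, here is a minimal sketch of one way to do it, assuming a rank-based transform is acceptable. The function name `to_signal`, the tickers, and the sentiment values are all made up for illustration; nothing here is an official Numerai method.

```python
import pandas as pd


def to_signal(raw: pd.Series) -> pd.Series:
    """Map raw scores to values strictly between 0 and 1, centered on 0.5.

    Hypothetical helper: rank the scores, then shift by half a rank so the
    lowest and highest values land inside (0, 1) rather than on the edges.
    """
    raw = raw.dropna()                    # ignore tickers with no score
    ranks = raw.rank(method="average")    # 1 = lowest score, n = highest
    return (ranks - 0.5) / len(raw)


# Made-up sentiment scores for made-up tickers (illustration only).
raw_sentiment = pd.Series({"AAA": -1.2, "BBB": 0.3, "CCC": 2.5, "DDD": 0.0})
signal = to_signal(raw_sentiment)
print(signal.sort_values())
```

Ranking throws away the size of the gaps between scores and keeps only their order, which makes the result robust to outliers. If the magnitudes matter for your data, a z-score followed by a squashing function would be an alternative.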

quantized · May 19, 2021, 12:59pm

Thanks for that. I guess if I don’t use the published validation data I won’t get any metrics from Numerai, but that doesn’t matter if I’ve done my own train/test splitting from data I’ve got from yfinance or elsewhere. Or would I get some metrics?
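For anyone doing their own splitting, a rough sketch of a date-based train/test split on self-sourced data might look like the following. The column name `friday_date`, the helper `split_by_date`, the cutoff, and the synthetic data are assumptions for illustration; in practice the frame would hold features built from yfinance or another source.

```python
import numpy as np
import pandas as pd


def split_by_date(df: pd.DataFrame, cutoff: str, date_col: str = "friday_date"):
    """Split rows into train (before cutoff) and test (on or after cutoff).

    Hypothetical helper: splitting on time rather than at random keeps
    later rows out of the training set.
    """
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[df[date_col] < cutoff_ts]
    test = df[df[date_col] >= cutoff_ts]
    return train, test


# Synthetic weekly data standing in for real features (illustration only).
dates = pd.date_range("2015-01-02", periods=300, freq="W-FRI")
rng = np.random.default_rng(0)
df = pd.DataFrame({"friday_date": dates, "feature": rng.normal(size=len(dates))})

train, test = split_by_date(df, "2019-01-01")
print(len(train), len(test))
```

Moving the cutoff later lets you train on more recent data, at the cost of a shorter test period.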

minou · May 19, 2021, 1:12pm

You only get diagnostics if you submit signals covering the validation data. Depending on preference, that info could be useful as a way to compare models, or it could be an unnecessary complication. Unlike the main competition, validation takes a while to be produced for Signals (up to 15 mins mentioned in the docs, IIRC), so that might count against using it. Purely personal preference though.
