While training one of my models with expanding time series cross validation, a random thought popped into my head:
Let's say I train a model on the first 100 eras, and test/select it during training on eras 110-120 to prevent any leakage. In the next fold I will train on the first 120 eras, test on eras 130-140, and so on. The furthest I can take this fold is right up until
live era - 6, meaning right now I would train on the first 1010 eras and test on 1020-1030.
However, if I use the model from the last fold to do the live predictions, the time separation from the train set is not 10 eras but 30 + 6 eras. So the performance numbers from the test sets have a different separation window than what I need for the live data. But obviously if I increase the time separation on the test data during training, this will only add to the separation from the live data.
How can I get around this? Is this even an issue? Until now I have always used the model from the last fold.
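For reference, the expanding-window scheme described above could be sketched roughly like this (the function name and the `step`/`gap`/`test_len` parameters are my own placeholders, not anything from Numerai's tooling):

```python
# Sketch of the expanding-window folds described above: each fold trains on all
# eras up to a cut-off, skips a gap of eras to avoid leakage, then tests on a
# short window. Era numbers are 1-based and purely illustrative.
def expanding_splits(n_eras, first_train_end=100, step=20, gap=10, test_len=10):
    """Yield (train_eras, test_eras) lists of era numbers for each fold."""
    train_end = first_train_end
    while train_end + gap + test_len <= n_eras:
        train = list(range(1, train_end + 1))
        test = list(range(train_end + gap, train_end + gap + test_len))
        yield train, test
        train_end += step  # expand the training window for the next fold

for train, test in expanding_splits(160):
    print(f"train 1-{train[-1]}, test {test[0]}-{test[-1]}")
```

With these defaults the first fold trains on eras 1-100 and tests starting at era 110, matching the setup in the question; the mismatch is that the live era sits further from the last fold's training cut-off than any test window does.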
The data in the training set contains overlapping rounds!!!
When you train on the first 100 eras, you shouldn’t use eras 101-104 for testing, because in real life they are not available.
You should use purged time series cross validation.
You should see CV and training as two separate steps. You may want to use CV (e.g. through cross_val_score in sklearn) to get a sense of the model's future performance (as long as you don't use the CV data for model tuning), but then you can train the final model on the full dataset so you don't "waste" any data.
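A minimal sketch of what that two-step workflow might look like in sklearn, using TimeSeriesSplit's `gap` parameter as the purge between train and test folds (the data and the Ridge model are stand-ins, and note that for real Numerai data you would want to split on era boundaries rather than raw rows):

```python
# Purged time-series CV for a performance estimate, then a refit on everything.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.random((500, 5))  # stand-in for era-ordered features
y = rng.random(500)       # stand-in for targets

# `gap=10` drops 10 rows between each train fold and its test fold (the purge).
cv = TimeSeriesSplit(n_splits=5, gap=10, test_size=50)
scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_mean_squared_error")
print("CV estimate:", scores.mean())

# Step 2: once CV has given you an estimate, refit on the full dataset for live use.
final_model = Ridge().fit(X, y)
```

The point is that the CV scores are only a forecast of performance; the model you actually deploy is trained on all available data.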
Hope it helps!
Yes, that is why I said I train on the first 100 eras and test on eras 110-120, so there is a 10 era gap between train and test to avoid leakage.
I guess my problem boils down to this misunderstanding; I'll read up on CV again.
You don’t have to use the model from the last fold. You can just train a new model on the latest data (obviously without a test set), which would match your CV expanding window strategy. One thing I have found helpful is to not just look at a short test window (in your example 10 eras) but rather all the available data as you walk forward. It gives you a better sense of how the model performs as market regimes change over time. Expanding does appear to perform better than a fixed window. Here’s an example of a simple model I was testing where you see the mean and sharpe for the entire test data set but also the first 20 and last 20 as you walk forward:
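The bookkeeping described above could be sketched like this (the `summarize` helper and the `era_scores` input are hypothetical, standing in for per-era correlation scores from one fold's out-of-sample predictions):

```python
# Score every era after the training cut-off, then compare the full walk-forward
# run against its first and last 20 eras to see how performance drifts.
import numpy as np

def summarize(era_scores, edge=20):
    """Return (mean, sharpe) over all eras, the first `edge`, and the last `edge`."""
    s = np.asarray(era_scores, dtype=float)

    def stats(x):
        return x.mean(), x.mean() / x.std()  # sharpe as mean over std of era scores

    return {"all": stats(s), "first": stats(s[:edge]), "last": stats(s[-edge:])}

# Example with made-up per-era scores for 100 walk-forward eras:
print(summarize(np.linspace(0.01, 0.03, 100)))
```

Comparing the "first" and "last" slices is a cheap way to spot regime drift: a model whose early eras carry all the performance may look fine on the aggregate numbers while degrading toward live.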