Something strange with my validation results

Hello all,

I am testing a new model, and to get the most out of as few training runs as possible, I am training on eras [-300:] and validating on the 100 eras immediately before them ([-400:-300]). Does this seem like a valid approach? So far the model has only reached an MSE loss of 0.11 with a batch size of 300, so it is not very good yet, but the results may be promising.
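For anyone wanting to reproduce this kind of split, here is a minimal sketch of selecting the two era windows. It assumes era labels are integers that sort chronologically. If your eras are strings, convert them first, since string sorting would put "era10" before "era2". The function name `split_eras` and its parameters are hypothetical, not part of any library.

```python
def split_eras(eras, train_size=300, val_size=100):
    """Hypothetical helper: split era labels into a training window (the most
    recent `train_size` eras) and a validation window (the `val_size` eras
    immediately before it), i.e. eras[-300:] and eras[-400:-300].

    Assumes labels are integers (or otherwise sort in chronological order).
    """
    unique_eras = sorted(set(eras))  # one entry per era, oldest first
    if len(unique_eras) < train_size + val_size:
        raise ValueError("not enough eras for the requested windows")
    train_eras = unique_eras[-train_size:]
    val_eras = unique_eras[-(train_size + val_size):-train_size]
    return train_eras, val_eras


def rows_in(rows, era_subset):
    """Keep only rows (dicts with an 'era' key) whose era is in the subset."""
    wanted = set(era_subset)
    return [row for row in rows if row["era"] in wanted]
```

Because the validation window sits directly before the training window, the two never overlap, but they are adjacent in time, so some leakage through slowly changing market conditions is still possible.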

When I plot cumulative correlation over eras [-400:-300], I see a distinct change in the prediction correlations around era -350 (approximately era 757), even though my model has never seen this data.
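A cumulative correlation curve like this one can be sketched as follows: compute the correlation between predictions and targets within each era, then take a running sum in era order. This is a minimal pure-Python sketch with hypothetical function names. It uses plain Pearson correlation, which may not match the exact metric used for official scoring.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either side is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_era_corr(eras, preds, targets):
    """Group rows by era and return {era: correlation}, ordered by era."""
    groups = defaultdict(lambda: ([], []))
    for era, pred, target in zip(eras, preds, targets):
        groups[era][0].append(pred)
        groups[era][1].append(target)
    return {era: pearson(*groups[era]) for era in sorted(groups)}


def cumulative(values):
    """Running sum of per-era scores, i.e. the cumulative correlation curve."""
    total = 0.0
    curve = []
    for value in values:
        total += value
        curve.append(total)
    return curve
```

A sudden change in the slope of the cumulative curve corresponds to a run of eras whose individual correlations differ from the ones before, so plotting the per-era values next to the cumulative curve can help tell a single outlier era from a sustained regime change.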

Did something change in the way features were calculated around this time, or is this normal market volatility? If anyone has any ideas, I would love to hear them.

There are several other interesting things in this plot, but this is the one I am most focused on right now.

Thank you!!


After a bit more training (final MSE = 0.08, batch size 300), here is what the curve looks like now. Perhaps if I train on a larger portion of the dataset, I can predict the two large drawdowns. Overall though, it seems like I can run this model for ~50 weeks. I hope it pans out.

Top plot: same as above. Bottom plot: Pearson correlation vs. era into the past.
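To locate the drawdowns in a cumulative correlation curve programmatically, one common definition is the largest peak-to-trough drop. A minimal sketch, assuming the curve is a list of cumulative per-era correlations in era order; `max_drawdown` is a hypothetical helper.

```python
def max_drawdown(curve):
    """Hypothetical helper: largest peak-to-trough drop in a cumulative curve.

    Returns (drawdown, peak_idx, trough_idx), where drawdown >= 0.
    Indices are positions in `curve`, which can be mapped back to eras.
    """
    if not curve:
        return 0.0, None, None
    peak_idx = 0
    best = (0.0, 0, 0)
    for i, value in enumerate(curve):
        if value > curve[peak_idx]:
            peak_idx = i  # new running high
        drop = curve[peak_idx] - value
        if drop > best[0]:
            best = (drop, peak_idx, i)
    return best
```

This only finds the single deepest drawdown; to find the second one, you could rerun it on the parts of the curve before the peak and after the trough.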


Okay, it turns out I was right that something looked strange: I found some bugs in my code. I fixed them and trained the model for one epoch on the entire validation dataset. Here are the updated plots for scoring on the ‘training’ dataset, which I am confident the model has not seen. These are the last 200 eras (375-574). The model now behaves much more consistently across time.