Continuing the conversation
I also see lower correlation on val2, but at the same time I rely more on experience and on how much I trust my model/methodology.
It's unclear re leakage from era to era; there are a few ways it could happen. The 4-weekly targets combined with weekly eras are definitely one. Also, if some features are some kind of weighted historical average (EMA or otherwise), then there is leakage there too. I mostly assume no leakage myself, despite knowing there is some. If there is a lot of it, then I'm kind of screwed, but it's hard to fully gauge how much.
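To make the overlap concrete: with weekly eras and 4-weekly targets, an era's target window spans itself and the next three eras, so neighbouring eras share label information. Here's a minimal sketch of a purged split that drops the overlapping training eras (the `purged_split` helper and the `horizon_weeks` parameter are hypothetical, not anything Numerai provides):

```python
def purged_split(eras, val_eras, horizon_weeks=4):
    """Drop training eras whose target window overlaps a validation era."""
    val_eras = set(val_eras)
    train_eras = [
        era for era in eras
        # Era e's 4-weekly target covers weeks e..e+3, so any era closer
        # than horizon_weeks to a validation era shares label information.
        if all(abs(era - v) >= horizon_weeks for v in val_eras)
    ]
    return train_eras, sorted(val_eras)

eras = range(1, 121)                                  # 120 weekly eras
train, val = purged_split(eras, val_eras=range(100, 121))
print(train[-3:], val[:3])                            # [94, 95, 96] [100, 101, 102]
```

Eras 97-99 get purged because their target windows reach into era 100 and beyond, which is exactly the leakage path the 4-weekly targets open up.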
Eras are ordered, so you're definitely training on data that happens after your validation and test data. But who knows how bad that is, since they're not giving us the information needed to actually treat this as a time series.
So validating a model that was trained on data from the future? Is that an issue, though? If I'm training on data from the future I can see how that's a problem, but I don't think that's happening here. Maybe I'm wrong?
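For what it's worth, the two situations are easy to contrast in code: a shuffled split routinely trains on eras after the validation fold, while a forward-ordered split never does. A sketch using scikit-learn (the `gap=4` embargo for the 4-weekly targets is my own assumption, not an official recipe):

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

eras = np.arange(100)  # one row per weekly era, in time order

# Shuffled KFold: training folds routinely contain eras *after* the
# validation fold, i.e. the "training on the future" case above.
tr, va = next(KFold(n_splits=5, shuffle=True, random_state=0).split(eras))
print("shuffled:", tr.max(), ">", va.min())   # max train era > min val era

# TimeSeriesSplit trains only on the past; gap=4 additionally embargoes
# the eras whose 4-weekly targets overlap the validation fold.
*_, (tr, va) = TimeSeriesSplit(n_splits=5, gap=4).split(eras)
print("ordered: ", tr.max(), "<", va.min())   # max train era < min val era
```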
MikeP, what about using synthetic data to remove the danger of training on future data while increasing the ability to validate? Do you think that's possible here with such a low signal-to-noise ratio?
My feeling has always been that if you know enough about the underlying structure to generate useful synthetic data, then you know enough to just build a good model directly. It would be weird to generate synthetic data from a known underlying structure and then hope your model can learn that structure (which you clearly already know, since you made the synthetic data). I know MLDP likes this area though, so I must be missing something myself.
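A toy illustration of that circularity, with an entirely made-up low-signal linear generator (the structure, coefficients, and noise level are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50_000, 20

# Writing a generator forces you to commit to a structure up front --
# here a hypothetical sparse linear signal buried in unit noise.
beta = np.zeros(p)
beta[:3] = 0.05                       # the weak "alpha" we chose ourselves
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# The best any model fit on this data can do is recover `beta`, which we
# already knew when we wrote the generator -- the circularity above.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta_hat[:5], 3))      # roughly [0.05, 0.05, 0.05, 0.0, 0.0]
```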