I was looking for close data points between the training, validation and live data. It didn’t work, but I’m a bit surprised that the validation data isn’t closer to the training data.
It would be interesting to do an intra-era analysis and see whether some eras contain more similar points than others.
Perhaps the live era you selected belongs to a group of eras with large distances between points.
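To make the intra-era idea concrete, here is a rough sketch of one way to measure how tightly packed the points of each era are: the mean distance from each row to its nearest neighbour within the same era. The function name `within_era_nn_distance`, the Euclidean metric, and the synthetic data are all assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def within_era_nn_distance(X):
    """Mean Euclidean distance from each row of X to its nearest other row."""
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared distances: ||x||^2 - 2 x.y + ||y||^2
    d2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]
    d2 = np.maximum(d2, 0.0)        # clip tiny negatives from rounding
    np.fill_diagonal(d2, np.inf)    # a point is not its own neighbour
    return float(np.sqrt(d2.min(axis=1)).mean())

# Synthetic stand-in for real era data (assumption): era 1 is tight, era 2 is spread out.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 5)),
    rng.normal(0.0, 1.0, size=(50, 5)),
])
eras = np.array([1] * 50 + [2] * 50)

spacing = {
    era: within_era_nn_distance(features[eras == era])
    for era in np.unique(eras).tolist()
}
print(spacing)
```

Eras with a noticeably larger value would be the "high distance" group. Note that this builds a full pairwise matrix per era, so a large era may need subsampling first.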
If I read the notebook correctly, you compared the whole training data to the whole validation data.
Wouldn't it be more interesting to compare each era of the validation set to each era of the training set separately?
If that let you determine which training era is closest to a given validation (or, later, live) era, you could choose which per-era training model to use on the live data.
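A minimal sketch of that per-era comparison, assuming "closest" means the smallest mean nearest-neighbour Euclidean distance from the validation era's rows to the training era's rows. The helper names (`mean_nn_distance`, `closest_train_era`), the metric, and the synthetic data are hypothetical choices; the notebook may well use something different.

```python
import numpy as np

def mean_nn_distance(a, b):
    """Mean Euclidean distance from each row of a to its nearest row in b."""
    d2 = (a ** 2).sum(axis=1)[:, None] - 2.0 * (a @ b.T) + (b ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives from rounding
    return float(np.sqrt(d2.min(axis=1)).mean())

def closest_train_era(train_X, train_eras, val_X, val_eras):
    """Map each validation era to the training era with the smallest distance."""
    result = {}
    for v in np.unique(val_eras).tolist():
        v_rows = val_X[val_eras == v]
        dists = {
            t: mean_nn_distance(v_rows, train_X[train_eras == t])
            for t in np.unique(train_eras).tolist()
        }
        result[v] = min(dists, key=dists.get)
    return result

# Synthetic stand-in data (assumption): train era 1 is centred at 0, era 2 at 5.
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(5.0, 1.0, (50, 5))])
train_eras = np.array([1] * 50 + [2] * 50)
# Validation era 101 is centred at 5, era 102 at 0.
val_X = np.vstack([rng.normal(5.0, 1.0, (50, 5)), rng.normal(0.0, 1.0, (50, 5))])
val_eras = np.array([101] * 50 + [102] * 50)

mapping = closest_train_era(train_X, train_eras, val_X, val_eras)
print(mapping)
```

The resulting mapping could then be used to look up the model trained on the matching era. Whether the nearest era's model actually performs better on the live data is something that would still need to be checked.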