Distance analysis using Facebook AI Similarity Search (faiss)

jmnum · September 26, 2021, 11:26am

I was looking for data points that lie close to each other across the training, validation and live data. It didn’t work, but I’m a bit surprised that the validation data isn’t closer to the training data.

It may not be of any help, but here is a Colab notebook: Distance analysis

eleven_sigma · September 27, 2021, 7:51pm

It would be interesting to do an intra-era analysis and see whether some eras contain more similar points than others.
Perhaps the live era you selected belongs to a group of eras with large distances between points.
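
A rough sketch of what such an intra-era analysis could look like, not taken from the notebook: for each era, compute the mean distance from every point to its nearest neighbour within the same era, then compare the eras. It uses plain Python brute force rather than faiss so that it runs anywhere; the era labels, the toy vectors and the function names are all made up for illustration.

```python
import math
from collections import defaultdict


def nearest_neighbor_distances(points):
    """For each point, Euclidean distance to its closest *other* point in the list."""
    result = []
    for i, p in enumerate(points):
        best = math.inf
        for j, q in enumerate(points):
            if i != j:  # skip the point itself
                best = min(best, math.dist(p, q))
        result.append(best)
    return result


def intra_era_mean_nn_distance(rows):
    """rows: iterable of (era, feature_vector). Returns {era: mean NN distance}."""
    by_era = defaultdict(list)
    for era, vec in rows:
        by_era[era].append(vec)
    summary = {}
    for era, pts in by_era.items():
        if len(pts) < 2:
            continue  # a single point has no neighbour to compare with
        dists = nearest_neighbor_distances(pts)
        summary[era] = sum(dists) / len(dists)
    return summary


# Hypothetical toy data: era2 is deliberately more spread out than era1.
rows = [
    ("era1", (0.0, 0.0)), ("era1", (0.0, 1.0)), ("era1", (1.0, 0.0)),
    ("era2", (0.0, 0.0)), ("era2", (0.0, 4.0)), ("era2", (3.0, 0.0)),
]
summary = intra_era_mean_nn_distance(rows)
for era, mean_dist in sorted(summary.items()):
    print(era, round(mean_dist, 3))
```

Brute force is quadratic in the number of points per era, so on the real data the inner search would presumably be replaced by a faiss index built per era.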

bigbertha · September 28, 2021, 6:27am

If I read the notebook correctly, you compared the whole training data to the whole validation data.
Wouldn’t it be more interesting to compare each era of the validation set to every era of the training set, one era at a time?
If you could determine that way which training era is closest to a validation (or later live) era, you could pick which per-era training model to use on the live data.
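
To make the per-era comparison concrete, here is a minimal sketch under my own assumptions: "closeness" between two eras is taken to be the mean distance from each validation point to its nearest training point, which is only one of several possible definitions. It is brute-force plain Python rather than faiss, and the era names and vectors are invented.

```python
import math


def mean_nn_distance(queries, references):
    """Mean, over query points, of the Euclidean distance to the closest reference point."""
    return sum(min(math.dist(q, r) for r in references) for q in queries) / len(queries)


def closest_training_era(val_eras, train_eras):
    """Both arguments: {era: [feature_vector, ...]}.

    Returns {val_era: (closest_train_era, distance)}.
    """
    matches = {}
    for v_era, v_pts in val_eras.items():
        scores = {
            t_era: mean_nn_distance(v_pts, t_pts)
            for t_era, t_pts in train_eras.items()
        }
        best = min(scores, key=scores.get)  # training era with the smallest distance
        matches[v_era] = (best, scores[best])
    return matches


# Hypothetical toy data: two well-separated training eras.
train_eras = {
    "era1": [(0.0, 0.0), (0.0, 1.0)],
    "era2": [(5.0, 5.0), (5.0, 6.0)],
}
val_eras = {
    "era121": [(0.0, 0.5), (1.0, 0.0)],
    "era122": [(5.0, 5.5), (6.0, 5.0)],
}
matches = closest_training_era(val_eras, train_eras)
for v_era, (t_era, dist) in matches.items():
    print(v_era, "->", t_era, round(dist, 3))
```

The same lookup applied to a live era would give the training era whose model to try, assuming one model has been trained per training era.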