Distribution shifts

anonemaus · April 3, 2026, 7:31pm

The discussion about old models no longer working led me to think about a quick analysis of distribution shifts. There are libraries for this kind of thing, so there may well be better ways of doing it. As you will see, this is not AI-assisted, other than in the autocomplete sense. Here is what I did:

1. Took the classic feature data (medium feature set).
2. Ignored the targets.
3. Created a semi-supervised type of label, era // 52, as a proxy for year.
4. Used a holdout and trained a gradient boosted model to predict the year.
5. The model's output is a vector of year probabilities for each row in the dataset.
6. Grouped the rows by their known year.
7. Calculated the average probability vector for each group.
8. Generated a clustering of those average vectors.
9. Visualised the result.

Here is the result:
