Is there a live model that tracks the example model on the new test set?
Over on the chat, someone said integration_test_7 switched to new (“super massive”) data on round 282, and then… switched back? Weird, if true.
Perhaps you could compare the performance of “integration_test_N” (for every N) against that of example_predictions.csv (both the legacy and the super-massive versions). Just use 2 of your 50 model slots to submit those (unstaked) from your account.
Or instead of example_predictions.csv, you could run the example code that’s supposed to produce it.
Of course, if you were doing that, you would no longer need integration_test_whatever.