@pschyska This was a deliberate mandate: we are not providing a v4.3->v5 map because we only want models trained on v5 data. Models trained on v4.x data will, on average, not have stable performance in the long run.
@stochastic_geometry_1 correct, V5 submissions will not receive scores right away. You can rely on validation / diagnostics to check the performance of a model.
I challenge the claim that validation performance is enough to gauge live performance adequately. In my experience, validation results don't correlate strongly with live performance, especially under 0.5Xcorr + 2Xmmc scoring: one of my models, p_tt_rg, has quite poor corr (0.02208/1.2603 live by my calculation, 0.01916/0.92306 on validation), yet sits in the 98.4th percentile for live score due to mmc. I would never have selected that model to deploy; it was a happy accident because I wanted to test something with it. Since you can't optimize for mmc, you are essentially asking us to stake models with close to zero information on how they will do on Sept 27. This sounds like a huge gamble for both parties.
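For clarity, the payout score I keep referring to is just a weighted sum of the two per-round metrics. A minimal sketch of how I compare models on it (pandas; the numbers and the "corr"/"mmc" column names are made-up placeholders, not the official schema):

```python
# Combine per-round corr and mmc into the 0.5*corr + 2*mmc payout score
# and compare models by its mean. All values below are placeholders.
import pandas as pd

rounds = pd.DataFrame({
    "model": ["p_tt_rg", "p_tt_rg", "other_model", "other_model"],
    "corr":  [0.021, 0.023, 0.031, 0.029],
    "mmc":   [0.012, 0.015, 0.001, -0.002],
})

rounds["payout_score"] = 0.5 * rounds["corr"] + 2.0 * rounds["mmc"]
print(rounds.groupby("model")["payout_score"].mean().sort_values(ascending=False))
```

Because mmc is weighted four times as heavily as corr, a model with mediocre corr but high mmc (like p_tt_rg) can easily outrank a model with better corr.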
But if it were true that we could rely on validation, your claim that models trained on V4.x data can't have stable performance doesn't make sense. I showed how one of my V4.3 models (the first one linked) goes from 0.0303/1.4699 to 0.0337/1.8104 on validation. If you have more specific information about that phenomenon, please share it. In my experiments so far, I have yet to see a V4.3 model that does worse on V5.0 validation. For example: did you consider models other than GBDTs? Maybe models that use deep learning, or that don't interpret the features mainly as numerical values, behave differently?
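For reference, this is roughly how I compute those validation numbers for a V4.3-trained model on V5.0 data: per-era Spearman correlation of predictions against the target, then the mean and mean/std ("sharpe"). The dataset path, the "era"/"target" columns, and the predictions file are assumptions from my own setup, and this only approximates the official CORR metric, not the exact scoring code:

```python
import pandas as pd
from numerapi import NumerAPI

# Download the v5.0 validation data (path assumed from the v5 release).
napi = NumerAPI()
napi.download_dataset("v5.0/validation.parquet", "v5_validation.parquet")

val = pd.read_parquet("v5_validation.parquet").dropna(subset=["target"])

# Hypothetical file holding the v4.3 model's predictions, indexed by row id.
preds = pd.read_csv("v43_model_predictions.csv", index_col="id")["prediction"]
val = val.join(preds, how="inner")

# Per-era Spearman correlation, then summary stats.
per_era = val.groupby("era").apply(
    lambda df: df["prediction"].corr(df["target"], method="spearman")
)
print(f"mean corr: {per_era.mean():.5f}  sharpe: {per_era.mean() / per_era.std():.4f}")
```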
I just retrained and uploaded models… why not score them? As already discussed, I can also attest to how different diagnostics are from live submissions.
Just upload your v5 predictions to the same slots that are already staked, starting on the switchover day. (The last of the staked v4.3 rounds will still take another month to resolve after v4.3 submissions stop being accepted, so you can't just move those stakes immediately to other slots.)
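If it helps, the numerapi call is the same as before, you just point the new file at the already-staked slot. A minimal sketch (model name, file name, and credentials are placeholders):

```python
from numerapi import NumerAPI

# Authenticate and look up the id of the slot that already carries the stake.
napi = NumerAPI(public_id="YOUR_PUBLIC_ID", secret_key="YOUR_SECRET_KEY")
model_id = napi.get_models()["your_staked_model_name"]

# Upload the v5 predictions to that same slot; the stake stays where it is.
napi.upload_predictions("v5_predictions.csv", model_id=model_id)
```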
The Numerai example scripts provided on the Kaggle platform have now also been retrained on v5.0 data and uploaded to the tournament (each profile links to the Kaggle source code):
JOS_KAGGLE_MEDIUM_FN Profile - Numerai - tutorial #2, explaining feature neutralization, trained on the medium feature set. Second worst performer with just a 53% return. Interestingly, it is actually quite difficult to achieve better metrics with feature neutralization on v5.0 (a rough sketch of the neutralization step is below the list). Does anyone have an explanation?
JOS_KAGGLE_SUNSHINE Profile - Numerai - an older example from GitHub (no longer available) featuring both ensembling and neutralization on 1/4-downsampled "all data" with the medium feature set. Second best performer with an above-average 77% return.
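For anyone who wants to reproduce the neutralization effect mentioned above, this is roughly the step those tutorials apply: per era, subtract from the predictions their least-squares projection onto the feature matrix, then re-rank. This is a simplified sketch; the proportion of 1.0, the column names, and the per-era re-ranking are my assumptions, and the Kaggle notebooks have the canonical code:

```python
import numpy as np
import pandas as pd

def neutralize(df: pd.DataFrame, pred_col: str, feature_cols: list,
               proportion: float = 1.0) -> pd.Series:
    """Neutralize predictions against the given features within each era."""
    out = []
    for _, era_df in df.groupby("era"):
        scores = era_df[pred_col].values.reshape(-1, 1)
        exposures = era_df[feature_cols].values
        # Remove the component of the scores explained linearly by the features.
        projection = exposures @ (np.linalg.pinv(exposures) @ scores)
        neutral = scores - proportion * projection
        # Re-rank to [0, 1] within the era, as the example scripts do.
        out.append(pd.Series(neutral.ravel(), index=era_df.index).rank(pct=True))
    return pd.concat(out)
```

On v5.0 this tends to flatten the feature exposure of the predictions, which is presumably why the fully neutralized tutorial model ends up with weaker headline metrics than the non-neutralized ones.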