V5 "Atlas" Data Release

@pschyska This was a mandate: we could not provide a v43->v5 map because we only want models trained on v5 data. Models trained on v4.x data will, on average, not have stable performance in the long run.

@stochastic_geometry_1 Correct, V5 submissions will not receive scores right away. You can rely on validation / diagnostics to check a model’s performance.
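
For example, something along these lines: a minimal sketch of the diagnostics route, assuming the v5.0 file naming ("v5.0/validation.parquet") and that numerapi’s `upload_diagnostics` works the same way it did for v4.x.

```python
import pandas as pd
from numerapi import NumerAPI

napi = NumerAPI(public_id="...", secret_key="...")  # your API credentials

# Grab the v5.0 validation data (path assumed from the v4.x naming scheme).
napi.download_dataset("v5.0/validation.parquet", "v5_validation.parquet")
validation = pd.read_parquet("v5_validation.parquet")
feature_cols = [c for c in validation.columns if c.startswith("feature_")]

# Placeholder predictions so the sketch runs end to end; swap in your
# v5-trained model's predictions here.
validation["prediction"] = validation[feature_cols].mean(axis=1).rank(pct=True)

# Upload to diagnostics instead of the tournament while live scoring is paused.
model_id = napi.get_models()["your_model_name"]  # hypothetical model name
validation[["prediction"]].to_csv("diagnostics.csv", index=True)
diagnostics_id = napi.upload_diagnostics("diagnostics.csv", model_id=model_id)
print(f"diagnostics uploaded: {diagnostics_id}")  # results appear on the model's diagnostics page
```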

I challenge the claim that validation performance is enough to gauge live performance adequately. In my experience, validation results don’t correlate strongly with live performance, especially under the 0.5×corr + 2×mmc scoring: one of my models, p_tt_rg, has quite poor corr (0.02208/1.2603 live by my calculation, 0.01916/0.92306 on validation), yet sits in the 98.4th percentile for live score thanks to mmc. I would never have selected that model to deploy; it was a happy accident, because I only wanted to test something with it. Since you can’t optimize for mmc, you are essentially asking us to stake models with close to zero information on how they will do on Sept 27. That sounds like a huge gamble for both parties.
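
To make the numbers concrete, here is the arithmetic I’m using. The per-round values below are placeholders, not real results, and I’m glossing over the per-era details of the official scoring; the X/Y pairs I quote above are mean corr and its mean/std ratio (sharpe), which is how I read them.

```python
import pandas as pd

def summarize(per_round: pd.DataFrame) -> pd.Series:
    """per_round needs 'corr' and 'mmc' columns, one row per resolved round."""
    score = 0.5 * per_round["corr"] + 2.0 * per_round["mmc"]
    return pd.Series({
        "corr_mean": per_round["corr"].mean(),
        "corr_sharpe": per_round["corr"].mean() / per_round["corr"].std(),
        "score_mean": score.mean(),
        "score_sharpe": score.mean() / score.std(),
    })

# Placeholder rounds, just to show the expected input shape.
rounds = pd.DataFrame({
    "corr": [0.031, -0.004, 0.018, 0.027],
    "mmc":  [0.012,  0.009, 0.004, 0.015],
})
print(summarize(rounds))
```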

But if it were true that we could rely on validation, your claim that models trained on V4.x data can’t have stable performance wouldn’t make sense. I showed you how one of my V4.3 models (the first one linked) goes from 0.0303/1.4699 to 0.0337/1.8104 on validation. If you have more specific information about that phenomenon, please share it. In my experiments so far, I have yet to see a V4.3 model that does worse on V5.0 validation. For example: did you consider models other than GBDTs? Maybe models that use deep learning, or that don’t treat the features as primarily numerical, behave differently?
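
For reference, this is roughly how I run that comparison: a plain per-era Spearman correlation as a stand-in for the official corr metric, with the v5.0 file path and the "era"/"target" column names assumed to carry over from earlier versions.

```python
import pandas as pd
from numerapi import NumerAPI

napi = NumerAPI()
napi.download_dataset("v5.0/validation.parquet", "v5_validation.parquet")
val = pd.read_parquet("v5_validation.parquet")
val = val[val["target"].notna()]
feature_cols = [c for c in val.columns if c.startswith("feature_")]

# Swap in your v4.3-trained model's predictions; the ranked feature mean is
# only a placeholder so the script runs end to end.
val["prediction"] = val[feature_cols].mean(axis=1).rank(pct=True)

# Per-era Spearman correlation between predictions and the v5 target.
per_era = val.groupby("era")[["prediction", "target"]].apply(
    lambda df: df["prediction"].corr(df["target"], method="spearman")
)
print(f"mean corr: {per_era.mean():.5f}")
print(f"sharpe:    {per_era.mean() / per_era.std():.4f}")
```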

I just retrained and uploaded models… why not score them? As already discussed, I can also attest to how different diagnostics are from live submissions.

So when you said no scoring between 9/13 and 9/27, does that apply to every submission, including the v4 ones, and is it literally no score and no payout?

No, v4 will continue to be scored until switchover day, but there is no overlap where both are scored.

Thanks, that’s fair.

A week of overlapping scoring would be useful.

Do I have to drain my v4.3 staked models or will the v4.3 staked amounts become available on Sep 27 for staking v5 models?

Always happy to see Numerai evolve! Any update on meta_model.parquet for V5?

Just upload your v5 predictions to the same slots that are already staked, starting on switchover day. (The last of the v4.3 staked rounds will still take another month to resolve after v4.3 submissions stop being accepted, so you can’t move those stakes to other slots immediately.)

v5 meta_model.parquet should be available on September 27
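
For anyone who uses it for model selection, a minimal sketch of the kind of check it enables, assuming the v5 file path and the v4.x column name ("numerai_meta_model") carry over; "my_predictions.parquet" is a hypothetical local file of your own predictions.

```python
import pandas as pd
from numerapi import NumerAPI

napi = NumerAPI()
napi.download_dataset("v5.0/meta_model.parquet", "meta_model.parquet")
meta = pd.read_parquet("meta_model.parquet")

# Hypothetical local file with your own predictions, indexed by id with a
# single "prediction" column.
my_preds = pd.read_parquet("my_predictions.parquet")

# Join on the shared id index and measure how close you are to the meta model.
joined = meta.join(my_preds, how="inner")
corr = joined["numerai_meta_model"].corr(joined["prediction"], method="spearman")
print(f"spearman corr with the meta model: {corr:.4f} over {len(joined)} ids")
```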

Is there any chance of getting v43_to_v5_map? I missed the opportunity to download that mapping while it was available.

The Numerai example scripts provided on the Kaggle platform have now also been retrained on v5.0 data and uploaded to the tournament (each profile has a link to the Kaggle source code):

So let’s see how they perform in the "Atlas" era. :crossed_fingers:
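
For context, the pipeline behind these models looks roughly like this: a sketch under the assumption that the v5.0 file names mirror earlier versions, with LightGBM parameters in the ballpark of the public example scripts rather than the exact Kaggle code.

```python
import pandas as pd
import lightgbm as lgb
from numerapi import NumerAPI

napi = NumerAPI(public_id="...", secret_key="...")

napi.download_dataset("v5.0/train.parquet", "v5_train.parquet")
napi.download_dataset("v5.0/live.parquet", "v5_live.parquet")
train = pd.read_parquet("v5_train.parquet")
live = pd.read_parquet("v5_live.parquet")
feature_cols = [c for c in train.columns if c.startswith("feature_")]

# LightGBM setup roughly in line with the public example scripts.
model = lgb.LGBMRegressor(
    n_estimators=2000,
    learning_rate=0.01,
    max_depth=5,
    num_leaves=2 ** 5,
    colsample_bytree=0.1,
)
model.fit(train[feature_cols], train["target"])

# Predict on live data and submit to a (hypothetical) model slot.
live["prediction"] = model.predict(live[feature_cols])
live[["prediction"]].to_csv("predictions.csv", index=True)
model_id = napi.get_models()["your_model_name"]
napi.upload_predictions("predictions.csv", model_id=model_id)
```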

Hello, is it still coming? I selected my model with the help of that data in the past, and if it’s no longer provided, I’ll need to change my strategy.