How are you guys doing with your new models trained on the super massive dataset?
Or are you diamond handing your good old legacy models?
Now that the new dataset has been around for almost 4 months, and some of the new models have up to 10 resolved rounds under their belt, I thought it would be interesting to make an overall comparison.
Here is the comparison view of all of my legacy models vs. new models - the first two charts are corr, and the last two are corr percentile.
To be clear: I think it is far too early to draw conclusions about any model's performance until it has more than 20 resolved rounds, but I still find this interesting.
What stands out for me so far is the divergence in model performance. My legacy models tend to go up and down together - some are more stable than others, but they more or less bundle together. The new models, however, behave quite differently in this respect. For instance, in round 289 my new models more or less covered the whole spectrum; I have never seen performance spread quite so widely among my legacy bunch…
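For anyone wanting to put a number on that spread, a minimal sketch could look like the following - the model names and corr values here are entirely made up for illustration, not real round results:

```python
# Hypothetical per-round corr scores for a few models; all numbers are
# invented for illustration.
from collections import defaultdict

scores = [
    (288, "model_a", 0.012), (288, "model_b", 0.015), (288, "model_c", 0.011),
    (289, "model_a", -0.030), (289, "model_b", 0.002), (289, "model_c", 0.041),
]

# Spread (max corr - min corr) per round: a rough divergence measure.
by_round = defaultdict(list)
for rnd, _model, corr in scores:
    by_round[rnd].append(corr)

spread = {rnd: max(corrs) - min(corrs) for rnd, corrs in by_round.items()}
print(spread)
```

A wider spread in a round means the models disagreed more with each other that round.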
I am using more or less the same data pre-processing steps, similar algorithms, and a broadly similar validation setup. My guess is that the wider choice of features and the newly available alternative targets are contributing heavily to this divergence.
I’m quite happy with the new data and have been phasing out my legacy models; the last of those I submitted 3 weeks or so ago.
Part of that is due to the size of the new data forcing me to rethink my approach, so with round 281 I took advantage of the extension from 30 to 50 models and introduced a somewhat different algorithm as well. The cumulative results are shown below:
All but one are grey - that's just because of the way Numerai has the colour tables set up. This model is governed by 2 parameters: one can take one of 4 discrete integer values, the other one of 5, which results in the clustering of the tracks. These are GammaRat 31 through 50.
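As a rough illustration of how a 4 × 5 grid could map onto those 20 model slots - the parameter names and values below are hypothetical, not the actual ones:

```python
# Sketch: enumerate a 4 x 5 parameter grid onto 20 model slots
# (GammaRat 31-50). Parameter values are placeholders.
from itertools import product

param_a_values = [1, 2, 3, 4]     # the parameter with 4 discrete values
param_b_values = [1, 2, 3, 4, 5]  # the parameter with 5 values

grid = {
    f"GammaRat {31 + i}": combo
    for i, combo in enumerate(product(param_a_values, param_b_values))
}
print(len(grid))  # 4 * 5 = 20 model slots
```

Each slot then gets trained with its own parameter pair, which is what produces the clustered tracks.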
Around round 285 (iirc, I don't really keep track) I took the model above, introduced a new parameter, and replaced GammaRat 11 to 30 with that (those had previously used the legacy model). About half have really improved, the rest have not. But that's OK, because it gives me a decent view of how these parameters interact, and I'm using that info to once again redesign the underlying algorithm. Fun and games, to be sure.
GammaRat 1 to 10, my last legacy models, got dropped three weeks ago and were replaced with an algorithm similar to the ones above. They don't look great, but right now it's too early to say for sure.
Nice, interesting to see how this developed.
I haven’t fully replicated all my modelling methods on the new dataset yet - with the newer data update coming, I will probably do more after December.
Nevertheless, after a few tough rounds, most of the legacy models seem to have recovered - some of them never suffered in the first place - so I am just happy to see them running. I will definitely keep most of my new models for at least 20 rounds to see how they play out longer term.
I think a possibly underrated aspect of the new data is all the new targets. They have maybe been more helpful to me than the new features themselves.
My experience is that my legacy workflow has beaten a similar workflow on the new data. I don’t have all the comparison data together to show, but yesterday my legacy model returned 4.8% and was at the 95th percentile on corr and mmc, while my co-modeler burned -0.4% and was at the 40th percentile on corr and mmc. I’m not excited about being forced down the new-data route.
Yes, that matches one of my observations at the moment: from my point of view, some of the new targets seem more volatile than Nomi.
Not sure how much they help, though - probably need a longer runway to see. For now my legacy models are outperforming the new ones simply by being more stable.
I have not tried using only those. I didn’t realize there was a list of features that are “closely related” to legacy features. I remember the question being asked about which features are the old features, and the answer being that none are the same because of the timing differences. Is that list published, or did I just miss it in the original announcement?
Needs fact checking, but from memory the tournament 3-month average was around 15-20% when the new data came out and is now at 7.7%. I was around 25% when it came out and am now at 49%. I’m assuming the meta model is currently dominated by models on the new dataset, while I have been playing it safe and sticking with my legacy model while I see how the new data performs.
Green is staked legacy, Orange is unstaked legacy experiments, Cyan is unstaked super massive.
From the team’s October Updates, under New Feature Metadata - the “legacy” set:
Legacy: 304 of the original 310 features that were carried over to the new dataset. You can use this set to achieve nearly the same model as the legacy data.
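If that set is exposed in the feature metadata file, selecting it might look something like this toy sketch - the JSON layout (a "feature_sets" mapping) and the feature names below are assumptions for illustration, not the real file contents:

```python
# Toy sketch: pick out the "legacy" feature set from feature metadata.
# The structure and names here are assumed, not the actual Numerai file.
import json

metadata_json = """
{
  "feature_sets": {
    "legacy": ["feature_alpha", "feature_beta"],
    "small": ["feature_alpha"]
  }
}
"""
metadata = json.loads(metadata_json)
legacy_features = metadata["feature_sets"]["legacy"]
print(legacy_features)
```

With the real metadata, you would then restrict training to just those columns, e.g. something like `df[["era", "target"] + legacy_features]`, to approximate a legacy-style model on the new data.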