Is there a timeline estimation for when the legacy dataset will be fully deprecated and unsubmittable?
I hope not. Not everyone is ready to move to the massive dataset.
Bump! I’d like to know as well…
They have said it would be many months at least, and you’ll get plenty of notice. There is no scheduled date (that has been made public anyway). I’m assuming they will wait until legacy data submissions have tapered off quite a bit…
and maybe the relative performance of legacy data submissions
Do they have enough info to distinguish legacy uploads from new-data uploads? I don’t think they do… If I’m right, maybe we should have users tag their model bios with #legacy so the team can see which models are still using legacy data. (I’ve gone ahead and done that for numer.ai/dev0n and numer.ai/dev1n.)
It would be a shame if they turned off legacy and were surprised by how much value the legacy models are still providing to the metamodel.
In my view, the six-million-dollar question is whether new models are better than legacy models. If models trained on the new data really do prove to consistently outperform models trained on legacy data, then for sure they can retire the legacy dataset, and we should all be happy about it.
However, my baseline assumption is that you need at least one year of live performance - i.e., three times the 20-round window - to decide that. So I think it is reasonable to keep the legacy dataset for at least a year. Beyond that, it should depend on the relative performance of new vs. legacy models, and on metamodel performance with and without legacy models.
I agree that the relative performance needs to be evaluated over a long period. Here is a plot of the last 6 months of resolved rounds. The two models are different, but they had similar performance until one of them switched over to the super massive dataset. I would be interested to hear if others are seeing different results, but this doesn’t look promising so far.
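For anyone who wants to run the same kind of comparison on their own models, here is a minimal sketch of the per-round comparison behind a plot like that. The round numbers and correlation values below are placeholders, not real results; in practice you would substitute each model's actual resolved-round scores (e.g. pulled via the numerapi client).

```python
import pandas as pd

# Placeholder per-round correlation scores for two hypothetical models
# (illustrative numbers only, not real Numerai results).
rounds = list(range(280, 290))
legacy_corr = [0.020, 0.015, 0.030, -0.010, 0.025, 0.010, 0.020, 0.005, 0.030, 0.015]
massive_corr = [0.018, 0.020, 0.010, -0.005, 0.015, 0.000, 0.025, 0.010, 0.020, 0.012]

df = pd.DataFrame(
    {"legacy": legacy_corr, "massive": massive_corr},
    index=pd.Index(rounds, name="round"),
)

# Cumulative correlation is what the usual model-comparison plots show.
cumulative = df.cumsum()

# Mean per-round difference and a simple win count give a rough read on
# which model is ahead - though 10 rounds is far too few to conclude anything.
diff = df["massive"] - df["legacy"]
print(f"mean corr difference (massive - legacy): {diff.mean():.4f}")
print(f"rounds where massive beat legacy: {(diff > 0).sum()} / {len(diff)}")
```

This is only a point estimate over a short window; as argued above, something like 60 resolved rounds is probably the minimum before the difference means much.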