I have just realized that since the introduction of TC my desire to experiment and try new things out has been slowing down.
I still have many ideas, e.g. training on multiple targets, but knowing that I cannot compute TC for my models makes me carefully ponder every new development, because I cannot assess its quality (being good on corr is not enough for me anymore). When I want to evaluate a new model I need to create a tournament test entry and wait a few months before I can get a sense of its TC performance. Not only is that boring and slow, it is also wrong to assess TC on only a few recent entries.
In the long run, I believe Numerai will move away from corr and focus only on a metric that makes sense for their portfolio. That metric could be TC or something else, but they need to allow users to evaluate it during the R&D phase. They are probably already thinking about a solution, because otherwise research into new ideas will be negatively impacted.
Why am I writing this? Just for fun and because I am curious to hear what other users think.
It is not slowing me down from burning holes in my CPUs running new things all the time as usual, but it is slowing me down from actually staking on them, yeah.
It slows me down because it takes 2 weeks to see the first live TC, and much longer to judge it with any confidence, given the lack of TC for late subs and the lack of diagnostic tools.
I’ve mentioned this in RocketChat: the only ones who can research methodically for TC (with good confidence) are the ones who already have models with good TC, because they have a long history of backfilled TCs and just need to improve on those models. For the majority who are not so lucky, including myself, it’s pretty much a shot in the dark, and there’s no way to tell if a new model is good on TC until it has a (very) long history. So I only stake 0.5-1xTC on very few new models, without much confidence.
TC has actually spurred me to try many more crazy experiments. I was only using 1 model slot before TC, now I am at 34.
Instead of grinding huge ensembles trying to get more bits of CORR, I’m now rapidly rolling out the most diverse set of models that I can think of in the hopes of getting slices of the TC pie. It’s like panning for gold I suppose.
But yeah, I definitely agree with you that chasing recent TC performance is a big problem.
It’s basically like finding needles in a haystack, but before, with corr, you could use a metal detector (metrics for validation) to find them more quickly. Basically either we need that metal detector again or we need to increase the number of needles, so give us more slots pls ;~)
No slow down. Just ignore and don’t stake on it.
100%, I am at least trying to optimize the metrics that are supposed to be correlated with good TC, but it is only getting worse. My best model at TC is the stupidest one, which is trained on 1/5 of the training dataset.
Totally agree. Since TC I have parked a lot of ideas. If I have no way to check for myself whether a model is better or not, I have no incentive to research.
In my opinion, TC has some problems that need to change, so I don’t want to study it at the moment either.
I second @qeintelligence, it is more the running out of model slots that is slowing me down. Ideally, I want to have at least 500 slots, and probably run 5-10 variants of each base model that I can come up with.
Your position on leaderboard shows that you are doing it right!
So is your signature model an ensemble or your best single idea?
Bought predictions from numerbay, the original model is this one
I have a model with just random predictions that I use as a kind of baseline. And it keeps gaining points on the TC rank, now reaching rank 239, better than any of my other models. You can check it there and even buy it there.
So I’m not totally comfortable with this TC stuff either!
For sure my research has changed direction, from big ensembles to more experimental, unique models. And it’s true that it’s not easy to find a compass to follow.
But isn’t Numerai “the hardest data science competition in the world”?
It’s tough, it’s true! But…please don’t sell random predictions. We don’t have to share secrets if we’ve got any (which we probably don’t because TC is the way it is). But let’s not exploit each other and take money for nonsense, even nonsense that does randomly well at times.
A random model might have some merits!
Check it out! Corr and FNC are ~0. TC is strictly positive, and it’s well past the threshold for statistical significance.
The random noise can improve the metamodel, when the “trained” models do poorly. Which they often do.
Yes, I think the reason for that is the systematic error all models inherit when they train on the limited data we have. If you upload random noise, you are guaranteed to be decoupled from the systematic error in the training data. However, you are then also decoupled from the signal, which hurts especially in times when all other models are performing well.
While this is CORR20 rather than TC, you can see an example here from the models page overview of all my models. For some reason the daily round 360 has an awful start where all models are tanking significantly in corr, but my numpy.random.rand() predictions are obviously unfazed by that (light blue curve at the top of round 360)
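To make the "unfazed" point concrete, here is a minimal simulation sketch, not anything Numerai actually runs. Everything in it is an assumption for illustration: the synthetic `target`, the hypothetical `trained_model` helper, and the idea that a bad round can be modelled as the learned signal flipping sign. It only looks at rank correlation, not TC.

```python
import numpy as np

def rank_corr(a, b):
    """Spearman-style rank correlation: Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
n = 5000
target = rng.normal(size=n)  # synthetic stand-in for a round's target

def trained_model(signal_weight):
    # Hypothetical "trained" model: some exposure to the target plus noise.
    # A negative weight mimics a round where the learned signal reverses.
    return signal_weight * target + rng.normal(size=n)

good_round = rank_corr(trained_model(0.3), target)   # signal works
bad_round = rank_corr(trained_model(-0.3), target)   # signal reverses
random_corr = rank_corr(rng.random(n), target)       # pure noise submission

print(f"trained, good round: {good_round:+.3f}")
print(f"trained, bad round:  {bad_round:+.3f}")
print(f"random predictions:  {random_corr:+.3f}")
```

Under these assumptions the trained model swings from clearly positive to clearly negative, while the random submission stays near zero in both cases, which is the same picture as the flat light blue curve.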
Do you dare to stake on it?
Remember that inverting predictions results in the exact opposite TC. The random predictions could just as easily have been the opposite random predictions.
Inverting predictions results in the opposite correlation, but both can have positive TC when most models perform badly.
I haven’t done the experiment myself, but flipping predictions should result in the same TC with the opposite sign, just as @murkyautomata said, independent of others. I guess if others are really bad, the random prediction should get a value close to zero, e.g. 0.01, so flipping it would result in -0.01, which is then almost as good and still better than -0.30 TC
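The part of this that can be checked locally is the correlation side: a minimal sketch, assuming synthetic data and a plain rank correlation as a stand-in for corr. It does not compute TC, which can't be reproduced offline, so it says nothing about who is right on the TC sign question.

```python
import numpy as np

def rank_corr(a, b):
    """Spearman-style rank correlation: Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(42)
target = rng.normal(size=2000)   # synthetic stand-in for a round's target
preds = rng.random(2000)         # random predictions in [0, 1)
flipped = 1.0 - preds            # inverted predictions

corr = rank_corr(preds, target)
corr_flipped = rank_corr(flipped, target)

print(f"original: {corr:+.4f}")
print(f"flipped:  {corr_flipped:+.4f}")
```

Flipping reverses every rank, so the two correlations are exact negatives of each other. Whether TC mirrors in the same way is the open question in the posts above.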