After the last fireside chat there have been lots of comments on Discord and some of them were very interesting. Since I like to hear the voice of the community (and Numerai team for that matter) I am opening this thread so that people can share their thoughts here.
Compared to Discord, I believe a forum post encourages a more thought-out exposition of ideas, and it is also nice to collect all related thoughts in a single place.
Note: I believe all of us like Numerai, even though for different reasons, so please do not abuse this post for rants, but just for constructive ideas.
I don't mind sudden changes in the payout scheme, so the last announcement in this regard didn't bother me at all. I actually love the new emphasis on CORR. What bothers me is the drastic decrease in payout: 1xCORR + 3xTC doesn't equal 2xCORR + 1xTC. However, they say it is a temporary solution, so I might be ok with that if "temporary" means some weeks.
What really worries me is the bad performance of the fund and Numerai's use of some models' performance as a scapegoat. It seems as if Numerai hasn't yet figured out a way to translate the fund's needs into a proper payout scheme, and this makes the hedge fund suffer. There should be no need to explain what a model should or should not do; everything should be the consequence of a smart payout scheme that encourages and rewards the models which are useful to the fund.
I would like to better explain what I meant in my previous post.
The hedge fund has recently suffered serious losses, so my question is: was there a combination of model submissions that performed well in the same period in which the hedge fund performed badly?
A) If that is not the case, then we have a problem: the community or the data sources are not good enough to provide what the hedge fund needs.
B) If there was indeed a combination of model submissions that performed better than the hedge fund, then the question becomes: why hasn't Numerai's team figured out yet how to properly select the predictions they need? If the problem is that hard, they could turn it into a new tournament.
I have stated before my doubts about the emphasis on some artificial "true contribution", no matter how clever the constructs used to justify it might be. Anything that is not based on the actual predictions (correlation performance) is going to be sub-optimal.
While it is true that any model tends to learn more from the outliers, it is rarely in the right direction.
I see a problem with the incentive system allowing compensation for inferior model performance with more stake.
Stake is just a signal. There's no point in allowing 10 times more weight than average if there's no reason to believe that your model is 10 times better than average.
Don't know how to fix this except by enforcing stake limits based on historical performance.
The theory is that more stake = more risk = more confidence = it must be better?
It sounds stupid when you put it like that, but it isn't a totally bankrupt theory. Assuming historical performance of a model (or of the modeller) can't be known (by the fund at least), which was always the dream: being able to use signals that any person came along and added to the mix, what else can be done? Everything else involves some sort of gatekeeping system of proving yourself, etc., i.e. you move towards normal hedge fund operation, not really crowd-sourcing (basically you're hiring people then, but with a more open audition system).
Something that can be done within the current system is enforcing limits on weight in the metamodel, but NOT based on performance, just enforced, period: nobody should have too much weight no matter what. (More accurately, redundant signals shouldn't have too much aggregate weight, as you can get around limits on people or model slots.) Surprisingly, they have always blown off questions about this when it seems so obviously a potential problem.
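Purely as an illustration of what such a blanket cap could look like (the 1% cap, the pro-rata redistribution of the excess, and the function itself are my own assumptions, not anything Numerai has announced):

```python
import numpy as np

def cap_weights(stakes, max_share=0.01):
    """Turn raw stakes into metamodel weights, cap any single account
    at max_share, and redistribute the excess pro-rata to the rest."""
    w = np.asarray(stakes, dtype=float)
    w = w / w.sum()
    # iterate, since redistributing the excess can push others over the cap
    for _ in range(100):
        over = w > max_share
        if not over.any():
            break
        excess = (w[over] - max_share).sum()
        w[over] = max_share
        w[~over] += excess * w[~over] / w[~over].sum()
    return w

# toy example: one whale and 200 small stakers
stakes = [50_000] + [100] * 200
print(cap_weights(stakes).max())  # no account ends up above 1%
```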
But still, I keep pointing this out. YOU ARE GOING TO HAVE DRAWDOWNS NO MATTER WHAT. Not every drawdown is an emergency situation calling for a sudden rug pull like we've just had. (Even if the change is ultimately good, implementing it as a rug pull where all your work is trashed without compensation is not. You destroy good will, etc etc) What I see going on now is panic taking over, "getting caught in the switches". I would bet money that recently at Numerai they've been asking "well... what thing, that if we had been doing that instead of what we were doing... would have gotten us through this last drawdown period doing good/ok instead of bad?" It sounds reasonable on the face of it: let's just tweak things so that we would have done good in this recent past period if we had done this tweak sooner... but it is the road to ruin, ask any gambler.
(note: this is account leaderboard data, last week's data, so numbers may differ a bit if you want to check now)
I take stake x TC as a measure of total influence on fund performance.
The density plots look similar to the above for accounts staked up to 8k.
Above 8k, the plots diverge; you get different results depending on which of the few big accounts get included.
To me, sparsity of the point cloud looks like the real issue here.
When expanding into sparse territory the metamodel might actually pick up more noise than signal.
Another reason to limit stakes and have participants not move too far away from the crowd.
Also note that the range of (Stake x TC) is nearly constant between 2k and 8k, and that it explodes after that. That means increased dependence on fewer models. Not really desirable.
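A minimal sketch of what that influence calculation could look like (column names and numbers are made up, not the actual leaderboard schema):

```python
import pandas as pd

# hypothetical leaderboard snapshot (accounts, stakes and TC are invented)
lb = pd.DataFrame({
    "account": ["a", "b", "c", "d", "e", "f"],
    "stake":   [1_500, 4_000, 7_500, 9_000, 20_000, 60_000],
    "tc":      [0.012, -0.004, 0.020, 0.003, 0.001, 0.015],
})

# influence proxy: stake times TC
lb["influence"] = lb["stake"] * lb["tc"]

# compare the spread of influence below and above the 8k stake level
print(lb.groupby(lb["stake"] > 8_000)["influence"].agg(["min", "max"]))
```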
So these are stake-weighted TC numbers and (current) stake as shown on the leaderboard? Not sure that will really capture what has gone on round-to-round over time. Interesting nonetheless.
Agree, the stake signal becomes less useful when there are big differences in the amount staked by accounts. Just as is seen in the Signals tournament.
How about a within-account stake-weighted MetaModel? This would give a more precise signal of the account's belief in each individual model. If an account distributes its overall staked amount 90/10 on two models, that clearly indicates its relative confidence in those two models.
At the same time, the account weighting could be based on the ranking from the account leaderboard. This would give higher weights to accounts that have proved their worth.
E.g., with some arbitrary numbers:
An account is ranked in the top 20 of the account leaderboard and thereby gets an overall weighting of 0.02. It has two models (Model A and Model B), with its total stake distributed 90/10 between A and B. Then the MM weights for each model would be:
Model A = 0.02 * 0.9 = 0.018
Model B = 0.02 * 0.1 = 0.002
This system might be more robust to large stakers and emphasize accounts that have already proven their worth.
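A small sketch of that weighting scheme using the numbers from the example above (the function is just an illustration, not an actual Numerai mechanism):

```python
def model_weights(account_weight, model_stakes):
    """Split an account-level weight across its models in proportion
    to how the account has distributed its stake."""
    total = sum(model_stakes.values())
    return {name: account_weight * stake / total
            for name, stake in model_stakes.items()}

# the example above: a top-20 account with weight 0.02, staked 90/10
print(model_weights(0.02, {"Model A": 900, "Model B": 100}))
# {'Model A': 0.018, 'Model B': 0.002}
```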
With datasets (and even metrics) changing all the time, what good is proving your worth? I mean, what have you proven? (Putting aside the whole fooled by randomness issue with any such scheme based on track record, which is a huge issue.) Is it assumed if you are good at one thing you are good at everything that comes along?
They used to actually not accept predictions that didn't have a threshold minimum correlation with the examples; they just said "those tend to not be good models".
Well, changes to metrics would also alter the leaderboard and thereby automatically shift the account weighting towards those metrics that the fund believes are most useful.
And the account leaderboard, in my opinion, gives a good indication of how well individual people have performed during the past year, which has definitely been a changing environment. So they might also be good at handling new changes to e.g. the dataset. And if they are not, they will decrease in account weight by falling in the account ranking, just as they do now by burning their model stakes.
So this would also be a system that "optimizes" itself, just like the one we have now with the stake weighting.
Yeah, but you wouldn't have been optimizing for those; they might not even have existed before (probably didn't). So then we get rug-pulled because we didn't do well on a metric in the past that didn't even exist in that past. This has got to work for the participants somehow as well...
Think we are talking about two different things. I'm talking about the MM weights and how to maybe optimize the MetaModel.
I don't see how changing the MM weight system will rug-pull anyone, since it doesn't affect payouts (or at least only minimally). The only thing that might be affected is the MM performance, and the fund should obviously only do this if it could improve MM performance.
Well, they are talking about restricting staking, i.e. keeping the link between staking and MM control, but controlling who gets to stake and how much. So "you do badly and you don't get to stake" seems to be the likely result of any gatekeeping scheme. In any case, I think they will always want to keep the link between MM control and potential rewards; if they've downgraded you in the MM weights due to some rule change, you can bet your payoffs are going with it. Which is fine, but not overnight.
Here's the larger thought about all of this that's happening. Numerai has a plan. Anybody finds it easy to follow a plan when things are going well. If the first thing you do when something goes badly is change the plan... THEN YOU NEVER HAD A PLAN. The whole point of a plan is so you know what to do when things go wrong: you stick to the plan. That's the most important reason for having a plan. If you make changes, you do them holistically and because you realize the plan is flawed; something going badly in a probabilistic game like this isn't in and of itself evidence of a flawed plan. Again, that is the entire reason for the plan, because things are ABSOLUTELY GUARANTEED to go badly sometimes; the plan is what gets you through.
And if you find that you have a flawed plan and there are necessary changes to be made... well, first of all you don't blame the people who were just following the plan you set up. You apologize, you say we've made a mistake, we recognize that this plan will never work (or this part of it, whatever), and it's got to change. But it's all our fault, we'll try to make good any damage we're doing, and we'll start from there and think it over very carefully. That's not what I see happening.
This is true for most individuals, but not for big investors/institutions.
Big investors stake big, because they have a lot of money to allocate.
Their stake is not proportionate to the quality of their model.
It can't be! There are some very smart individuals here, submitting great models.
Bigger investors/institutions can't be 10x better, but they can have 100x more money to allocate.
Also worth noting the potential effect of un-staking some big accounts for bad performance.
If they sell their stake, NMR could collapse and take the whole ecosystem down with it!
Unstaking big accounts, or many small badly performing accounts, means the end of the game.
NMR is weak anyway. All cryptos are. Such an event would be the final one.
Yes, I agree. The biggest stakers often just want a return... maybe any return, because they are really just NMR hodlers or whatever. But even if they are top-class models, as you say they still can't be exponentially better than everybody else, so I think some sort of anti-top-heaviness clamp on staking (or something equivalent) would be appropriate. The stakes tend to conform to a power-law distribution but the quality of the models doesn't. It's the distribution of the stakes/MM control that needs to be adjusted to be more like the other.
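Just to illustrate the point about the two distributions (a toy example of my own, not a concrete proposal): weighting by something sublinear such as log(1 + stake), instead of raw stake, already makes metamodel control far less top-heavy.

```python
import numpy as np

def log_compressed_weights(stakes):
    """Flatten a power-law stake distribution by weighting on
    log(1 + stake) instead of raw stake, then renormalizing."""
    w = np.log1p(np.asarray(stakes, dtype=float))
    return w / w.sum()

stakes = np.array([100, 1_000, 10_000, 100_000])
print(stakes / stakes.sum())           # raw weights: ~[0.001, 0.009, 0.09, 0.90]
print(log_compressed_weights(stakes))  # ~[0.14, 0.21, 0.29, 0.36]
```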