Numerai Fireside Chat Aftermath

After the last fireside chat there have been lots of comments on Discord, and some of them were very interesting. Since I like to hear the voice of the community (and of the Numerai team, for that matter), I am opening this thread so that people can share their thoughts here.

Compared to Discord, I believe that a forum post encourages a more thought-out exposition of ideas, and it is also nice to collect all related thoughts in a single place.

Note: I believe all of us like Numerai, even though for different reasons, so please do not abuse this post for rants, but just for constructive ideas.

These are my thoughts.

I don’t mind sudden changes in the payout scheme, so the last announcement in this regard didn’t bother me at all. I actually love the new emphasis on CORR. What bothers me is the drastic decrease in payout: 1xCORR + 3xTC doesn’t equal 2xCORR + 1xTC. However, they say it is a temporary solution, so I might be ok with that if “temporary” means some weeks.

What really worries me is the bad performance of the fund and Numerai’s use of some models’ performance as a scapegoat. It seems as if Numerai hasn’t yet figured out a way to translate the fund’s needs into a proper payout scheme, and this makes the hedge fund suffer. There should be no need to explain what a model should or should not do; everything should be the consequence of a smart payout scheme that encourages and rewards the models which are useful to the fund.


I would like to better explain what I meant in my previous post.

The hedge fund has recently suffered serious losses, so my question is: was there a combination of model submissions that performed well in the same period in which the hedge fund performed badly?

A) If that is not the case, then we have a problem: the community or the data sources are not good enough to provide what the hedge fund needs.

B) If there was indeed a combination of model submissions that performed better than the hedge fund, then the question becomes: why hasn’t Numerai’s team figured out yet how to properly select the predictions they need? If the problem is that hard, they could turn it into a new tournament.


I have stated before my doubts about the emphasis on some artificial ‘true contribution’, no matter how clever might be the constructs used to justify it. Anything that is not based on the actual predictions (correlation performance) is going to be sub-optimal.

While it is true that any model tends to learn more from the outliers, it is rarely in the right direction.

Corr is a simplification. Optimizing corr doesn’t necessarily mean optimizing fund performance.

I see a problem with the incentive system allowing compensation for inferior model performance with more stake.
Stake is just a signal. There’s no point to allow 10 times more weight than average if there’s no reason to believe that your model is 10 times better than average.

Don’t know how to fix this except by enforcing stake limits based on historical performance.


The theory is that more stake = more risk = more confidence = it must be better?

It sounds stupid when you put it like that, but it isn’t a totally bankrupt theory. Assuming historical performance of a model (or of the modeller) can’t be known (by the fund at least) – which was always the dream, to be able to use signals that any person came and added to the mix – what else can be done? Everything else involves some sort of gatekeeping system of proving yourself, etc – i.e. you move towards normal hedge fund operation, not really crowd-sourcing (basically you’re hiring people then, but with a more open audition system).

Something that can be done within the current system is enforcing limits on weight in the metamodel, but NOT based on performance, just enforced, period: nobody should have too much weight no matter what. (More accurately, redundant signals shouldn’t have too much aggregate weight, as you can get around limits on people or model slots.) Surprisingly, they have always blown off questions about this when it seems so obviously a potential problem.
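
To make the idea of a hard weight limit concrete, here is a minimal sketch. It is not how Numerai builds the metamodel; the function name `capped_weights`, the `cap` parameter, and the toy stakes are all made up for illustration. It starts from stake-proportional weights, pins anything above the cap at the cap, and hands the leftover weight to the remaining signals in proportion to their stakes. It only caps individual signals; grouping redundant signals so that their aggregate weight is capped would need a similarity measure on top of this.

```python
def capped_weights(stakes, cap):
    """Stake-proportional weights where no single weight exceeds `cap`.

    `stakes` is a list of positive numbers, `cap` a fraction in (0, 1].
    Both names are hypothetical; this is an illustration only.
    """
    n = len(stakes)
    if n == 0 or any(s <= 0 for s in stakes):
        raise ValueError("stakes must be a non-empty list of positive numbers")
    if cap * n < 1.0:
        raise ValueError("cap is too small for the weights to sum to 1")

    weights = [0.0] * n
    free = set(range(n))   # signals not yet pinned at the cap
    remaining = 1.0        # weight still to be distributed

    while free:
        total = sum(stakes[i] for i in free)
        over = [i for i in free if remaining * stakes[i] / total > cap]
        if not over:
            # Nobody left above the cap: share out what remains by stake.
            for i in free:
                weights[i] = remaining * stakes[i] / total
            break
        # Pin the offenders at the cap and redistribute the rest next round.
        for i in over:
            weights[i] = cap
            free.discard(i)
            remaining -= cap
    return weights


# Toy example: one very large staker and three small ones, cap of 40%.
toy_weights = capped_weights([1000.0, 50.0, 30.0, 20.0], cap=0.4)
print([round(w, 3) for w in toy_weights])
```

Without the cap, the first staker in the toy example would hold about 91% of the weight; with it, that weight is held at 40% and the rest keep their relative proportions.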

But still, I keep pointing this out. YOU ARE GOING TO HAVE DRAWDOWNS NO MATTER WHAT. Not every drawdown is an emergency situation calling for a sudden rug pull like we’ve just had. (Even if the change is ultimately good, implementing it as a rug pull where all your work is trashed without compensation is not. You destroy good will, etc etc) What I see going on now is panic taking over – “getting caught in the switches”. I would bet money that recently at Numerai they’ve been asking “well…what thing that if we had been doing that instead of what we were doing…would have gotten us through this last drawdown period doing good/ok instead of bad?” It sounds reasonable on the face of it – let’s just tweak things so that we would have done good in this recent past period if we had done this tweak sooner…but it is the road to ruin, ask any gambler.


(note: this is account leaderboard data, last week’s data, so numbers may differ a bit if you want to check now)
I take stake x TC as a measure of total influence on fund performance.

If stake weighting worked, one would expect to see some positive correlation between stake and influence.
Zoom in to lower stakes:

The density plots look similar to the above for accounts staked up to 8k.
Above 8k, the plots diverge: the results depend on which of the few big accounts get included.

To me, sparsity of the point cloud looks like the real issue here.
When expanding into sparse territory the metamodel might actually pick up more noise than signal.
Another reason to limit stakes and have participants not move too far away from the crowd.

Also note that the range of (Stake x TC) is nearly constant between 2k and 8k and how it explodes after that. That means increased dependence on fewer models. Not really desirable.
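
For anyone who wants to repeat this kind of check on their own leaderboard export, here is a rough sketch of the calculation. The numbers in `toy_accounts` are invented purely to show the mechanics and are not leaderboard data; the helper names `pearson` and `top_share` are my own. It computes stake x TC per account, the correlation between stake and that influence, and how much of the total absolute influence comes from the single largest account.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_share(values, k):
    """Fraction of total absolute influence carried by the k largest accounts."""
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    return sum(magnitudes[:k]) / sum(magnitudes)


# Invented (stake, TC) pairs, NOT real data: four small accounts and one whale.
toy_accounts = [(1000, 0.01), (2000, 0.02), (4000, -0.01), (8000, 0.005), (50000, 0.03)]

stakes = [stake for stake, _ in toy_accounts]
influence = [stake * tc for stake, tc in toy_accounts]  # stake x TC

print("corr(stake, stake x TC):", round(pearson(stakes, influence), 3))
print("share of influence from the largest account:", round(top_share(influence, 1), 3))
```

In a toy set like this, one large account drives both numbers, which is the dependence on a few models described above: the correlation says little about whether stake weighting works across the crowd.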


So this is stake-weighted TC numbers and (current) stake as shown on leaderboard? Not sure that really will capture what has gone on round-to-round over time. Interesting nonetheless.

It’s account (DS) leaderboard data, as per edited text.
It’s about the big picture. Not round to round.

Agree, the stake signal becomes less useful when there are big differences in the amount staked by accounts. Just as is seen in the Signals tournament.

How about a within-account stake-weighted MetaModel? This would give a more precise signal of how much the account believes in each individual model. If an account distributes its overall staked amount 90/10 across two models, it clearly indicates its relative belief in those two models.

At the same time, the account weighting could be based on the ranking from the account leaderboard. This would give higher weights to accounts that have proved their worth.

E.g., with some arbitrary numbers:
An account is ranked in the top 20 of the account leaderboard and thereby gets an overall account weighting of 0.02. It has two models (Model A and Model B), and it has split its total stake 90/10 between A and B. Then the MM weights for each model would be:
Model A = 0.02 * 0.9 = 0.018
Model B = 0.02 * 0.1 = 0.002

This system might be more robust to large stakers and emphasize accounts that have already proven their worth.
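
The 90/10 example above can be written as a few lines of Python. This is only a sketch of the proposal, with a made-up function name `model_weights`; the account weight of 0.02 is the arbitrary number from the example, and how that weight would be derived from the leaderboard rank is left open.

```python
def model_weights(account_weight, stakes):
    """Split an account's metamodel weight across its models by stake share.

    `account_weight` is the weight assigned to the whole account (assumed to
    come from its leaderboard rank); `stakes` maps model name -> staked amount.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("account has no positive stake")
    return {name: account_weight * stake / total for name, stake in stakes.items()}


# Arbitrary numbers from the example: account weight 0.02, stake split 90/10.
example = model_weights(0.02, {"Model A": 90.0, "Model B": 10.0})
for name, weight in example.items():
    print(name, round(weight, 4))
```

Note that only the split between an account's own models depends on stake here. The account's total weight is fixed by its rank, so staking more does not buy more control of the metamodel.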

With datasets (and even metrics) changing all the time, what good is proving your worth? I mean, what have you proven? (Putting aside the whole fooled by randomness issue with any such scheme based on track record, which is a huge issue.) Is it assumed if you are good at one thing you are good at everything that comes along?

They used to actually not accept predictions that didn’t have a threshold minimum correlation with the examples – they just said “those tend to not be good models”.


Well, changes to metrics would also alter the leaderboard and thereby automatically change the account weighting to those metrics that the fund believes are most useful.

And the account leaderboard, in my opinion, gives a good indication of how well individual people have performed during the past year, which definitely has been a changing environment. So they might also be good at handling new changes to e.g. the dataset. And if they are not, they will lose account weight by falling in the account ranking, just as is the case now when they burn their model stakes.
So this would also be a system that ‘optimizes’ itself, just like the one we have now with the stake weighting.

However, these are all just thoughts.

Yeah, but you wouldn’t have been optimizing for those – they might not even have existed before (probably didn’t). So then we get rug pulled because we didn’t do well on a metric in the past that didn’t even exist in that past. This has got to work for the participants somehow as well…

Think we are talking about two different things. I’m talking about the MM weights and how to maybe optimize the MetaModel.

I don’t see how changing the MM weight system will rug-pull anyone, since the weights don’t affect payouts (or at most to a minimal extent). The only thing that might be affected is the MM performance, and the fund should obviously only do this if it could improve MM performance.

Well, they are talking about restricting staking, i.e. keeping the link between staking and MM control, but controlling who gets to stake and how much. So “you do badly, and you don’t get to stake” seems to be the likely result of any gatekeeping scheme. In any case, I think they will always want to keep the link between MM control and potential rewards. If they’ve downgraded you in the MM weights due to some rule change, you can bet your payoffs are going with it. Which is fine, but not overnight.

Here’s the larger thought about all of this that’s happening. Numerai has a plan. Anybody finds it easy to follow a plan when things are going good. If the first thing you do when something goes badly is change the plan…THEN YOU NEVER HAD A PLAN. The whole point of a plan is so you know what to do when things go wrong – you stick to the plan. That’s the most important reason for having a plan. If you make changes, you do them holistically and because you realize the plan is flawed – something going badly in a probabilistic game like this isn’t in and of itself evidence of a flawed plan. Again, it is the entire reason for the plan because things are ABSOLUTELY GUARANTEED to go badly sometimes – the plan is what gets you through.

And if you find that you have a flawed plan and there are necessary changes to be made…well, first of all you don’t blame the people who were just following the plan you set up. You apologize, you say we’ve made a mistake, we recognize that this plan will never work (or this part of it, whatever), and it’s got to change. But it’s all our fault, we’ll try to make good any damage we’re doing, and we’ll start from there and think it over very carefully. That’s not what I see happening.


This is true for most individuals, but not for big investors/institutions.
Big investors stake big, because they have a lot of money to allocate.
Their stake is not proportionate to the quality of their model.

It can’t be! There are some very smart individuals here, submitting great models.
Bigger investors/institutions can’t be 10x better, but they can have 100x more money to allocate.


Also worth noting the potential effect of un-staking some big accounts for bad performance.
If they sell their stake, NMR could collapse and take the whole ecosystem with it!

Unstaking big accounts or many small, badly performing accounts means the end of the game.
NMR is weak anyway. All cryptos are. Such an event would be the final one.


Yes, I agree. The biggest stakers often just want a return…maybe any return because they are really just NMR hodlrs or whatever. But even if they are top-class models, as you say they still can’t be exponentially better than everybody else, so I think some sort of anti-top-heaviness clamp on staking (or something equivalent) would be appropriate. The stakes tend to conform to a power law distribution, but the quality of the models doesn’t. It’s the distribution of the stakes/MM control that needs to be adjusted to be more like the other.
