Numerai Fireside Chat Aftermath

No, it is not. I clearly said what I meant: past performance does not indicate future performance. A totally different statement.

Example: you see a model with really good historical performance. You buy it on NumerBay and stake on it. The model then burns 50% of your stake, because:

  1. The author of the model decided to submit something else after you staked (or has a bug in their pipeline)
  2. The Numerai fund raised its AUM significantly and can no longer trade small caps, but the model you bought was trained on an old target and is not optimized for large caps
  3. …

The point is: you base your stake on something that already happened in history, but your profits are based purely on what happens in the future, and you are missing the extra context needed. The historical performance was driven by factors that simply did not persist, but you have no way of knowing that. That's why historical performance alone is never enough.


But that actually proves my point :) My argument was that selecting models using just LB rank does not work; any success with it is a stroke of pure luck.

If that ever worked, Numerai would not have needed to introduce staking in the first place. The tournament worked like this until June 2017, and it apparently did not work.

  1. It is not really my problem how Numerai selects their MM weights. If they are unhappy with the linear system, they can use whatever works best for them.

  2. You don't know; maybe without the big stakers the fund performance would be much worse. Everyone talks about big stakers, but nobody here has proved that average stakers delivered anything much better. Considering how badly LightGBM models did in the drawdown, I would not expect anything great.

Rather than considering the past performance of a model, I would say users should submit predictions for the current round plus some other eras (synthetic eras, or historical eras with different obfuscation settings so that past eras cannot be recognized). These additional eras serve to gain confidence in the model over different periods of time. Say there are 10 such additional eras per round; then over a period of 4 weeks (a full round) Numerai can collect the model's performance over 220 eras (20 rounds + 10 test eras * 20 rounds). This would certainly improve the confidence in models.
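As a quick sanity check on the era arithmetic above (a trivial sketch; the round and era counts are the post's assumptions, not official Numerai parameters):

```python
# Era counts from the proposal above; both numbers are the post's assumptions.
rounds_per_period = 20       # roughly 4 weeks of daily rounds (one full round window)
extra_eras_per_round = 10    # synthetic or re-obfuscated historical eras per round

live_eras = rounds_per_period                         # one live era per round
extra_eras = extra_eras_per_round * rounds_per_period # 10 test eras * 20 rounds
total_eras = live_eras + extra_eras
print(total_eras)  # 220
```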

It used to work like this before March 2022 in the pre-v4 dataset era. But they moved away from this setup for multiple reasons.

But using historical eras would not solve the drawdown we had. These models worked great in historical eras, just not in the following ones.

Using synthetic eras could work, I agree. But I'm not sure how difficult it is to craft them in a useful way.


It used to work like this before March 2022 in the pre-v4 dataset era. But they moved away from this setup for multiple reasons.

Do you think they changed the extra eras every round, or used the same ones?

Also, they didn't compute the score on the additional eras; that's a problem.

  1. It was a fixed hold-out, a few years long
  2. Nope (it would allow a ladder-climbing attack)

That would be the case only for a fixed hold-out era set, correct? So my original idea would still work, and I still believe it would make any type of score more meaningful and less noisy. The only problem I see in my approach is the increase in computation: if Numerai asks for X more era submissions, then the computation requirements increase X times, and given the limited amount of time we have for the daily tournament that might be a problem.

On a different note, I believe the stake-weighted portfolio concept has to go away (as many of you already said). It was an interesting idea, but it bundles two concepts that have nothing to do with each other: the optimal model weight (optimal from the hedge fund's point of view) and the user's investment capacity. To be honest, the whole Numeraire thing has to go away. It's an additional layer that users don't need or want. I hope Numerai can find a way to make the best use of the model predictions so that the hedge fund does well and we can eventually get paid in fiat.

I'm worried more about total signal weight / MM control than staking (or avoiding drawdowns) per se; i.e., I'm in the camp that thinks a tiny number of signals shouldn't be essentially controlling the fund. Why have all these thousands of models when only 10 of them really count?


I think it could perhaps work with synthetic eras, but hardly with historical eras. Where would you get so many historical eras that never wear out, considering the whole history is now public?

About NMR: the fund needs it and I want it :) What are users going to stake … fiat? Also, the whole point of NMR is that payouts don't cost them anything. If they had to pay with fiat, the earnings would be negligible…

Fair enough, but nobody forces Numerai to calculate the weights in the way they do now (weight == stake). They can use something else (like log(stake)) if this does not work for them. It is an internal thing.
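A minimal sketch of what a sub-linear weighting like log(stake) could look like, assuming hypothetical stake sizes (this is not Numerai's actual weighting code):

```python
import numpy as np

# Hypothetical stakes in NMR for five models; illustrative numbers only.
stakes = np.array([100_000.0, 10_000.0, 1_000.0, 500.0, 100.0])

# Current scheme: meta model weight proportional to stake.
linear_weights = stakes / stakes.sum()

# Alternative: log1p compresses the range, so a 1000x bigger stake
# no longer gets 1000x the meta model weight.
log_weights = np.log1p(stakes) / np.log1p(stakes).sum()

print(linear_weights.round(3))  # the whale takes ~90% of the weight
print(log_weights.round(3))     # the whale takes ~30%
```

The point is only that the weight map is Numerai's internal choice; any monotone transform of stake keeps the incentive to stake more while capping whale dominance.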

Also, the concentration of MM weights can be caused by Numerbay as well (too many participants staking on the same model)…

Yes, exactly. SOMETHING should be done about too much top-heaviness; it doesn't have to be staking restrictions, although I'd listen to such arguments as one part of a solution. Should there really be no upper limit when the average stake is orders of magnitude smaller? Earlier in the tournament we had a single guy (who worked for Numerai and designed the scoring!) dominating the staking in a huge way. Giant whales dominating (and reducing the payout factor) does negatively impact everybody else's participation, and that should be recognized. But it is a thorny problem, with no perfect solutions. I'd like to keep the crowd in crowd-sourcing, though. (And I tend to think any schemes based on historical performance will be more problematic than beneficial, which I think we agree on.)


Another thought: we already have de facto staking restrictions with the payout factor (which takes the form of earning restrictions on your stake; it's similar, anyway). Still, one mega whale can eat the whole pie if they choose to. I keep coming back to the thought of tying the payout factor to CWMM (correlation with the meta model) for everybody; then whales would be disincentivized from becoming the MM. They can still stake a ton, but not all on the same/similar signal. And those with high-performing but low-CWMM models can earn at better rates. Doesn't that sound about right? Or is it attackable?
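One way to sketch that idea (purely illustrative; the linear scaling and the function name are my assumptions, not anything Numerai has proposed):

```python
def cwmm_adjusted_payout_factor(cwmm: float, base_factor: float = 1.0) -> float:
    """Scale a model's payout factor down as its correlation with the
    meta model (CWMM) rises. The linear form is an illustrative choice."""
    cwmm = min(max(cwmm, 0.0), 1.0)  # clamp to [0, 1]
    return base_factor * (1.0 - cwmm)

# A near-copy of the meta model earns at a heavily reduced rate...
print(round(cwmm_adjusted_payout_factor(0.95), 2))  # 0.05
# ...while a mostly uncorrelated signal keeps almost the full factor.
print(round(cwmm_adjusted_payout_factor(0.10), 2))  # 0.9
```

A whale could still stake a ton under this scheme, but only by spreading it over signals that differ from the meta model.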


I think the current stake burning and TC are theoretically enough to address most problems, such as big whales and badly performing models, by auto-correction (i.e. burning). But the problem is that the burning is too slow for participants who choose a low TC multiplier. From what I have seen, most big stakers' TC multiplier is small, so the auto-correction is slow, or even nonexistent if TC is set to 0 (during periods where TC is negative but Corr is good).
So I think the stake-weighted ensemble is not ideal if people can choose different multipliers. Maybe the ensemble weights should be based on something like an accumulated ā€œvirtual stakeā€ that is calculated using a fixed Corr + 3*TC (the optimal multipliers should be researched and determined by Numerai for better auto-correction speed; higher multipliers should auto-correct faster but may cause more churn, so more research is needed. It could even be a moving average of some sort).
Also, I think the ā€œvirtual stakeā€ can be used as an actual staking limit factor, as it is somehow related to the accumulated actual TC of the MM. So, if you have a very high virtual stake, then naturally you can stake more.


Here is one example of how to implement the ā€œvirtual stakeā€ (VS) system:

  1. We first initialize the VS to the current stake
  2. Then we calculate the virtual payout (VP) each round normally, using a*Corr + b*TC, where a and b are determined by Numerai
  3. A participant's stake level cannot exceed the accumulated ā€œvirtual stakeā€ (AVS), or avs_factor * AVS, where avs_factor is set by Numerai
  4. Now at least you will not be burnt by TC if you choose not to be, but your round-to-round stake limit will change based on your AVS. Stake that exceeds AVS will be returned to your wallet, while stake that does not exceed AVS will compound as usual
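The four steps above could be sketched like this (a toy implementation; the multipliers a and b, the avs_factor, and all the numbers are placeholders that, as the post says, Numerai would have to research):

```python
def update_avs(avs: float, corr: float, tc: float,
               a: float = 1.0, b: float = 3.0) -> float:
    """Step 2: accumulate the virtual payout a*Corr + b*TC into the
    accumulated 'virtual stake' (AVS), compounding like a real stake."""
    return avs * (1.0 + a * corr + b * tc)

def effective_stake(actual_stake: float, avs: float, avs_factor: float = 1.0):
    """Steps 3-4: only up to avs_factor * AVS counts toward the meta model;
    the excess is returned to the wallet."""
    cap = avs_factor * avs
    counted = min(actual_stake, cap)
    return counted, actual_stake - counted

# Step 1: initialize AVS to the current stake, then replay a few rounds.
avs = 1000.0
for corr, tc in [(0.02, 0.01), (0.03, -0.02), (0.01, 0.02)]:  # made-up scores
    avs = update_avs(avs, corr, tc)

counted, returned = effective_stake(actual_stake=1200.0, avs=avs)
# A persistent TC burner would see AVS shrink and the stake cap fall with it,
# even if they set their real TC multiplier to 0.
```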

The switch to 2xCorr+1xTC is a huge pay cut. Naturally this will stifle innovation/contribution.

Also, it will take longer for high-TC stakes to grow in significance to the MM, and longer for low-TC stakes to diminish in significance to the MM.

Flat out: the new changes reek of ā€œdownturn panicā€.

First the optimizer… now the incentive structure… What next?
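The size of the cut for TC-heavy models is easy to see with a back-of-envelope payout sketch (the payout factor and per-round clipping are omitted, and the scores are made up):

```python
def round_payout(stake: float, corr: float, tc: float,
                 corr_mult: float, tc_mult: float) -> float:
    """Simplified round payout: stake times the multiplied scores.
    Real payouts also apply a payout factor and per-round clipping."""
    return stake * (corr_mult * corr + tc_mult * tc)

stake, corr, tc = 1000.0, 0.02, 0.05  # a round with modest CORR but strong TC

old = round_payout(stake, corr, tc, corr_mult=1, tc_mult=3)  # 1xCorr + 3xTC
new = round_payout(stake, corr, tc, corr_mult=2, tc_mult=1)  # 2xCorr + 1xTC

print(round(old, 2), round(new, 2))  # 170.0 90.0
```

For this hypothetical round the switch cuts the payout nearly in half, which is the "huge pay cut" for TC-focused models.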

There shouldn't be any stake limit for users who EARN their NMR. They are good data scientists, and their big stakes improve the meta model.

A stake limit should apply only to those who BUY their stake. They may or may not be good…


If they are good scientists with good models, then the ā€œvirtual stakeā€ should be basically unlimited for them. I think the main problem here is that they can choose to use just 2xCorr + 0xTC; then they are not burnt properly based on their contribution. (Here I assume TC can actually measure their contribution correctly, which I think it does to some extent.)

Many ideas have emerged; the following are particularly resonant with me:

Moving from linear stake MM weights to weights using factors such as CWMM, MCWNM, and the submitter's selected payout multiplier(s). This would help keep multiple voices in the conversation and avoid ā€œwhale emphasisā€. MCWNM values equal to 1.0 signal that this is a voice that's already been heard, and that someone is just repeating someone else's (or their own) predictions.

Having a performance-based payout multiplier that starts at 1 (i.e. no change to payouts) for new users but can go up or down based on daily returns: consistent positive returns and it crawls upward, and vice versa for burns, with no limit on upward growth. This incentivizes people to find net-positive models and run with them; they could reach a 2.0 multiplier rather quickly. Note this multiplier could also be used in calculating MM stake-weighting contributions, as it is a measure of historical confidence.
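A toy version of that multiplier (the multiplicative update rule and the sensitivity constant are my own illustrative choices, not part of the proposal):

```python
def update_multiplier(multiplier: float, daily_return: float,
                      sensitivity: float = 0.5) -> float:
    """Nudge the payout multiplier up after positive daily returns and
    down after burns; floored at 0, with no upper limit."""
    return max(0.0, multiplier * (1.0 + sensitivity * daily_return))

m = 1.0  # new users start at 1.0, i.e. no change to payouts
for r in [0.05, 0.03, -0.02, 0.04]:  # hypothetical daily stake returns
    m = update_multiplier(m, r)
# Mostly positive returns leave m a bit above 1.0; sustained burns
# would push it toward 0 instead.
```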

Awards/multipliers that incentivize low CWMM and MCWNM (and CWEP, corr with example preds), but only when there's high/positive CORR as well.

Monthly/annual NMR bonuses for steady performance.


I have realized that there is something I still don't understand. I have my ideas, but I would like to ask whether anybody has a better understanding of Numerai's point of view.

The bad performance of the hedge fund stems largely from the large stakes on models that performed badly lately, despite the tournament having many models with good performance in the same period of time (but with smaller stakes).

However, Numerai still likes the idea of the Stake-Weighted Meta Model, so they keep this approach, and instead they have temporarily changed the payout scheme from a maximum of 1xCorr+3xTC to 2xCorr+1xTC.

  • How would that solve the problem? Didn't the large-stake models perform equally badly on both CORR and TC?

  • Since they are putting more weight on CORR instead of TC, does that mean TC is not useful for the hedge fund after all?