Staking NMR early in the round - any benefits?


Are there any potential benefits to staking NMR early, i.e. if I already have a good model at the beginning of the round? Isn’t it better for me to see what other people stake first before placing my own stake, and therefore to wait until closer to the end of the round? I can always get ahead of someone by making my confidence a tiny bit higher. So I don’t see why I would stake now, soon after the round opens… From the other point of view, it’s good for Numerai to receive the stakes early… I just don’t see the incentive for me personally to stake early. Any ideas?


We had a big discussion about this in the Slack channel recently. We all agree with you: there seems to be a race at the beginning of the round for originality and a race at the end for staking. Numerai are aware of this and are trying to find ways to resolve it. Exactly how, or what they are going to do, is not clear yet.

If you have any ideas, the Slack channel is the place to voice your opinions.


Thanks for your reply. I understand the originality problem that people with simple models have, so there’s a rush to submit early. However, I personally don’t have an originality problem. You’re right, it’s not clear what to do. Perhaps Numerai should also wait for some of the good models to be submitted, but six days is a bit long to wait. Anyway, thanks.


Has anyone suggested hiding confidence submissions from other stakers?


That has been suggested, but everything is recorded on the blockchain, so it’s available for anyone to see. There might be some way of encrypting the confidence, but that would complicate things a great deal.


That makes sense. In my opinion, a much bigger issue is solving (removing) the originality measure. Truth be told, my best models never get accepted, and I always have to do something goofy to make sure I am satisfying the snowflake award for creativity. I am starting to think Numerai is maliciously imposing originality to extract many more models from me than I would normally submit, because I am certainly submitting many more models than I otherwise would. If that is the case, then smart move by Numerai, but things like that rarely work out long term.

One idea I had is to split the competition: have one pool with no originality requirement (and, if you want, none of the other checks either, though I understand they can be useful for sorting out junk) and one normal pool. The original pool would function normally, but the unrestricted pool would likely be a lot more competitive, because there is likely a reason so many models look similar to one another (it’s because they’re good models).

I also messed around with some scores a few weeks ago (5 weeks of scores) and saw that consistency and log loss were negatively correlated; that is, the less consistent models had, on average, lower log loss. This might have been a fluke, especially since I can’t think of a mathematical reason that explains it, but it is evidence against the notion that those metrics (consistency, concordance, etc.) create better models overall.


There is likely a reason so many models look similar: they’re simple models and/or they’re all capitalizing on the same signal in the data and producing similar predictions. Having a bunch of copies of the same model isn’t really helpful to Numerai’s meta model; they want diverse models, which is why they have the originality requirement.

And it makes sense that log loss and consistency would be negatively correlated. You get better consistency by being conservative with your estimates, because log loss penalizes incorrect predictions more heavily the farther they are from correct. So shrinking all of your predictions towards 0.5 will generally improve your consistency (up to a point) but hurt your log loss.
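A toy illustration of the trade-off (the labels, predictions, and shrink factor are made up for the example; the 0.693 benchmark is just -ln(0.5), the loss of always predicting 0.5):

```python
import math

BENCHMARK = -math.log(0.5)  # ~0.6931, the log loss of always predicting 0.5

def log_loss(y_true, y_pred):
    """Mean binary cross-entropy."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / len(y_true)

def shrink(preds, alpha):
    """Pull every prediction toward 0.5; alpha=0 leaves it alone, alpha=1 flattens it."""
    return [0.5 + (1 - alpha) * (p - 0.5) for p in preds]

# Era A: mostly right, but two confidently wrong predictions blow up the loss.
y_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_a = [0.7, 0.7, 0.7, 0.7, 0.05, 0.3, 0.3, 0.3, 0.3, 0.95]

# Era B: a clean era with no big misses.
y_b = [1, 1, 1, 0, 0]
p_b = [0.7, 0.7, 0.7, 0.3, 0.3]

print(log_loss(y_a, p_a))               # ~0.884, worse than the 0.693 benchmark
print(log_loss(y_a, shrink(p_a, 0.5)))  # ~0.667, now beats the benchmark
print(log_loss(y_b, p_b))               # ~0.357, well under the benchmark
print(log_loss(y_b, shrink(p_b, 0.5)))  # ~0.511, still under it, but worse
```

Shrinking rescues the era that a couple of confident misses had pushed above the benchmark (better consistency), while making the clean era's log loss strictly worse, which is exactly the negative correlation described above.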


This is what I was thinking too.

To your point about simple models, I wholeheartedly agree. Simple logistic models (likely everyone’s first entries), even with some randomization, are going to give really similar results. But it is unbelievably frustrating to have even the more sophisticated, carefully validated models rejected because of originality. It is a classic game-theory race, so there is likely no good solution. I am not involved in this discussion in depth, so I wonder if there is a real-world example of a similar situation from which a solution could be ported over?


Which could also be a byproduct of the data having relatively low signal after it’s encrypted.


Financial data generally has a very low signal-to-noise ratio, at least for the signals that can be exploited after costs. For example, a 50.5% success rate at guessing whether something will go up or down is very good.
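A rough sketch of why that tiny edge is worth so much, using a normal approximation and assuming many independent, equal-sized up/down bets (the bet counts are illustrative, not from the thread):

```python
import math

def p_profitable(p_win, n_bets):
    """Normal approximation to the chance of ending with more wins than
    losses after n_bets independent, equal-sized bets with hit rate p_win."""
    if p_win == 0.5:
        return 0.5
    z = (p_win - 0.5) * math.sqrt(n_bets) / math.sqrt(p_win * (1 - p_win))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A 50.5% hit rate is nearly invisible on any single bet,
# but it compounds into near-certain profit over enough bets:
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(p_profitable(0.505, n), 3))  # ~0.54, ~0.62, ~0.84, ~0.999
```

The edge per bet is only 0.5 percentage points, so over 100 bets you are barely better than a coin flip, but over 100,000 bets you are profitable with near certainty. That is why a 50.5% directional hit rate counts as very good.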