…and I think that’s bad for everyone.
Numerai’s uniqueness relies on the fact that they have a thousand data scientists working for them to predict the stock market, while the average hedge fund might have a few dozen. But lately, I’ve been wondering whether a thousand data scientists who compete against each other are better than a dozen who collaborate. Unfortunately, MMC disincentivises collaboration.
Numerai’s stated goal for the original tournament is to maximise the correlation between the meta-model predictions and the targets. To achieve this, Numerai would like to ensemble predictions from their participants that
- have a high correlation with the targets (call this “goal 1”) and
- are independent from each other (“goal 2”).
Note that the more goal 1 is fulfilled, the less important goal 2 becomes; if one user submits “perfect” predictions, Numerai would actually like every user to submit the same—completely dependent—predictions.
To achieve their goal(s), Numerai try to devise a tournament payout scheme that makes participants internalise these goals. To help participants internalise goal 1, Numerai pay them a return on their investment equal to their predictions’ correlation with the targets (“CORR”). Internalising goal 2 is more intricate, since independence can only be defined relative to other participants and is thus a moving target. Numerai currently try to achieve it by giving participants the option to receive a return on their investment equal to (a multiple of) their predictions’ correlation with the targets after those predictions have been orthogonalised to the meta-model’s predictions (“MMC”). Note that MMC reflects goals 1 and 2 simultaneously.
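A minimal Python sketch of the two scores as described above may make the difference concrete. It assumes that CORR is the Pearson correlation between the percentile-ranked predictions and the targets, and that MMC is the correlation with the targets of whatever remains of the ranked predictions after their least-squares projection onto the ranked meta-model predictions has been removed. The helper names are hypothetical and the numbers are toy values; Numerai’s own scoring code differs in its details, so treat this as an illustration rather than the official metric.

```python
import math
from statistics import mean


def rank_pct(values):
    """Percentile ranks in (0, 1]; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / len(values)
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no signal
    return cov / (sx * sy)


def corr(preds, targets):
    """CORR (assumed): correlation of ranked predictions with the targets."""
    return pearson(rank_pct(preds), targets)


def orthogonalise(x, basis):
    """Centre both series, then remove from x its least-squares projection onto basis."""
    mx, mb = mean(x), mean(basis)
    xc = [a - mx for a in x]
    bc = [b - mb for b in basis]
    denom = sum(b * b for b in bc)
    if denom == 0:
        return xc  # a constant basis has nothing to project out
    beta = sum(a * b for a, b in zip(xc, bc)) / denom
    return [a - beta * b for a, b in zip(xc, bc)]


def mmc(preds, meta_preds, targets):
    """MMC (assumed reading): correlation with the targets after orthogonalising to the meta-model."""
    residual = orthogonalise(rank_pct(preds), rank_pct(meta_preds))
    return pearson(residual, targets)


if __name__ == "__main__":
    targets = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy targets
    mine = [0.1, 0.2, 0.4, 0.3, 0.9]       # toy participant predictions
    meta = [0.2, 0.1, 0.3, 0.5, 0.6]       # toy meta-model predictions
    print(corr(mine, targets))
    print(mmc(mine, meta, targets))
    print(mmc(mine, mine, targets))  # prints 0.0
```

With this reading, a participant whose predictions coincide with the meta-model gets an MMC of exactly zero, since nothing is left after orthogonalisation; any MMC has to come from the part of the predictions the meta-model does not already contain.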
In a world where participants are completely isolated from each other, the current payout scheme should work reasonably well. Even then, one issue is that participants cannot properly train their models for MMC, which makes it hard for them to internalise goal 2. A (flawed) remedy could be to publish historical meta-model predictions for use in training. Another issue is the opaqueness and volatility of MMC. A more radical remedy for this problem would be an overhaul of the payout scheme. For example, participants could receive a return equal to CORR x IND x c, where IND = 1 - (the predictions’ correlation with the meta-model) and c is some scalar. In my experience, correlation with the meta-model is much more stable than MMC, so this scheme could help participants focus on goal 2.
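The proposed payout is simple enough to write down directly. In this minimal sketch, `corr_score` is a model’s CORR, `corr_with_meta` is its correlation with the meta-model, and `c` is the unspecified scalar from the formula, defaulted to 1.0 purely for illustration; all names are hypothetical and the numbers are toy values, not tournament results.

```python
def independence(corr_with_meta: float) -> float:
    """IND = 1 - (correlation of the predictions with the meta-model)."""
    return 1.0 - corr_with_meta


def proposed_payout(corr_score: float, corr_with_meta: float, c: float = 1.0) -> float:
    """Return CORR x IND x c as a fraction of the stake (c is a placeholder scalar)."""
    return corr_score * independence(corr_with_meta) * c


if __name__ == "__main__":
    # Toy comparison: a strong but meta-model-like model
    # versus a weaker but more original one.
    print(proposed_payout(0.03, 0.9))  # CORR 0.03, IND 0.1
    print(proposed_payout(0.02, 0.5))  # CORR 0.02, IND 0.5
```

With these toy numbers the second model earns more (about 0.01 versus about 0.003), which is the intended trade-off between goal 1 and goal 2. Note also that a model negatively correlated with the meta-model gets IND > 1, which amplifies its payout whether CORR is positive or negative.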
However, we are in a world in which people are interconnected and constantly learn from each other. This is clear when you consider the exchange of thoughts, ideas, and memes that takes place within the active Numerai community here and especially on RocketChat. It has been quite amazing to me how much I was able to learn from all the smart and creative people who spend their valuable time explaining established and new ML concepts to rookies like me. But this poses a problem for Numerai’s current payout scheme: neither CORR nor MMC makes participants internalise the positive effects that sharing knowledge and ideas has on the correlation and independence of other participants’ models. In fact, MMC introduces an incentive to harm the meta-model’s performance, because sharing a good idea makes the meta-model more similar to one’s own predictions and leaves less for MMC to reward. (It is true that too much sharing of ideas might harm overall model independence for the sake of correlation. However, some participants might have developed sophisticated procedures to make models more independent but keep them to themselves so as not to hurt their MMC scores. Moreover, as overall correlation goes up, independence becomes less important. Lastly, the same idea, in the hands of different people, can lead to very different results.)
I have discovered multiple little tricks that have helped my models’ performance, many of which have not been discussed publicly. Surely, so has anyone else who spends enough time in this tournament. I believe that at least some helpful approaches have not been made public because of the antagonistic nature of MMC (and of the leaderboard bonus before it). This has certainly led to models with lower correlation and has thus hampered goal 1. Moreover, if participants keep their great ideas to themselves to enjoy their MMC gains in peace, the public discussion can become dominated by ideas pushed by the more vocal Numerai team (think feature “neutralisation”). In the end, the average participant may consider a narrower range of ideas, which can make models less independent, hampering goal 2 as well.
I don’t know what a payout scheme that makes participants perfectly internalise Numerai’s goals looks like. I wrote this post to point out some positive externalities that seem not to have been considered so far. I do have some related thoughts and suggestions that others might improve on, though:
- One option is to incentivise “productive” forum posts, ideally through NMR payments. Of course, this can lead to an inflation of posts and comments, and we don’t want to end up like Kaggle, do we?
- I still wonder whether the CORR x IND x c payout scheme could work. I have not thought about it deeply, but on the surface I like that it makes a clear distinction between goal 1 and goal 2. I guess it doesn’t solve the problem that helping other participants is disincentivised, though; it’s just less of a disincentive than MMC.
- Maybe all Numerai really need is CORR, and they should just get rid of MMC. Participants cannot really train for meta-model independence anyway. However, participants who stake on more than one model might already have an incentive to make those models independent of each other, since that diversifies their payouts. And even without an explicit incentive, I believe that humans like to try out their own idiosyncratic approaches just for the sake of it, which fosters independence naturally.
- Why not align Numerai’s and the participants’ goals more directly and make the return depend on the correlation of the meta-model’s predictions with the targets (“MMCORR”)? To prevent free-riding, the return could depend on something like CORR x MMCORR x c or CORR x a + MMCORR x b, where a, b, and c are scalars.
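Both MMCORR-based variants can be sketched the same way. In this minimal sketch, `mmcorr` is the meta-model’s correlation with the targets, and `a`, `b`, and `c` are the unspecified scalars from the text, given placeholder defaults; the function names are hypothetical and the numbers are toy values.

```python
def payout_multiplicative(corr_score: float, mmcorr: float, c: float = 1.0) -> float:
    """CORR x MMCORR x c: a model with zero CORR earns nothing, however good the meta-model is."""
    return corr_score * mmcorr * c


def payout_additive(corr_score: float, mmcorr: float, a: float = 1.0, b: float = 0.5) -> float:
    """CORR x a + MMCORR x b: b sets how much of the meta-model's success everyone shares."""
    return corr_score * a + mmcorr * b


if __name__ == "__main__":
    mmcorr = 0.04  # toy value for the meta-model's correlation with the targets
    # A free-rider submitting uninformative predictions (CORR = 0):
    print(payout_multiplicative(0.0, mmcorr))  # prints 0.0
    print(payout_additive(0.0, mmcorr))        # still earns MMCORR x b
```

The multiplicative form shuts out free-riders entirely, whereas the additive form still pays them MMCORR x b, so b controls how much free-riding is tolerated. One caveat of the multiplicative form is that a negative CORR in a period with negative MMCORR would produce a positive payout.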
I’m not saying that Numerai participants don’t collaborate; quite the contrary, as I have already stated above. But I believe that many of the “big” ideas have been held back and that the current payout scheme incentivises this secrecy. This harms both Numerai and the participants.