True Contribution Details

Alignment between the performance of tournament participants and hedge fund profitability is a key element in the construction of Numerai. If a model is ranked at the top of the Numerai leaderboard it should be because it is helping to improve the profitability of the hedge fund the most. Currently, users are evaluated only at the signal level: how well their signal correlates with the target (CORR) and their contribution to the Meta Model signal (MMC). However, Numerai’s portfolio is created by running our custom optimizer on the Meta Model signal. The optimizer enforces constraints and penalties on the portfolio that affect which aspects of the Meta Model signal are reflected in the final portfolio. This can create divergence between what appears to be a good model at the signal level and a model that is truly helping the fund create better portfolios.

For example, the optimizer penalizes feature exposure and thus large feature exposures in the Meta Model signal will not be reflected in the final portfolio. A user with a high feature exposure model may get great correlation with the target (for a while), but their signal will have limited influence on the portfolio since the feature exposure of the portfolio is constrained. Such a user could earn large payouts without ever contributing much information to the portfolio. This is obviously undesirable.

To better align our evaluations of users and the hedge fund performance we are introducing a new metric we call “True Contribution”. The goal of this metric is to estimate how much a user’s signal improves or detracts from the returns of Numerai’s portfolio. By using this metric for payouts, user incentives and hedge fund performance are in perfect alignment. With True Contribution as the payout metric, a user’s stake would increase if their model increased portfolio returns and decrease (burn) if the model reduced returns.

In our first pass at creating True Contribution, we calculated the stake weighted Meta Model while leaving each user out in turn, used the production optimizer to generate the corresponding portfolios, calculated the returns, and then compared them to the returns of the full stake weighted Meta Model portfolio in order to calculate “True Contribution”. There are a few problems with this formulation:

  1. A user’s contribution is heavily dependent upon their stake, and identical signals with different stakes get different scores
  2. Because users with 0 stake would always have 0 contribution there is no way to calculate the metric for unstaked users
  3. Users with small stakes would always have ~0 contribution
  4. Because the production optimizer starts from our current portfolio and enforces turnover constraints, the TC scores are heavily dependent on our past portfolios which users have no knowledge of or control over

Our latest version of TC fixes all these issues while retaining the realism of portfolio construction and returns. To do this, we first realized that the leave-one-user-out method is really just approximating a gradient calculation. What we really want is a quantification of how changing a user’s stake changes the portfolio returns, which is the gradient of portfolio returns with respect to users’ stakes. A true gradient calculation also has several nice properties: 1) it can be computed for all users simultaneously from a single portfolio optimization, rather than a separate optimization for each held-out user, 2) it assigns the same values to identical signals with different stakes, and 3) it assigns proper values to 0 stakes. The first property is important for our AWS bills, while the second and third are important for fairness in the tournament.
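To make these properties concrete, here is a toy numpy sketch (all names and numbers are made up for illustration) of a linear, optimizer-free version of the problem: portfolio returns are a differentiable function of stakes, one gradient computation yields a value for every user at once, and identical signals receive identical gradients regardless of stake.

```python
import numpy as np

# Toy example: no optimizer, portfolio = stake-weighted sum of user signals.
rng = np.random.default_rng(0)
n_stocks, n_users = 50, 4
preds = rng.standard_normal((n_stocks, n_users))   # one column per user
preds[:, 3] = preds[:, 0]                          # user 3 copies user 0's signal
stock_returns = rng.standard_normal(n_stocks)

# returns(stakes) = (preds @ stakes) . stock_returns, so the gradient of
# returns with respect to stakes is preds.T @ stock_returns -- a single
# pass gives a value for every user, independent of the stakes themselves.
stake_grads = preds.T @ stock_returns
```

Note how user 0 and user 3 get the same gradient no matter what their stakes are, and how a user with zero stake still gets a well-defined value, which is exactly what the leave-one-out formulation could not provide. The real calculation differs only in that the portfolio passes through the optimizer, so the gradient must be taken through it.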

But performing a true gradient calculation would require taking a derivative through our portfolio optimizer, which is impossible, right? Actually, no! This seemingly magical feat can be accomplished quite simply using cvxpylayers. This remarkable package, based on an award-winning 2019 research paper by Agrawal et al., allows you to include a cvxpy-defined convex optimization problem as a layer in a PyTorch model. Below is our fully differentiable PyTorch module for calculating a portfolio from user predictions and stakes, using a simple Linear layer and our cvxpy-based optimizer.

import cvxpy as cp
import torch
from torch import nn
from cvxpylayers.torch import CvxpyLayer

class SWMModel(nn.Module):
    # Simple end-to-end portfolio model
    def __init__(self, num_stakes, context, optimizer):
        super().__init__()
        self.optimizer = optimizer
        self.context = context

        # set initial portfolio to 0
        self.context.current_portfolio[:] = 0
        # stake weighted Meta Model as a Linear layer
        self.lin1 = nn.Linear(num_stakes, 1, bias=False)

    def forward(self, user_predictions):
        # calculate stake weighted Meta Model signal
        x1 = self.lin1(user_predictions)

        # cvxpy Parameter that will receive the signal inside the CvxpyLayer
        xin = cp.Parameter(x1.shape)

        # get cvxpy problem from optimizer
        self.context.alpha_scores = xin
        self.optimizer._build_optimization_routine(self.context.current_portfolio, self.context, True)
        problem = self.optimizer._optimization_routine
        # the problem must be DPP-compliant to be differentiable
        assert problem.is_dpp()
        # insert cvxpy problem into a CvxpyLayer
        cvxpylayer = CvxpyLayer(problem, parameters=[xin], variables=problem.variables())

        # solve the problem using output of swmm as input to cvxpylayer
        solution = cvxpylayer(x1, solver_args={"max_iters": 1500})
        out = solution[0] - solution[1]

        return out, x1

We can use this module to calculate portfolio returns and the gradient of the portfolio returns with respect to stakes as follows:

swmm = SWMModel(len(stakes), context=context, optimizer=n1_optimizer)

# set weights of linear layer to be user stakes
swmm.lin1.weight.data = torch.tensor(stakes, dtype=torch.float32).reshape(1, -1)

# get optimized portfolio and swmm signal
swmm_port, swmm_signal = swmm(user_preds)

# calculate portfolio returns and then stake gradient wrt returns
portfolio_returns = swmm_port.T @ stock_returns

# calculate gradient by backpropagating through the optimizer
portfolio_returns.backward()

# extract gradients from Linear stake weighting layer
stake_grads = swmm.lin1.weight.grad.numpy().copy()

To regularize this gradient, reduce the effect of stake size, and reduce dependencies between user predictions we can perform dropout on the user stakes (i.e. randomly zero-out 50% of the stakes) before calculating the stake weighted Meta Model and calculating the gradients. To calculate our final TC estimate we perform 100 rounds of dropout and then average the gradients across the 100 rounds:

all_grads = []
for i in range(100):
    print(f'bag {i}', end='\r')
    # set stakes with dropout (randomly zero out 50% of the stakes)
    swmm.lin1.weight.data = torch.nn.functional.dropout(
        torch.tensor(stakes, dtype=torch.float32).reshape(1, -1), .5)
    # get optimized portfolio and unoptimized signal
    swmm_port, swmm_signal = swmm(user_preds)

    # calculate portfolio returns and then stake gradient wrt returns
    portfolio_returns = swmm_port.T @ stock_returns
    portfolio_returns.backward()
    # store the gradients from this round and reset them for the next
    all_grads.append(swmm.lin1.weight.grad.numpy().copy())
    swmm.lin1.weight.grad.zero_()

# average the gradients across the 100 dropout rounds for the final TC estimate
tc = np.mean(all_grads, axis=0)

This process gives very stable estimates that are 99.5% correlated across repeated trials with different dropout masks. The regularization also doesn’t produce results that are vastly different from the unregularized gradient; the two are in fact about 90% correlated. While perhaps not absolutely necessary, we feel this regularization helps with the fairness and robustness of the metric, especially given that in reality models are dropping in and out of Numerai’s Meta Model all the time.

Taking a proper gradient solves the first three problems with our initial formulation. To address the fourth problem of making True Contribution independent of our current portfolio holdings, we can create a modified version of our optimizer where we remove the turnover constraint and allow the optimizer a full trading budget to find the optimal portfolio given the Meta Model signal. This generates a hypothetical but realistic portfolio which satisfies all the constraints of the optimizer. While this modified optimizer won’t produce the real portfolio we actually trade, the portfolio it does produce is a realistic reflection of how the Meta Model signal interacts with the portfolio optimizer and its various constraints and penalties.

Hopefully you find this formulation of TC as compelling as we do. In any case you are probably wondering what existing metrics best correspond to TC. To get a better sense of the relationship we can fit a model to predict TC scores from other metrics. A good choice for building flexible and interpretable models is the Explainable Boosting Machine (EBM). The EBM fits a generalized additive model (GAM) with 2-way interactions. The EBM is tree based like standard Gradient Boosting Machines (e.g. XGBoost, LightGBM) but is restricted to fit only GAMs. In the GAM formulation each variable (and interaction) gets its own learned function and these are all additively combined. To interpret the model you can compare importance scores and visualize the learned functions for each variable. A good proxy metric for TC would have both a high importance score and a monotonic relationship to TC. For this analysis I fit a model predicting TC from various metrics for rounds 272-300. Obviously this can only show us what TC has historically been related to and is no guarantee of what can happen in the future as users change their models. But caveats aside, let’s see what we find:

We see that far and away the best proxy is FNCv3, that is, a prediction’s correlation with the target after the prediction has been neutralized to the 420 features in the “medium” feature set (it will be formally announced later this week!). This measures how much alpha your signal has that isn’t linearly explained by the features. FNCv3 also shows a nice monotonic relationship to TC. (The bit of jaggedness in the functions is just overfitting and can be removed by tuning the EBM hyperparameters. The general trend is pretty obvious.)

The next best proxy is the interaction between FNCv3 and “Exposure Dissimilarity”. The “Exposure Dissimilarity” is a simple metric to compare a model’s pattern of feature exposure to the example predictions. The basic idea is that a signal containing information not already in the example predictions is likely to have a very different pattern of feature exposures. To calculate Exposure Dissimilarity:

  1. Calculate the correlation of a user’s prediction and the example prediction with each of the features to form two vectors U and E.
  2. Take the dot product of U and E divided by the dot product of E with E. This measures how similar the pattern of exposures are and is normalized to be 1 if U is identical to E.
  3. Subtract from 1 to form a dissimilarity metric where 0 means the same exposure pattern as example predictions, positive values indicate differing patterns of exposure and negative values indicate similar patterns but even higher exposures. Note that models with 0 feature exposure will have a dissimilarity value of 1.

Exposure Dissimilarity: 1 - U•E/E•E
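The three steps above can be sketched in numpy as follows (the function and variable names are illustrative, not Numerai’s actual implementation):

```python
import numpy as np

def exposure_dissimilarity(user_preds, example_preds, features):
    # Step 1: U and E are the correlations of each prediction vector
    # with each feature column
    U = np.array([np.corrcoef(user_preds, features[:, j])[0, 1]
                  for j in range(features.shape[1])])
    E = np.array([np.corrcoef(example_preds, features[:, j])[0, 1]
                  for j in range(features.shape[1])])
    # Steps 2 and 3: 1 - U.E / E.E, which is 0 for the example predictions
    # themselves and 1 for a signal with zero feature exposure
    return 1 - (U @ E) / (E @ E)

# illustrative data: example predictions built from the features
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 10))
example_preds = features @ rng.standard_normal(10)
```

A model identical (or just proportional) to the example predictions scores 0, while progressively more different exposure patterns push the score toward, and past, 1.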

By itself, Exposure Dissimilarity doesn’t explain TC, but the combination with FNCv3 in a multiplicative interaction is the next best proxy for TC. (This interaction was included explicitly because in preliminary analysis the EBM kept finding what looked like a strong multiplicative interaction between these variables.) This interaction term also makes intuitive sense: TC rewards signals that are both unique and that contain feature independent alpha. This interaction term also bears a strong monotonic relationship to TC.

The next most important metric is the venerable MMC, which also shows a strong monotonic relationship to TC.

This is followed by the correlation of the top/bottom 200 elements of the feature neutralized prediction with the target, i.e. FNCv3 TB 200. This metric also shows a strong monotonic relationship to TC, in addition to the FNCv3 relationship. Indeed, if this metric carried no additional useful information, the learned function would not appear cleanly monotonic, as we will see with CORR. This shows that good performance in the tails is also important for explaining TC.

The next most important metric is Maximum Exposure. While this metric doesn’t strongly influence TC, as you can see by the comparably small dynamic range of the function on the Y-axis, the interesting thing in this plot is that TC seems most associated with small, but nonzero maximum feature exposures. The optimal range for max feature exposures seems to be in [0.05, 0.30].

The final metric we will discuss is CORR. As you can see from the plot below the relationship between CORR and TC has small dynamic range and is notably non-monotonic. I want to emphasize that if it was only CORR in the EBM’s input, we would see an apparent monotonic relationship to TC. On average higher CORR is associated with higher TC, but when the other metrics are included they more cleanly explain TC and leave CORR with little additional variance to account for.

As you can see from the above, TC seems to capture the properties we have long recommended for user models to possess: predictive power that isn’t too dependent on single features, predictive power in the tails, uniqueness. To help everyone out, I made a follow up post demonstrating methods for directly optimizing metrics like FNC and TB200. Judging by the models doing the best at TC, some of you have been listening closely and have figured a lot of things out already :wink:

To maximize backward compatibility while maximizing the impact of TC, starting April 9th users will be able to stake on (0x or 1x CORR) + (0x or 1x or 2x TC). Staking on MMC will be automatically discontinued on that date. So if you are currently staking on 1x CORR and 2x MMC, your stake will be 1x CORR only starting April 9th unless you also elect to stake on 1x TC or 2x TC. Numerai will not automatically convert any MMC stakes to TC stakes. TC staking will start as opt-in only. There will be no changes to the payout factor for the time being.
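To make the staking options concrete, here is a hypothetical sketch. The post does not spell out the exact payout formula, so this ASSUMES payouts are proportional to stake × (corr_mult × CORR + tc_mult × TC) × payout factor; the function name and numbers are made up for illustration.

```python
# Hypothetical payout sketch -- the multiplier structure is from the post,
# but the proportional formula itself is an assumption, not Numerai's spec.
def payout(stake, corr, tc, corr_mult, tc_mult, payout_factor=1.0):
    return stake * payout_factor * (corr_mult * corr + tc_mult * tc)

# e.g. 100 NMR at 1x CORR + 2x TC in a round scoring CORR=0.02, TC=0.01
one_corr_two_tc = payout(100, 0.02, 0.01, corr_mult=1, tc_mult=2)
# the same stake at 0x CORR + 2x TC drops the CORR term entirely
zero_corr_two_tc = payout(100, 0.02, 0.01, corr_mult=0, tc_mult=2)
```

Under this assumption, the only difference between the two elections is whether the CORR term contributes to (or burns from) the payout.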


Are we also keeping the payout factor for the time being (say, until staking on CORR is no longer available)?


This is a great filtering mechanism. Who’s dropping out? Assuming CORR will be disappearing soonish, 5037 models have a TC score higher than 0.0, of those 1890 have a score greater than 0.01, 638 have a score higher than 0.02 and so on and so forth. My highest ranked model by TC is 243rd.

I estimate my returns will be halved. Wondering what the effect on the value of NMR will be. My guess is there will be a mass exodus (i know some competitors have multiple models…) and its value will plummet.

Incentive to develop a better model perhaps, but not if the value of NMR plummets.

Hmmm. badly thought out rant over.


You make it sound there is no choice but to keep submitting the same models and suffer, or else quit. You can instead submit models more suitable for scoring on TC, which is where the incentive will be so I’m sure that’s what people will do. My staked models – which do ok under current payout scheme – all suck on TC. Doesn’t bother me a bit – I didn’t build them to be good on TC. Frankly, I can’t wait to get rid of them and bet on some more interesting stuff instead. I expect my earning rate to go up.


Point taken. I will be reducing my stake for the foreseeable future, until I come up with (or don’t come up with) a better model.


Of course if a bunch of people pull their stakes then the payout factor goes up so that helps too (if you’re scoring positively).


very positively, for now anyway.

What would be the behaviour if a model is staked twice? Would it reduce its TC? And if so, would that depend on the relative sizes of the two stakes, or be independent of them?


Identical signals get identical TC regardless of stake. Theoretically a signal could get a negative TC if it is overstaked, but empirically the distribution of people’s gradients when their stake is zeroed out by dropout vs when their stake is kept are indistinguishable.


any chance you can enable 0.5xTC ? :slight_smile:


I am wondering how many people will reduce their stake in april, and what the overall impact will be on the metamodel. There must be models who rely heavily on MMC and suddenly that option is gone. I hoped for some kind of transition period.


Thanks for the write up @mdo

when doing this dropout, is it a random 50% of total stake value, or 50% of the total number of discrete stakes?

is the users own stake always zeroed?

And introduce an abs(TC) too please…


what the overall impact will be on the metamodel

Looking forward to higher PF :grinning:


So beautiful! The core idea sounds weirdly obvious in hindsight, but was undoubtedly tough to come up with and implement.

Since stake size plays a significant role before regularization, will models with large stakes (1+% of metamodel) have more volatile metrics on TC or even be at a disadvantage? Or does the regularization reduce almost all effect of stake size?

Would you recommend large stakers to spread out the stake over more (diverse) models when optimizing for TC?

I hope we can have these metrics’ importance w.r.t. TC in every resolved round to see the dynamics. I am curious about the new TC staking options and how they will affect MMC importance, as the Meta Model may become more different from the example model. I am still not sure if I should optimize for MMC directly using the example predictions.

I am very excited for these changes. Even if there is a slight dip in NMR value, this is a big step forward for the hedge fund which we all want and need to stick around for a long time :). Also from a profitability standpoint, I appreciate that the barrier to entry is rising and that it is strategy related (feature engineering and modeling to optimize TC) not necessarily hardware dependent (looking at you supermassive dataset).

I am wondering though if the team has done any testing and experimentation with the validation set and optimizing TC? Was the above analysis done inclusive of validation eras?

I have mixed feelings about the current validation set for Corr and MMC and so wondering if there are any related changes or improvements down the pipe in this regard?

I think I answered this here: Question on TC: Is it True Contribution or something else? - #3 by mdo


no doubt in the end this will happen :slight_smile:

@mdo Assume someone stakes 100 NMR, what’s the difference between staking on (0x CORR + 2x TC) and (1x CORR + 2x TC)?