MMC Calculation "Punishes" Originality During Meta-Model Burns

Thank you for proving my point!

Your two models are a perfect example of what I’m trying to explain.

Indeed 218 was most likely a meta model burn round.

The intuitive explanation (not following the actual formula!) is something like this: your wigglemuse model had negative corr, but MMC is approximately (again, not the actual official formula) wigglemuse_corr - wigglemuse_exposure * mm_corr = -0.0151 - (0.6902) * (-0.0307) = 0.0061, using the integration test score of -0.0307 in place of the meta model score. That is not super far from 0.0027. If I use a MM score of -0.026 instead, I get 0.0028 MMC (which is more realistic, as the MM should beat the integration test).

The same can be applied to smoghovian: MMC = +0.0151 - (-0.6902) * (-0.0307) = -0.0061, or -0.0028 if I use again -0.026.
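To make the arithmetic above easy to re-run, here is a minimal sketch of the same back-of-the-envelope approximation. The function name `approx_mmc` is made up for illustration, and the formula is the rough one from this post, not the official MMC calculation.

```python
def approx_mmc(corr, exposure, mm_corr):
    """Rough approximation from this thread: corr - exposure * mm_corr.

    This is NOT the official MMC formula, only the intuition discussed above.
    """
    return corr - exposure * mm_corr


corr = 0.0151      # magnitude of the raw corr of both models (opposite signs)
exposure = 0.6902  # magnitude of the corr with the meta model (opposite signs)

# -0.0307 is the integration test score, -0.026 is the guessed meta model score
for mm_corr in (-0.0307, -0.026):
    wigglemuse = approx_mmc(-corr, exposure, mm_corr)
    smoghovian = approx_mmc(corr, -exposure, mm_corr)
    print(f"mm_corr={mm_corr}: wigglemuse={wigglemuse:.4f}, smoghovian={smoghovian:.4f}")
```

The two models are mirror images of each other, so under this approximation their scores are always equal in magnitude and opposite in sign.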

Now the question is do we want this sort of scenario?

If wigglemuse had CORR_META of 1 and smoghovian CORR_META of -1, the difference in MMC would be even more extreme, in favor of wigglemuse. However, wigglemuse had a really negative raw corr while smoghovian had a really good positive corr, so why are we rewarding wigglemuse? Negative corr and kinda correlated to the meta model…
While an opposite, original model with highly positive corr gets negative (?!) mmc?

And again it would be even more extreme if the corrs with the meta model were 1 and -1.

Regarding the neutralization, I was merely offering a simple explanation. In my example code I used the exact formula and steps for MMC and the official neutralization function from the official forum post.

The whole scenario comes from the neutralization function…

I don’t have a strong feeling about it one way or another at this time. As far as why we are rewarding wigglemuse in this case, well, I would say we are not! Especially since now we get CORR or CORR+MMC but never just MMC, I’d actually much rather have wigglemuse’s burn be somewhat offset by a positive MMC than get rid of the slight ding to smoghovian’s overall positive score. wigglemuse is still negative and smoghovian still positive as far as payouts go.

I am suspicious of the appropriateness of the neutralization function as it is and certainly am not opposed to further improvements to the MMC calculation. But I just don’t have the mathematical grounding to really make a good argument on that front so I’ll have to leave that to others. But most arguments seem to want to converge MMC towards CORR, but if they are too alike then we don’t need them both. I have no problem for instance with MMC being negative at times that CORR is positive and vice-versa, etc etc. I think what you are pointing out is interesting but I’m not yet convinced it is a travesty of justice. And I tend not to like “fiddly” solutions that single out special cases and try to adjust them to “better” outcomes – if that is needed we need to get to the root cause of the problem and come up with a total elegant solution that doesn’t rely on exceptions and special adjustments. At least that’s the ideal. If we are going to tear it down, let’s tear it all the way down and then build it up again a better way.

It’s not a special case. This sort of scenario will always happen during burn periods due to the way the neutralization function is set up.

wigglemuse is technically rewarded with MMC. When I mentioned rewards, I was 100% referring to numerai’s rewards for originality. From my point of view, corr is not a reward per se, as it is simply raw performance on the target, for which anybody can earn a decent number with the example_predictions. MMC, on the other hand, is numerai’s incentive to be original: to deviate as much as possible from the example predictions (and the meta model).

From a payout perspective sure it’s less important, but I’m seeing this as a problem for numerai’s plan of rewarding originality.

I certainly do not want MMC to converge towards CORR.
I am merely stating the fact that MMC “fails” as an incentive to be original when the meta model is negative. As stated before, a simple, basic explanation of the neutralization is that it “removes” the meta model. However, what happens when you remove a negative stock market corr feature from a model that had a positive exposure to that feature?

I am also 100% ok with MMC to be negative even with positive CORR and vice-versa, but only when it should.

E.g. positive corr but highly correlated with example predictions and maybe corr is even less than the example predictions corr -> negative MMC.

Or negative corr, but low correlation to example predictions and maybe even beating the example predictions -> positive MMC.

However, as your models proved, you can have positive CORR, negative exposure to the meta model, significantly outperforming the example predictions (and meta model too probably) and have negative MMC. This I can’t really see as an acceptable scenario…

And the problem is only on the rounds with negative meta model, otherwise I believe MMC is really well constructed. On positive rounds, it removes the meta model performance (proportionately to the user’s exposure) and computes the corr again (with covariance), very elegant and straightforward…

It would only need a bit of a fix for negative rounds…

It is of course MMC – MetaModel Contribution, not Originality Score (for which we could use simple uncorrelatedness to metamodel).

So we can see how the wigglemuse model deserved a slightly positive MMC for helping the metamodel to be less bad during a burn period. But smoghovian should have also done that even more, right? (Both wigglemuse and its opposite smoghovian were superior to the metamodel corr-wise as far as we can reasonably guess.) So the question for the team is how is it possible (or the better question: how is it reasonable and acceptable) for the smoghovian model with its way better target corr and negative mm corr to actually be hurting the metamodel? Is it really, or is it a mathematical quirk of the neutralization process and negative metamodel performance that round? I guess I’d like to hear from the defense at this point…

It’s the way the neutralization function is set-up.

My understanding is this: the neutralization function is actually just a slightly modified linear regression model.

user_predictions = a x mean_of_user_predictions + b x meta_model_predictions

  1. a and b are the coefficients:

Code: np.linalg.lstsq(exposures, scores)
exposures are the meta_model_predictions concatenated with the mean_of_user_predictions
scores are the user_predictions

  2. mean_of_user_predictions is just a vector filled with the mean of the user's predictions; basically, instead of using 1, we use the mean of the user predictions as the intercept column
  3. meta_model_predictions are the meta model's predictions

after estimating this model:

neutralized_user_predictions = user_predictions - (a x mean_of_user_predictions + b x meta_model_predictions)

Code: corrected_scores = scores - correction

So the neutralized user predictions are just the residuals from this linear model.
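As a sanity check on the "residuals of a linear model" reading, here is a small sketch on synthetic data. The data and the 0.7 / 0.3 mixing weights are assumptions made up for illustration; only the `np.linalg.lstsq` step mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
meta_model = rng.uniform(size=n)                      # hypothetical meta model predictions
user = 0.7 * meta_model + 0.3 * rng.uniform(size=n)   # hypothetical user predictions

# Design matrix: the meta model plus a constant column filled with the user's mean
exposures = np.column_stack([meta_model, np.full(n, user.mean())])

# Least-squares fit: user ~ b * meta_model + a * mean_of_user_predictions
coefs, *_ = np.linalg.lstsq(exposures, user, rcond=None)
b, a = coefs

# The neutralized predictions are the residuals of that fit
neutralized = user - exposures @ coefs

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"mean of residuals = {neutralized.mean():.2e}")
print(f"corr(residuals, meta_model) = {np.corrcoef(neutralized, meta_model)[0, 1]:.2e}")
```

The residuals come out centered and uncorrelated with the meta model (up to floating point error), which is what "removing" the meta model means here.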

Perhaps this makes it clearer…
For a MM clone, b is going to be large and positive (near 1), leading to something like this:
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> higher corr, probably something like 4% (1% - (-3%)) if b is around 1

For an original model, b is going to be small, e.g. 0 for 0% correlation to meta model
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> 1% raw corr as b = 0

I think @sdmlm has done a very good job of identifying one of the problems with MMC. It seems so clear to me I was compelled to write something, which I will just paste here.


Hi @lackofintelligence, thanks a lot for this! Indeed this is a very nice mathematical presentation of the problem. The discussion also evolved further on the feedback channel on the chat. Hopefully they’ll address it soon, especially if meta model burns will become more frequent.

For example, if somehow the meta model had overall 0% corr on target across rounds and 50% of rounds were burns, then MMC overall wouldn’t reward (nor punish) originality in any way on average. That is why I believe this sort of behaviour is detrimental to the incentive plan of rewarding originality (combined with performance of course)…

How about making a sort of confusion matrix with all the basic negative/positive possibilities for a model relatively closely related to the metamodel and one with much lower mm correlation, and showing what the same situations would look like under a proposed solution?

That is easy to see by taking the average of the derived formula for MMC. The average MMC:

<MMC> = <CORR> - <U, M> x <M, T>

When the average correlation of the meta model to the truth is zero, <M, T> is zero, the 2nd term in the equation is zero so <MMC> = <CORR>; originality, which can be represented by the inverse of <U, M>, has no impact on the score in that case.

Yes that is exactly the case. And once the correlation of the meta model to the truth becomes negative, then negative times negative = plus, so the more correlated with the meta model, the higher the score.
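The sign flip can be seen by sweeping the correlation with the meta model while holding the user's corr fixed. This is only a sketch of the approximation discussed in this thread; `approx_mmc`, the fixed `user_corr` and the grid of values are all hypothetical choices for illustration.

```python
def approx_mmc(corr, corr_with_mm, mm_corr):
    """Thread approximation: MMC ~ CORR - <U, M> * <M, T> (not the official calculation)."""
    return corr - corr_with_mm * mm_corr


user_corr = 0.02                            # hypothetical, held fixed
corr_with_mm_grid = [0.3, 0.5, 0.7, 0.95]   # from fairly original to near-clone

table = {}
for mm_corr in (0.03, 0.0, -0.03):          # good round, flat round, burn round
    table[mm_corr] = [approx_mmc(user_corr, u_m, mm_corr) for u_m in corr_with_mm_grid]
    row = ", ".join(f"{score:+.4f}" for score in table[mm_corr])
    print(f"mm_corr={mm_corr:+.2f}: {row}")
```

Reading each row left to right: with a positive meta model the score falls as the model gets closer to the meta model, with a flat meta model it does not move, and in a burn round it rises.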

Regarding an alternative to the current MMC, as a first step, there needs to be some consensus on what characteristics MMC should have.

My opinion:

  1. Transparency: no black box approach as I believe the community should understand how they are scored and have the opportunity to perform their own analysis like this forum post. The current MMC mostly satisfies this.

  2. All other things being equal, MMC should always increase with a decrease in corr to the meta model. E.g. two models, same exact corr on target, the model with a lower corr with the meta model should always have higher MMC. The current MMC formula does not satisfy this during negative meta model rounds as less corr with meta model decreases MMC.

  3. Simplicity: the MMC should not have an overly complicated methodology, in order to have as many people as possible properly understand how they’re being scored. If users understand how they’re being scored for originality (and relative performance), they’ll have an easier time actually working towards that goal. The current MMC only somewhat satisfies (or doesn’t satisfy) this as I got the impression most people have a hard time understanding the neutralization function and the underlying mathematics behind it. This is probably also one of the reasons why the main problem with the current MMC hadn’t been identified until now.

  4. (Similar to 2) MMC should always be a monotone increasing function of: originality and relative performance to the meta model. By originality I mean correlation to the meta model (original model = low correlation to the meta model). By relative performance I mean how much better or worse is the user’s model vs the meta model. MMC should of course increase with lower correlation with the meta model, but it should also decrease when the user’s predictions underperform the meta model. An original model shouldn’t be rewarded if it has very very bad relative performance. The current MMC does not satisfy this as during negative meta model rounds it punishes originality, i.e. it is not monotonically increasing in originality.

So this is not really what we want. Take a case where the metamodel is in a burn period, scoring -0.05. Model A submits random noise, and scores -0.01. Model B submits a model that is strictly better than the metamodel, but very similar to the metamodel, scoring -0.04.

What you’re saying is we should reward this model A random noise because it looks unique and it did better than the metamodel. But it gave 100% unique information, and the result was negative. It should be punished. It made the metamodel strictly worse than it was before. Model B however made the metamodel strictly better, so it should be rewarded.

Looking only at correlation_score and correlation_w_metamodel can be very misleading in this way. The current MMC approach corrects this though, and only rewards for real added unique information.
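For reference, this is how the Model A / Model B example plays out under the rough approximation MMC ~ CORR - corr_with_mm * mm_corr used elsewhere in this thread (not the official calculation). The correlations with the metamodel are not given in the example, so 0.0 for the random-noise model and 0.95 for the "very similar" model are assumptions.

```python
def approx_mmc(corr, corr_with_mm, mm_corr):
    """Rough thread approximation, not the official MMC calculation."""
    return corr - corr_with_mm * mm_corr


mm_corr = -0.05  # metamodel score in the burn round, from the example

# corr values come from the example; corr_with_mm values are assumed
model_a = approx_mmc(corr=-0.01, corr_with_mm=0.0, mm_corr=mm_corr)   # random noise
model_b = approx_mmc(corr=-0.04, corr_with_mm=0.95, mm_corr=mm_corr)  # near-clone

print(f"Model A (noise):      {model_a:+.4f}")
print(f"Model B (near-clone): {model_b:+.4f}")
```

Under these assumptions Model A ends up negative and Model B slightly positive, which is the ordering described in this post.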

The reason we see some models with low correlation get hurt worse in burn rounds is that they are actually not helpful signals in most cases.

MMC is very difficult… you don’t play it just because you can submit a unique signal. Anyone can do that. You play it because you have something valuable that is currently either absent or underrepresented in the metamodel.

Still, your code example highlights a different part of MMC design. It seems like this is really a case of it being a 1-P model, and that information gets removed by neutralization, just as submitting the metamodel itself would. You could also form a similar example where you submit exactly a negative version of the metamodel, and boom you have something SUPER ORIGINAL (-1 corr with metamodel), and also scored +0.05 instead of -0.05! In your encoded example, that unique model was actually worse than simply submitting a negative metamodel in its entirety. But that’s not really original, it’s just market timing with a signal we already have. I can see some argument for rewarding that, but it’s not what we wanted to accomplish with MMC. Corr will already reward that timing ability proportionally to its value.


I am a bit confused by this. How did model B make the metamodel strictly better? The current MMC formula doesn’t say anything about how the metamodel is affected by the user’s predictions. Also, why are you saying model A is random noise? How would you know that? Based on this premise, we should punish model A even when the meta model is positive, but then why even care about originality? Also, this is an extreme example/case; realistically most models are within 0.3 to 0.95 corr with the meta model, so they are definitely not random noise. Yet in negative meta model rounds, the more you go towards 0.3 corr with the meta model, the lower your MMC score becomes.

This is also confusing. The current MMC does not say anything about how the meta model behaves in relation to a user’s model. It only cares about how a user’s model behaves in relation to the meta model (very different things). Again, when the meta model is negative, the more unique you are the lower your MMC score. It doesn’t have to be 0 or negative corr with the meta model.

My code example was just to illustrate the MMC formula and its behavior under negative meta model rounds. There are plenty of real model examples (not 1-p etc) on this forum and the chat where simply having lower (not negative, just a bit lower) corr with the meta model but higher corr on target leads to a lower MMC score during meta model negative rounds.

What you’re saying is we should reward this model A random noise because it looks unique and it did better than the metamodel. But it gave 100% unique information, and the result was negative. It should be punished. It made the metamodel strictly worse than it was before. Model B however made the metamodel strictly better, so it should be rewarded.

The only metric that you have for determining whether one model is strictly superior to another model is the correlation to the resolved scores. So if someone bets their bank on random noise, then it is because they believe that a strategy of random noise will be superior to any other strategy, especially in the context where we only report orderings and not magnitudes for our predictions, i.e., submitting random noise is the analogy of reducing the amplitude of predictions in the correlation to outcome context.

This just isn’t necessarily true. For examples where you can find users who have lower corr_w_metamodel and lower MMC, you can also find users with lower corr_w_metamodel and higher MMC. It just depends on which way that independent vector faces with respect to the target.

If a model is very unique, but it’s still burning at the same time as the metamodel, why would we reward that model? A key point of MMC for us is finding signals which burn at different times. If your model is uncorrelated but burns at the same time as the metamodel, then it means that all of your uniqueness was for nothing. You just found a unique way to burn that no one else found. Not very helpful for us.

The problem with your analysis is you’re looking at the universe of users where
They are unique
AND there’s a burn period for the metamodel
AND the user still burned at the same time too.

The truth is that if you are unique, in general, you are more likely to be OK in a burn period. If you are burning at the same time while being unique, then it means you may have just managed to isolate a part of the metamodel responsible for the burning, and the rest of your signal didn’t save you.

Take this data for round 218 showing users who have roughly 30% corr with metamodel.

Then look at the data for round 218 showing users who have high corr with metamodel. Which group has better scores on average? Which group has higher MMC on average?

It should be clear that in this burn round, it was not necessarily better or worse to be unique. It only matters what type of uniqueness you have, whether it’s good uniqueness or bad uniqueness.

Being more unique definitely allows for higher magnitudes of MMC, in either direction. But your direction is still influenced by the quality of your signal with respect to the target.


This is not the problem here. The problem is that, during meta model burns, given 2 models with the same corr on target, the one with lower corr_w_metamodel will have lower MMC.

There are numerous examples of models with positive corr on target and negative MMC during negative meta model rounds.

Again the user doesn’t have to burn for this to happen and they also don’t have to be unique, the problem is that MMC is an increasing function of corr with the meta model during negative meta model rounds.

The data you posted actually proves my point.

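As shown by previous forum posts and by @lackofintelligence’s mathematical note, MMC = CORR_user - CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET. Looking at this, it is clear that MMC will increase with CORR_user, but it decreases with CORR_WITH_METAMODEL only when CORR_META_MODEL_ON_TARGET is positive. When the meta model’s corr on target is negative, MMC increases with CORR_WITH_METAMODEL.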

From the data you posted for round 218:
user sunkay: -1% corr_target, 91% corr with meta model, 1.59% MMC
user souryuu: 0.25% corr_target, 31% corr with meta model, 1.33% MMC

As you can see, we gave user sunkay a larger MMC even though user souryuu has much better, positive corr on target.

Again, it is mathematically proven that MMC = CORR_user - CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET. Meaning that the users with low CORR_WITH_METAMODEL from your example data got MMC due to higher CORR_user, which offset the loss from CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET.
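One way to read the two data points above is to solve the same relation for the meta model term. This is a sketch that takes the approximation at face value; `implied_mm_corr` is a hypothetical helper, and the inputs are the rounded figures quoted above.

```python
def implied_mm_corr(corr, corr_with_mm, mmc):
    """Solve mmc = corr - corr_with_mm * mm_corr for mm_corr."""
    return (corr - mmc) / corr_with_mm


# (corr on target, corr with meta model, MMC) for round 218, as quoted above
users = {
    "sunkay": (-0.01, 0.91, 0.0159),
    "souryuu": (0.0025, 0.31, 0.0133),
}

implied = {}
for name, (corr, corr_with_mm, mmc) in users.items():
    implied[name] = implied_mm_corr(corr, corr_with_mm, mmc)
    print(f"{name}: MMC - CORR = {mmc - corr:+.4f}, implied mm_corr = {implied[name]:+.4f}")
```

Both users imply a negative meta model corr of roughly -0.03, broadly in line with the -0.026 to -0.0307 range used earlier in this thread, and the MMC - CORR bonus is more than twice as large for the model that is closer to the meta model.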

In any case, I think I’ve basically exhausted all possible arguments/examples I could give.

In the end it is what it is.

I really enjoy this tournament and have really tried my absolute best to improve it by trying my hardest to identify and explain this problem and ways to fix it.

If other users want to contribute to this please feel free, but I don’t think there’s anything else I could say on the matter.


For this data, what you should be looking at is the differences between MMC and CORR in relation to the correlation with the meta model. Let’s pick two points and compare. For purple who has the most unique model shown, the difference between MMC and CORR = 0.057 - 0.042 = 0.015. For the least unique model shown, rehoboam, the difference is -0.00555 - -0.02777 = 0.02222. So rehoboam is rewarded much more solely for being more like the meta model, which makes no sense as usual, especially since purple has a much better model in this round. All of this data seems to follow this trend except for a couple of outliers that suggest some other kind of oddities in this data.


It makes almost no sense to compare correlation with the meta model between users. You can split your predictions into two parts, most confident and least confident, and reverse the latter; the result is a significant drop in correlation with the meta model and almost the same correlation with the real target. And the MMC right now represents exactly the “useful uniqueness” your model has.

Another example for the situation discussed in the beginning of this post: Users 1-5 have unique models with -0.01 Corr and 0 corr with MM. MM is based on these 5 predictions and results in -0.05 Corr.
User 6 has a model which is a stack of the models from Users 1-4, with -0.04 Corr and 80% corr with MM.
In the current MMC calculation system, Users 1-5 will get negative MMC values and User 6 will get high positive MMC. And all of that just sounds right.


There was feedback in the private messages which forced me to clarify my statements.

I believe that we cannot use live data results to justify the theory discussed above. My point is that correlation with the MetaModel is a completely wrong metric for model uniqueness. There are 4 example models in the attached code (based on sdmlm’s code). The first one is the MM, with -0.055 CORR performance. The second one is a truly unique model with -0.036 CORR performance, 41% corr with MM, -0.066 MMC and -0.03 MMC-CORR value. The third one is a noisy “unique” model with -0.03 CORR performance, 43% corr with MM, -0.15 MMC and -0.12 MMC-CORR value. By noisy model I mean that its correlation with the MM was reduced mostly by changing the part of the predictions that does not correlate with the target values. The last, fourth model is highly correlated with the MM; it is basically the noisy “unique” model with fewer artificial shuffles of the predictions that do not correlate with the target values. It has the same -0.03 CORR, 65% corr with MM, -0.075 MMC and -0.044 MMC-CORR value. So the truly unique model has the highest MMC and MMC-CORR metrics despite the worst raw CORR performance, and the highly correlated model has better MMC and MMC-CORR metrics than the noisy unique model. Both results give me the feeling that the current MMC implementation is totally fine. And this “So rehoboam is rewarded much more solely for being more like the meta model, which makes no sense as usual, especially since purple has a much better model in this round. All of this data seems to follow this trend except for a couple of outliers that suggest some other kind of oddities in this data.” can be explained simply: most of the “unique” models in live data are filled with noise, except “the couple of outliers”, which are truly unique and useful models.

The only thing which makes me feel slightly less confident in my example is using 0.3 value in low_meta_corr_model array. So, any comments about that are appreciated.


import numpy as np
import pandas
import scipy.stats

def spearmanr(target, pred):
    # Correlation between the target and the percentile-ranked predictions
    return np.corrcoef(
        target,
        pred.rank(pct=True, method="first")
    )[0, 1]

def neutralize_series(series, by, proportion=1.0):
   scores = series.values.reshape(-1, 1)
   exposures = by.values.reshape(-1, 1)

   # this line makes series neutral to a constant column so that it's centered and for sure gets corr 0 with exposures
   exposures = np.hstack(
       (exposures, np.array([np.mean(series)] * len(exposures)).reshape(-1, 1)))

   # fitted values of the least-squares regression of scores on exposures
   correction = proportion * (exposures.dot(
       np.linalg.lstsq(exposures, scores, rcond=None)[0]))
   corrected_scores = scores - correction
   neutralized = pandas.Series(corrected_scores.ravel(), index=series.index)
   return neutralized

def _normalize_unif(df):
    X = (df.rank(method="first") - 0.5) / len(df)
    return scipy.stats.uniform.ppf(X)

target = pandas.Series([0, 0, 0, 0,
          0.25, 0.25, 0.25, 0.25,
          0.5, 0.5, 0.5, 0.5,
          0.75, 0.75, 0.75, 0.75,
          1, 1, 1, 1])

meta_model = pandas.Series([1, 0, 0.25,
              1, 0.5, 0.75,
              0.5, 0.5, 1,
              0.75, 0.75, 0,
              0, 0.25, 0.25,
              1, 0, 0.25, 0.5, 0.75])
high_meta_corr_model = pandas.Series([1, 0, 0.25,
              1, 0.75, 0.5,
              0.5, 0.5, 0.75,
              0.75, 1, 0,
              0, 0.25, 0.25,
              0.75, 0.5, 1.0, 0.0, 0.25])

low_meta_corr_model = pandas.Series([0.3, 0.5, 0.75,
              1, 0.5, 0.75,
              0, 0, 1,
              0.75, 0.75, 0,
              0.25, 0.25, 0.5,
              1, 1, 0, 0.25, 0.25])

low_meta_corr_model_noise = pandas.Series([0.25, 0, 1,
              1, 0.75, 0.5,
              0.5, 0.5, 0.75,
              0.75, 1, 0,
              0, 0.25, 0.25,
              0.75, 0.5, 1.0, 0.0, 0.25])
# Meta model has raw corr with target of -5.5% (burning period)
meta_model_raw_perf = spearmanr(target, meta_model)
print(f"Meta_model performance: {meta_model_raw_perf}")

high_meta_corr_raw_perf = spearmanr(target, high_meta_corr_model)
high_meta_corr = spearmanr(meta_model, high_meta_corr_model)

print(f"High_meta_corr model cross-corr: {spearmanr(meta_model, high_meta_corr_model)}")
print(f"High_meta_corr model performance: {high_meta_corr_raw_perf}")

low_meta_corr_raw_perf = spearmanr(target, low_meta_corr_model)
low_meta_corr = spearmanr(meta_model, low_meta_corr_model)

print(f"Low_meta_corr model cross-corr: {spearmanr(meta_model, low_meta_corr_model)}")
print(f"Low_meta_corr model performance: {low_meta_corr_raw_perf}")

low_meta_corr_raw_perf_noise = spearmanr(target, low_meta_corr_model_noise)
low_meta_corr_noise = spearmanr(meta_model, low_meta_corr_model_noise)

print(f"Low_meta_corr_noise model cross-corr: {spearmanr(meta_model, low_meta_corr_model_noise)}")
print(f"Low_meta_corr_noise model performance: {low_meta_corr_raw_perf_noise}")

# MMC Computation
# All series are already uniform

# Neutralize (using forum post code for neutralization)
neutralized_high_corr = neutralize_series(high_meta_corr_model, meta_model, proportion=1.0)
neutralized_low_corr = neutralize_series(low_meta_corr_model, meta_model, proportion=1.0)
neutralized_low_corr_noise = neutralize_series(low_meta_corr_model_noise, meta_model, proportion=1.0)

# Compute MMC
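# Covariance of the neutralized predictions with the target, scaled by 0.29 ** 2
mmc_high_corr = np.cov(target, neutralized_high_corr)[0, 1] / (0.29 ** 2)

mmc_low_corr = np.cov(target, neutralized_low_corr)[0, 1] / (0.29 ** 2)

mmc_low_corr_noise = np.cov(target, neutralized_low_corr_noise)[0, 1] / (0.29 ** 2)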

print(f"MMC for non-original model: {mmc_high_corr}")
print(f"MMC for original model: {mmc_low_corr}")
print(f"MMC for original noise model: {mmc_low_corr_noise}")

print(f"MMC-CORR for non-original model: {mmc_high_corr-high_meta_corr_raw_perf}")
print(f"MMC-CORR for original model: {mmc_low_corr-low_meta_corr_raw_perf}")
print(f"MMC-CORR for original noise model: {mmc_low_corr_noise-low_meta_corr_raw_perf_noise}")

Meta_model performance: -0.05518254055364692
High_meta_corr model cross-corr: 0.6560590932489134
High_meta_corr model performance: -0.030656966974248294
Low_meta_corr model cross-corr: 0.4169347508497767
Low_meta_corr model performance: -0.03678836036909795
Low_meta_corr_noise model cross-corr: 0.4353289310343258
Low_meta_corr_noise model performance: -0.030656966974248308
MMC for non-original model: -0.07529413605357009
MMC for original model: -0.06688466111771676
MMC for original noise model: -0.15449965579823513
MMC-CORR for non-original model: -0.044637169079321797
MMC-CORR for original model: -0.030096300748618812
MMC-CORR for original noise model: -0.12384268882398683