MMC Calculation "Punishes" Originality During Meta-Model Burns

I’m using the actual formula for the MMC calculation. Basically, the MMC formula involves the following steps:

  1. Transform all predictions to standard uniform
  2. Neutralize against the meta-model
  3. Transform neutralized predictions to standard uniform
  4. Compute MMC as the covariance between neutralized predictions and the target, divided by 0.29^2 (variance of standard uniform) to get to correlation space

This is the formula from the MMC2 Announcement.
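To make those steps concrete, here is a minimal Python sketch of the calculation as listed above (the helper names to_uniform, neutralize and mmc are mine, and it skips any production details such as how the meta model itself is built, so treat it as an illustration rather than the official implementation):

```python
import numpy as np
from scipy.stats import rankdata


def to_uniform(x):
    # Rank-transform a prediction vector to the standard uniform on (0, 1)
    return (rankdata(x, method="average") - 0.5) / len(x)


def neutralize(predictions, meta_model):
    # Regress the predictions on the meta model (plus a constant column) and
    # keep the residuals: this is the "neutralize against the meta model" step
    exposures = np.column_stack(
        [meta_model, np.full(len(meta_model), np.mean(predictions))]
    )
    coefs, *_ = np.linalg.lstsq(exposures, predictions, rcond=None)
    return predictions - exposures @ coefs


def mmc(predictions, meta_model, target):
    p = to_uniform(predictions)                        # step 1
    neutral = neutralize(p, to_uniform(meta_model))    # step 2
    neutral = to_uniform(neutral)                      # step 3
    # step 4: covariance with the target, scaled by 0.29^2 to get to correlation space
    return np.cov(neutral, target)[0, 1] / 0.29 ** 2
```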

The MMC is not influenced in any way by how the meta model performs in combination with specific user predictions. It is influenced by the overall meta model performance and the user’s exposure (correlation) to the meta model, as seen in the above steps.

Now, using the above formula when the meta model raw corr is -1% with two user models:
Model A: raw corr -1%, meta model corr 0.99 (so a meta model clone)
Model B: raw corr -1%, meta model corr 0 (same performance, but completely uncorrelated)

You would get roughly an MMC of 0 for model A and an MMC of -1% for model B.
So originality gets ‘punished’, as you get much lower MMC with the uncorrelated model.

Again, this only happens when the meta model has negative correlation with the target.

You could think about it like this: model A has 0.99 exposure to the meta model, which this round is a negative feature/predictor. By removing the meta model from model A, the MMC for model A is increased just because we remove a negative corr exposure. But model B is not correlated to the meta model, so neutralization doesn’t change anything. It doesn’t “benefit” from the negative performance of the meta model.

In good periods, model B really benefits from MMC. Using the above MMC formula when the meta model raw corr is 1% with two user models:
Model A: raw corr 1%, meta model corr 0.99 (so a meta model clone)
Model B: raw corr 1%, meta model corr 0 (same performance, but completely uncorrelated)

You would roughly get scores of around 0% MMC for model A, but around 1% MMC for model B.
So total CORR+MMC scores are 1% for model A (basically no originality = 0 MMC) and 2% for model B (originality makes MMC=CORR leading to twice the payout of just CORR).
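To see both scenarios play out numerically, here is a hedged toy simulation (it reuses the to_uniform and mmc helpers from the sketch above; the signal strengths, noise levels and seed are arbitrary choices of mine, so the printed numbers will only roughly match the ±1% / 0.99 values quoted above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

latent = rng.normal(size=n)      # hidden "truth" driving the target
target = to_uniform(latent)      # keep the target roughly uniform, as the helpers assume

for label, sign in [("burn round (MM corr ~ -1%)", -1.0),
                    ("good round (MM corr ~ +1%)", +1.0)]:
    meta_model = sign * 0.01 * latent + rng.normal(size=n)   # raw corr with target ~ sign * 1%
    model_a = meta_model + 0.01 * rng.normal(size=n)         # near-clone: MM corr ~ 0.99+
    model_b = sign * 0.01 * latent + rng.normal(size=n)      # same signal strength, ~0 MM corr

    print(label)
    for name, preds in [("A (clone)   ", model_a), ("B (original)", model_b)]:
        print(" ", name,
              "corr w/ target:", round(np.corrcoef(preds, target)[0, 1], 4),
              " corr w/ MM:", round(np.corrcoef(preds, meta_model)[0, 1], 4),
              " MMC:", round(mmc(preds, meta_model, target), 4))
    # Expected pattern: MMC ~ 0 for the clone in both rounds, while the original model's
    # MMC tracks its raw corr (negative in the burn round, positive in the good round).
```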

Regarding Model A, I totally agree:
It is almost an MM clone, it does not add anything to the MM, so 0% MMC. OK

Regarding Model B:
You say it is completely uncorrelated with the MM, but you give numbers that show perfect correlation:
MM CORR = -1% ==> Model B CORR = -1%
MM CORR = 1% ==> Model B CORR = 1%
If you use these numbers:
MM CORR = -1% ==> Model B CORR = -1%
MM CORR = 1% ==> Model B CORR = -1%
you get MMC = -1%, as it should be, because its contribution to the MM performance is negative.

Regarding originality:
You can do something very original and very bad, or something very original and very good. In your example, B is an original and bad model. Originality should pay if it contributes in the right direction.

Sorry, maybe I wasn’t clear.
Model B has stock market correlation of -1%
Meta model has stock market correlation of -1%
Model B has correlation with meta model of 0%.
These 3 things are completely separate numbers/notions.

You could get a model with the same corr on the stock market as the meta model, but around 0% correlation with the meta model itself. Please do not confuse the three correlation types.

I’m not saying that originality should be rewarded when it’s bad.
I’m saying that it gets punished “more” than a meta model clone. That shouldn’t be the case, as it diminishes the overall incentives of MMC by basically saying that, during meta model burns, meta model clones are “safer”.

Perhaps the problem is better stated as: during meta-model burns, MMC punishes meta model clones less than it punishes original models.

I have seen actual models during this round that have a higher positive corr on target and lower meta-model correlation, yet a lower MMC than models with lower (even negative) corr on target but higher meta-model correlation.
As stated, these models would probably have higher MMC during good periods, but why should they have lower MMC during bad periods?

If you run my example code, the two models have the same positive corr on target (around 2.4%), but because the meta model has around -5% (so really negative) corr on target, the MMC for the original model is negative while the MMC for the 95% meta-model-correlation model is very high and positive.

Basically, we remove the very bad signal (the meta model in this case) from the meta model clone and we end up with a huge MMC. For the -63% meta model correlation model (so a completely original, even opposite model to the meta model), we get really negative MMC. Why? Because we are basically adding the meta model to the original model (negative times negative equals positive), as per the neutralize_series code from the official forum post.

I am not confusing them, I agree that case is possible. So, if you say: “Model B has correlation with meta model of 0%”, your example should have numbers consistent with that hypothesis. If B and MM are uncorrelated, good periods for MM should not perfectly overlap with good periods for B.
Translating this to your example: even if the two scenarios that you describe were possible, and you got those outcomes, they are not representative of the average outcome in the long run. Maybe you get those contradictory results in one or two rounds, but not on average.
I am not sure, but maybe the problem is that the numbers you have included in your simulation do not satisfy all the hypotheses you have made.

The toy code example was just for illustrative purposes, but it’s a miniature of a possible real-world scenario.

If you look at and understand the way the neutralization function works, you do not need actual numbers.

Aren’t you glossing over the part where your predictions are neutralized with the metamodel and so then the residuals are compared to the targets? You make it sound as if only your single-number correlation score matters to this process and not the actual interactions of your specific predictions and the metamodel and targets. (i.e. not everybody with the same overall correlation to the targets and also the same correlation to the metamodel will end up with the same MMC, right? The actual predictions do matter.)

Anyway, here’s an interesting case. For round 218 I submitted a couple of unstaked (1-p) versions of my normal models to see if the MMC results would be exactly symmetrical. (They are.) But the effect was interesting with one of them. (218 was probably a metamodel burn, right?)

The result for “wigglemuse” (one of my normal models) was a burn:

CORR_TARGETS: -0.0151
CORR_META: +0.6902
MMC: +0.0027

So I’ve got a slight positive MMC for a model that is relatively correlated with the metamodel (but nothing close to a clone), did slightly better than the metamodel (judging by my rank and the scores of integration_test, etc.), but was still negative to targets. So that’s fine.

And so the (1-p) model (“smoghovian”) ended up with the exact opposites:
CORR_TARGETS: +0.0151
CORR_META: -0.6902
MMC: -0.0027

So the opposite – at least if you looked at it in isolation and weren’t thinking about it flipped – might leave you scratching your head. Here I was highly negatively correlated with the metamodel, did WAY better on correlation to the targets, and yet have a slight negative MMC. But of course it is completely symmetrical with the other opposite model (as expected), although some disagree that any positive corr should get a negative MMC.

I think the disconnect comes because many want to be additionally rewarded/punished for essentially the same thing CORR is measuring, if they do it better or worse than others. (In which case we’d just make MMC = percentile corr rank.) But MMC is actually on a totally different axis (or can be) than your CORR scores, and is its own number. Which doesn’t necessarily make it a good or useful number (it doesn’t necessarily not either), but we’ve seen lots of attempts to judge it by its correlation to other things rather than just what it is actually trying to measure.

Thank you for proving my point!

Your two models are a perfect example of what I’m trying to explain.

Indeed 218 was most likely a meta model burn round.

The intuitive explanation (not following the actual formula!) is something like this: your wigglemuse model had negative corr, but MMC is approximately (again, not the actual official formula) something like wigglemuse_corr - wigglemuse_exposure * mm_corr = -0.0151 - (0.6902) * (-0.0307 integration test) = 0.0061 (which is not super far from 0.0027). If I use an MM score of -0.026 I get 0.0028 MMC (which is more realistic, as the MM should beat integration_test).

The same can be applied to smoghovian: MMC = +0.0151 - (-0.6902) * (-0.0307) = -0.0061, or -0.0028 if I again use -0.026.
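As a quick check, the same back-of-envelope numbers in a few lines of Python (mmc_approx is just my shorthand for the rough corr - exposure * mm_corr formula above, and the -0.026 meta model score is, as said, only a guess):

```python
def mmc_approx(corr, exposure, mm_corr):
    # Rough approximation: what's left after removing the meta model exposure
    return corr - exposure * mm_corr

# Round 218, guessed meta model score of -0.026
for name, corr, exposure in [("wigglemuse", -0.0151, +0.6902),
                             ("smoghovian", +0.0151, -0.6902)]:
    print(name, round(mmc_approx(corr, exposure, mm_corr=-0.026), 4))
# wigglemuse ->  0.0028   (actual MMC: +0.0027)
# smoghovian -> -0.0028   (actual MMC: -0.0027)
```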

Now the question is do we want this sort of scenario?

If wigglemuse had CORR_META of 1 and smoghovian CORR_META of -1, the difference in MMC would be even more extreme, in favor of wigglemuse. However, wigglemuse had a really negative raw corr while smoghovian had a really good positive corr, so why are we rewarding wigglemuse? Negative corr and kinda correlated to the meta model…
While an opposite, original model with highly positive corr gets negative (?!) mmc?

And again it would be even more extreme if the corrs with the meta model were 1 and -1.

Regarding the neutralization, I was merely offering a simple explanation. In my example code I used the exact formula and steps for MMC, including the neutralization function from the official forum post.

The whole scenario comes from the neutralization function…

I don’t have a strong feeling about it one way or another at this time. As far as why we are rewarding wigglemuse in this case, well, I say we are not! Especially since now we get CORR or CORR+MMC but never just MMC, I’d actually much rather have wigglemuse’s burn be somewhat offset by a positive MMC than get rid of the slight ding to smoghovian’s overall positive score. wigglemuse is still negative and smoghovian still positive as far as payouts go.

I am suspicious of the appropriateness of the neutralization function as it is, and certainly am not opposed to further improvements to the MMC calculation. But I just don’t have the mathematical grounding to really make a good argument on that front so I’ll have to leave that to others. But most arguments seem to want to converge MMC towards CORR, yet if they are too alike then we don’t need them both. I have no problem, for instance, with MMC being negative at times when CORR is positive and vice-versa, etc etc. I think what you are pointing out is interesting but I’m not yet convinced it is a travesty of justice. And I tend not to like “fiddly” solutions that single out special cases and try to adjust them to “better” outcomes – if that is needed we need to get to the root cause of the problem and come up with a totally elegant solution that doesn’t rely on exceptions and special adjustments. At least that’s the ideal. If we are going to tear it down, let’s tear it all the way down and then build it up again a better way.

It’s not a special case. This sort of scenario will always happen during burn periods due to the way the neutralization function is set up.

wigglemuse is technically rewarded with MMC. When I was mentioning rewards etc., I was 100% referring to Numerai’s rewards for originality. From my point of view, corr is not a reward per se, as that is simply raw performance on the target, for which anybody can earn a decent number with the example_predictions. MMC, on the other hand, is Numerai’s incentive to be original: to deviate as much as possible from the example predictions (and the meta model).

From a payout perspective, sure, it’s less important, but I see this as a problem for Numerai’s plan of rewarding originality.

I certainly do not want MMC to converge towards CORR.
I am merely stating the fact that MMC “fails” as an incentive to be original when the meta model is negative. As stated before, a simple, basic explanation of the neutralization is that it “removes” the meta model. However, what happens when you remove a negative stock market corr feature from a model that had a positive exposure to that feature?

I am also 100% ok with MMC to be negative even with positive CORR and vice-versa, but only when it should.

E.g. positive corr but highly correlated with example predictions and maybe corr is even less than the example predictions corr -> negative MMC.

Or negative corr, but low correlation to example predictions and maybe even beating the example predictions -> positive MMC.

However, as your models proved, you can have positive CORR, negative exposure to the meta model, significantly outperform the example predictions (and probably the meta model too), and still have negative MMC. This I can’t really see as an acceptable scenario…

And the problem is only on the rounds with negative meta model, otherwise I believe MMC is really well constructed. On positive rounds, it removes the meta model performance (proportionately to the user’s exposure) and computes the corr again (with covariance), very elegant and straightforward…

It would only need a bit of a fix for negative rounds…

It is of course MMC – MetaModel Contribution, not Originality Score (for which we could use simple uncorrelatedness to metamodel).

So we can see how the wigglemuse model deserved a slightly positive MMC for helping the metamodel to be less bad during a burn period. But smoghovian should have also done that even more, right? (Both wigglemuse and its opposite smoghovian were superior to the metamodel corr-wise as far as we can reasonably guess.) So the question for the team is how is it possible (or the better question: how is it reasonable and acceptable) for the smoghovian model with its way better target corr and negative mm corr to actually be hurting the metamodel? Is it really, or it is a mathematical quirk of the neutralization process and negative metamodel performance that round? I guess I’d like to hear from the defense at this point…

It’s the way the neutralization function is set-up.

My understanding is this: the neutralization function is actually just a slightly modified linear regression model.

user_predictions = a x mean_of_user_predictions + b x meta_model_predictions

  1. a and b are the coefficients:

Code: np.linalg.lstsq(exposures, scores)
exposures are the meta_model_predictions concatenated with the mean_of_user_predictions
scores are the user_predictions

  2. mean_of_user_predictions is just a vector filled with the mean of the user’s predictions; basically, instead of using 1, we use the mean of the user’s predictions as the intercept column
  3. meta_model_predictions are the meta model

After estimating this model:

neutralized_user_predictions = user_predictions - (a x mean_of_user_predictions + b x meta_model_predictions)

Code: corrected_scores = scores - correction

So the neutralized user predictions are just the residuals from this linear model.
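Putting it together, here is a sketch of that neutralization function, following the structure described above (it is modeled on the neutralize_series code from the official forum post, but re-written here, so small details may differ from the official version):

```python
import numpy as np
import pandas as pd


def neutralize_series(series, by, proportion=1.0):
    # series: the user's (uniform-transformed) predictions; by: the meta model predictions
    scores = series.values.reshape(-1, 1)
    exposures = by.values.reshape(-1, 1)

    # Append a column filled with the mean of the user's predictions: this plays
    # the role of the intercept column described above
    exposures = np.hstack(
        (exposures, np.full((len(exposures), 1), np.mean(series)))
    )

    # Fit scores ~ a * mean_column + b * meta_model by least squares,
    # then subtract the fitted part; the residuals are the neutralized predictions
    correction = proportion * exposures.dot(
        np.linalg.lstsq(exposures, scores, rcond=None)[0]
    )
    corrected_scores = scores - correction
    return pd.Series(corrected_scores.ravel(), index=series.index)
```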

Perhaps this makes it clearer…
For an MM clone, b is going to be large and positive (near 1), leading to something like this:
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> higher corr, probably something like 4% (1% - (-3%)) if b is around 1

For an original model, b is going to be small, e.g. 0 for 0% correlation to the meta model:
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> 1% raw corr as b = 0

I think @sdmlm has done a very good job of identifying one of the problems with MMC. It seems so clear to me that I was compelled to write something, which I will just paste here.


Hi @lackofintelligence, thanks a lot for this! Indeed this is a very nice mathematical presentation of the problem. The discussion also evolved further on the feedback channel in the chat. Hopefully they’ll address it soon, especially if meta model burns become more frequent.

For example, if somehow the meta model had an overall 0% corr on target across rounds and 50% of rounds were burns, then MMC overall wouldn’t reward (or punish) originality in any way on average. That is why I believe this sort of behaviour is detrimental to the incentive plan of rewarding originality (combined with performance, of course)…

How about making a sort of confusion matrix with all the basic negative/positive possibilities, for a model relatively closely correlated with the metamodel and one with much lower MM correlation, showing what the same situations would look like under the proposed solution?

That is easy to see by taking the average of the derived formula for MMC. The average MMC:

<MMC> = <CORR> - <U, M> x <M, T>

When the average correlation of the meta model to the truth is zero, <M, T> is zero, the second term in the equation is zero, so <MMC> = <CORR>; originality, which can be represented by the inverse of <U, M>, has no impact on the score in that case.
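For reference, here is the back-of-envelope approximation this averaging comes from, written out in symbols (same <U, M> / <M, T> correlation notation as above; it is the rough linear approximation used earlier in the thread, not the exact neutralization math):

```latex
% Per-round back-of-envelope approximation (U = user predictions, M = meta model, T = target):
\mathrm{MMC} \;\approx\; \langle U, T\rangle \;-\; \langle U, M\rangle\,\langle M, T\rangle

% Averaging over rounds, treating the user's exposure <U, M> as roughly constant:
\langle \mathrm{MMC}\rangle \;\approx\; \langle \mathrm{CORR}\rangle \;-\; \langle U, M\rangle\,\langle M, T\rangle

% If <M, T> averages to zero across rounds, the second term vanishes and <MMC> = <CORR>;
% if <M, T> is negative, higher exposure to the meta model actually raises the score.
```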

Yes, that is exactly the case. And once the correlation of the meta model to the truth becomes negative, negative times negative equals positive, so the more correlated you are with the meta model, the higher the score.

Regarding an alternative to the current MMC, as a first step, there needs to be some consensus on what characteristics MMC should have.

My opinion:

  1. Transparency: no black box approach as I believe the community should understand how they are scored and have the opportunity to perform their own analysis like this forum post. The current MMC mostly satisfies this.

  2. All other things being equal, MMC should always increase with a decrease in corr to the meta model. E.g. two models, same exact corr on target, the model with a lower corr with the meta model should always have higher MMC. The current MMC formula does not satisfy this during negative meta model rounds as less corr with meta model decreases MMC.

  3. Simplicity: the MMC should not have an overly complicated methodology, so that as many people as possible can properly understand how they’re being scored. If users understand how they’re being scored for originality (and relative performance), they’ll have an easier time actually working towards that goal. The current MMC only somewhat satisfies (or doesn’t satisfy) this, as I got the impression most people have a hard time understanding the neutralization function and the underlying mathematics behind it. This is probably also one of the reasons why the main problem with the current MMC hadn’t been identified until now.

  4. (Similar to 2) MMC should always be a monotonically increasing function of originality and of relative performance vs. the meta model. By originality I mean low correlation to the meta model (original model = low correlation to the meta model). By relative performance I mean how much better or worse the user’s model is vs. the meta model. MMC should of course increase with lower correlation to the meta model, but it should also decrease when the user’s predictions underperform the meta model. An original model shouldn’t be rewarded if it has very, very bad relative performance. The current MMC does not satisfy this, as during negative meta model rounds it punishes originality, i.e. it is not monotonically increasing in originality.

So this is not really what we want. Take a case where the metamodel is in a burn period, scoring -0.05. Model A submits random noise, and scores -0.01. Model B submits a model that is strictly better than the metamodel, but very similar to the metamodel, scoring -0.04.

What you’re saying is we should reward this model A random noise because it looks unique and it did better than the metamodel. But it gave 100% unique information, and the result was negative. It should be punished. It made the metamodel strictly worse than it was before. Model B however made the metamodel strictly better, so it should be rewarded.

Looking only at correlation_score and correlation_w_metamodel can be very misleading in this way. The current MMC approach corrects this though, and only rewards for real added unique information.

The reason we see some models with low correlation get hurt worse in burn rounds is because they are actually not helpful signals in most cases :frowning:

MMC is very difficult… you don’t play it just because you can submit a unique signal. Anyone can do that. You play it because you have something valuable that is currently either absent or underrepresented in the metamodel.

Still, your code example highlights a different part of MMC design. It seems like this is really a case of it being a 1-P model, and that information gets removed by neutralization, just as submitting the metamodel itself would. You could also form a similar example where you submit exactly a negative version of the metamodel, and boom you have something SUPER ORIGINAL (-1 corr with metamodel), and also scored +0.05 instead of -0.05! In your code example, that unique model was actually worse than simply submitting a negative metamodel in its entirety. But that’s not really original, it’s just market timing with a signal we already have. I can see some argument for rewarding that, but it’s not what we wanted to accomplish with MMC. Corr will already reward that timing ability proportionally to its value.


I am a bit confused by this. How did model B make the metamodel strictly better? The current MMC formula doesn’t account for how the metamodel is affected by the user’s predictions. Also, why are you saying model A is random noise? How would you know that? Based on this premise, we should then punish model A even when the meta model is positive, but then why even care about originality? Also, this is an extreme example/case; realistically most models are within 0.3 to 0.95 corr with the meta model, so they are definitely not random noise. Yet in negative meta model rounds, the more you go towards 0.3 corr with the meta model, the lower your MMC score becomes.

This is also confusing. The current MMC does not say anything about how the meta model behaves in relation to a user’s model. It only cares about how a user’s model behaves in relation to the meta model (very different things). Again, when the meta model is negative, the more unique you are the lower your MMC score. It doesn’t have to be 0 or negative corr with the meta model.

My code example was just to illustrate the MMC formula and its behavior under negative meta model rounds. There are plenty of real model examples (not 1-p, etc.) on this forum and in the chat where simply having a lower (not negative, just a bit lower) corr with the meta model but a higher corr on target leads to a lower MMC score during meta-model-negative rounds.