MMC Calculation "Punishes" Originality During Meta-Model Burns

I am not confusing them; I agree that case is possible. But if you say “Model B has 0% correlation with the meta model”, your example should use numbers consistent with that hypothesis. If B and MM are uncorrelated, good periods for MM should not perfectly overlap with good periods for B.
Translating this to your example: even if the two scenarios you describe were possible and you got those outcomes, they are not representative of the average outcome in the long run. You might get those contradictory results in one or two rounds, but not on average.
I am not sure, but maybe the problem is that the numbers in your simulation do not satisfy all the hypotheses you have made.

The toy code example was just for illustrative purposes, but it’s a miniature of a possible real-world scenario.

If you look at and understand the way the neutralization function works, you do not need actual numbers.

Aren’t you glossing over the part where your predictions are neutralized against the meta model and the residuals are then compared to the targets? You make it sound as if only your single-number correlation scores matter to this process, and not the actual interactions of your specific predictions with the meta model and the targets. (I.e. not everybody with the same overall correlation to the targets and the same correlation to the meta model will end up with the same MMC, right? The actual predictions do matter.)

Anyway, here’s an interesting case. For round 218 I submitted a couple of unstaked (1-p) counterparts to my normal models to see if the MMC results would be exactly symmetrical. (They are.) But the effect was interesting with one of them. (218 was probably a meta model burn, right?)

The result for “wigglemuse” (one of my normal models) was a burn:

CORR_TARGETS: -0.0151
CORR_META: +0.6902
MMC: +0.0027

So I’ve got a slightly positive MMC for a model that is fairly correlated with the meta model (though nothing close to a clone), did slightly better than the meta model (judging by my rank and the scores of integration_test, etc.), but was still negative against the targets. So that’s fine.

And so the (1-p) model (“smoghovian”) ended up with the exact opposites:
CORR_TARGETS: +0.0151
CORR_META: -0.6902
MMC: -0.0027

So the opposite result, at least if you looked at it in isolation and weren’t thinking of it as a flipped model, might leave you scratching your head: here I was highly negatively correlated with the meta model, did WAY better on correlation to the targets, and yet have a slightly negative MMC. But of course it is completely symmetrical with the other, opposite model (as expected), although some disagree that any positive corr should get a negative MMC.

I think the disconnect comes because many want to be additionally rewarded or punished for essentially the same thing CORR is measuring, depending on whether they do it better or worse than others. (In which case we’d just make MMC the percentile corr rank.) But MMC is actually on a totally different axis (or can be) from your CORR score, and is its own number. That doesn’t necessarily make it a good or useful number (nor necessarily a bad one), but we’ve seen lots of attempts to judge it by its correlation to other things rather than by what it is actually trying to measure.

Thank you for proving my point!

Your two models are a perfect example of what I’m trying to explain.

Indeed 218 was most likely a meta model burn round.

The intuitive explanation (not following the actual formula!) is something like this: your wigglemuse model had negative corr, but MMC is approximately (again, not the official formula) wigglemuse_corr - wigglemuse_exposure * mm_corr = -0.0151 - 0.6902 * (-0.0307, the integration_test score) = 0.0061, which is not far from the reported 0.0027. If I use a meta model score of -0.026 I get 0.0028 MMC (which is more realistic, as the meta model should beat integration_test).

The same can be applied to smoghovian: MMC = +0.0151 - (-0.6902) * (-0.0307) = -0.0061, or -0.0028 if I again use -0.026.
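
For anyone who wants to reproduce those back-of-the-envelope numbers, here is a tiny Python sketch. It uses the rough approximation discussed above, not the official MMC calculation, and the two meta model scores (-0.0307 and -0.026) are the same guesses as above.

Code:
    # Rough check of the approximation mmc ~ corr_on_target - exposure * mm_corr.
    # NOT the official neutralization-based MMC; the meta model scores are guesses.

    def approx_mmc(corr_target, corr_with_mm, mm_corr_on_target):
        return corr_target - corr_with_mm * mm_corr_on_target

    for mm_guess in (-0.0307, -0.026):
        w = approx_mmc(-0.0151, +0.6902, mm_guess)   # wigglemuse
        s = approx_mmc(+0.0151, -0.6902, mm_guess)   # smoghovian
        print(f"mm={mm_guess:+.4f}  wigglemuse~{w:+.4f}  smoghovian~{s:+.4f}")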

Now the question is do we want this sort of scenario?

If wigglemuse had a CORR_META of 1 and smoghovian a CORR_META of -1, the difference in MMC would be even more extreme, in favor of wigglemuse. However, wigglemuse had a really negative raw corr while smoghovian had a really good positive corr, so why are we rewarding wigglemuse? Negative corr and fairly correlated with the meta model…
While an opposite, original model with a strongly positive corr gets a negative (?!) MMC?

And again it would be even more extreme if the corrs with the meta model were 1 and -1.

Regarding the neutralization, I was merely offering a simple explanation. In my example code I used the exact formula and steps for MMC, including the official neutralization function from the official forum post.

The whole scenario comes from the neutralization function…

I don’t have a strong feeling about it one way or another at this time. As for why we are rewarding wigglemuse in this case, well, I say we are not! Especially since we now get CORR or CORR+MMC but never just MMC, I’d actually much rather have wigglemuse’s burn be somewhat offset by a positive MMC than get rid of the slight ding to smoghovian’s overall positive score. wigglemuse is still negative and smoghovian still positive as far as payouts go.

I am suspicious of the appropriateness of the neutralization function as it stands, and I’m certainly not opposed to further improvements to the MMC calculation. But I just don’t have the mathematical grounding to make a good argument on that front, so I’ll have to leave that to others. Most arguments, though, seem to want to converge MMC towards CORR, and if they are too alike then we don’t need them both. I have no problem, for instance, with MMC being negative at times when CORR is positive, and vice versa. I think what you are pointing out is interesting, but I’m not yet convinced it is a travesty of justice. And I tend not to like “fiddly” solutions that single out special cases and try to adjust them to “better” outcomes; if that is needed, we should get to the root cause of the problem and come up with a totally elegant solution that doesn’t rely on exceptions and special adjustments. At least that’s the ideal. If we are going to tear it down, let’s tear it all the way down and then build it back up a better way.

It’s not a special case. This sort of scenario will always happen during burn periods due to the way the neutralization function is set up.

wigglemuse is technically rewarded with MMC. When I was mentioning rewards etc., I was referring 100% to Numerai’s rewards for originality. From my point of view, corr is not a reward per se, as it is simply raw performance on the target, for which anybody can earn a decent number with the example_predictions. MMC, on the other hand, is Numerai’s incentive to be original: to deviate as much as possible from the example predictions (and the meta model).

From a payout perspective, sure, it’s less important, but I see this as a problem for Numerai’s plan of rewarding originality.

I certainly do not want MMC to converge towards CORR.
I am merely stating that MMC “fails” as an incentive to be original when the meta model is negative. As stated before, a simple, basic explanation of the neutralization is that it “removes” the meta model. But what happens when you remove a feature with a negative stock market corr from a model that had a positive exposure to that feature?

I am also 100% ok with MMC to be negative even with positive CORR and vice-versa, but only when it should.

E.g. positive corr, but highly correlated with the example predictions, and maybe a corr even lower than the example predictions’ corr -> negative MMC.

Or negative corr, but low correlation to example predictions and maybe even beating the example predictions -> positive MMC.

However, as your models showed, you can have positive CORR, negative exposure to the meta model, significantly outperform the example predictions (and probably the meta model too), and still have negative MMC. This I can’t really see as an acceptable scenario…

And the problem exists only in rounds where the meta model is negative; otherwise I believe MMC is really well constructed. In positive rounds it removes the meta model’s performance (proportionally to the user’s exposure) and computes the corr again (via covariance); very elegant and straightforward…

It would only need a bit of a fix for negative rounds…

It is of course MMC – MetaModel Contribution, not Originality Score (for which we could use simple uncorrelatedness to metamodel).

So we can see how the wigglemuse model deserved a slightly positive MMC for helping the meta model be less bad during a burn period. But smoghovian should have done that even more, right? (Both wigglemuse and its opposite smoghovian were superior to the meta model corr-wise, as far as we can reasonably guess.) So the question for the team is: how is it possible (or the better question: how is it reasonable and acceptable) for the smoghovian model, with its far better target corr and negative meta model corr, to actually be hurting the meta model? Is it really, or is it a mathematical quirk of the neutralization process and the negative meta model performance that round? I guess I’d like to hear from the defense at this point…

It’s the way the neutralization function is set up.

My understanding is this: the neutralization function is actually just a slightly modified linear regression model.

user_predictions = a x mean_of_user_predictions + b x meta_model_predictions

  1. a and b are the coefficients, estimated by least squares:

     Code: np.linalg.lstsq(exposures, scores)

     where exposures are the meta_model_predictions concatenated with the mean_of_user_predictions, and scores are the user_predictions.

  2. mean_of_user_predictions is just a vector filled with the mean of the user’s predictions; basically, instead of a column of 1s we use the mean of the user’s predictions as the intercept column.
  3. meta_model_predictions are the meta model’s predictions.

After estimating this model:

neutralized_user_predictions = user_predictions - (a x mean_of_user_predictions + b x meta_model_predictions)

Code: corrected_scores = scores - correction

So the neutralized user predictions are just the residuals from this linear model.
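
Here is a minimal, self-contained sketch of that residual computation using plain numpy. The variable names mirror the description above; treat it as an illustration of the idea, not the official implementation from the forum post.

Code:
    import numpy as np

    def neutralize_against_meta_model(user_predictions, meta_model_predictions):
        # Column vector of the user's predictions (the "scores").
        scores = np.asarray(user_predictions, dtype=float).reshape(-1, 1)

        # Exposures: the meta model column concatenated with an "intercept" column
        # filled with the mean of the user's predictions (instead of a column of 1s).
        mm = np.asarray(meta_model_predictions, dtype=float).reshape(-1, 1)
        mean_column = np.full_like(scores, scores.mean())
        exposures = np.hstack([mm, mean_column])

        # Least-squares fit of the scores on the exposures (the a and b above).
        coefs, *_ = np.linalg.lstsq(exposures, scores, rcond=None)

        # The neutralized predictions are the residuals of that fit.
        correction = exposures @ coefs
        corrected_scores = scores - correction
        return corrected_scores.ravel()

As mentioned later in the thread, MMC is then computed from the covariance of these neutralized predictions with the target.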

Perhaps these two stylized cases make the behaviour clearer.
For an MM clone, b is going to be large and positive (near 1), leading to something like this:
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> higher corr, probably something like 4% (1% - (-3%)) if b is around 1.

For an original model, b is going to be small, e.g. 0 for 0% correlation to the meta model:
user_predictions (e.g. 1% raw corr) - b * meta_model_predictions (some negative corr, e.g. -3%) =
neutralized_user_predictions -> still the 1% raw corr, since b = 0.

I think @sdmlm has done a very good job of identifying one of the problems with MMC. It seems so clear to me that I was compelled to write something, which I will just paste here.


Hi @lackofintelligence, thanks a lot for this! Indeed this is a very nice mathematical presentation of the problem. The discussion has also evolved further on the feedback channel in the chat. Hopefully they’ll address it soon, especially if meta model burns become more frequent.

For example, if the meta model somehow had 0% overall corr on target across rounds and 50% of rounds were burns, then MMC overall would neither reward nor punish originality in any way on average. That is why I believe this sort of behaviour is detrimental to the incentive plan of rewarding originality (combined with performance, of course)…

How about making a sort of confusion matrix with all the basic negative/positive possibilities for a model relatively closely related to the meta model versus one with a much lower MM correlation, and showing what the same situations would look like under the proposed solution?

That is easy to see by taking the average of the derived formula for MMC. The average MMC:

<MMC> = <CORR> - <U, M> x <M, T>

When the average correlation of the meta model to the truth is zero, <M, T> is zero, so the second term in the equation vanishes and <MMC> = <CORR>; originality, which can be represented by the inverse of <U, M>, has no impact on the score in that case.

Yes, that is exactly the case. And once the correlation of the meta model to the truth becomes negative, negative times negative gives a plus, so the more correlated you are with the meta model, the higher the score.

Regarding an alternative to the current MMC, as a first step, there needs to be some consensus on what characteristics MMC should have.

My opinion:

  1. Transparency: no black-box approach, as I believe the community should understand how they are scored and have the opportunity to perform their own analyses, like this forum post. The current MMC mostly satisfies this.

  2. All other things being equal, MMC should always increase as corr to the meta model decreases. E.g. given two models with the exact same corr on target, the one with the lower corr to the meta model should always have the higher MMC. The current MMC formula does not satisfy this during negative meta model rounds, where less corr with the meta model decreases MMC (see the sketch after this list).

  3. Simplicity: MMC should not have an overly complicated methodology, so that as many people as possible properly understand how they’re being scored. If users understand how they’re being scored for originality (and relative performance), they’ll have an easier time actually working towards that goal. The current MMC only somewhat satisfies this (or doesn’t satisfy it), as I got the impression most people have a hard time understanding the neutralization function and the mathematics behind it. This is probably also one of the reasons the main problem with the current MMC hadn’t been identified until now.

  4. (Similar to 2) MMC should always be a monotonically increasing function of originality and of relative performance against the meta model. By originality I mean correlation to the meta model (original model = low correlation to the meta model). By relative performance I mean how much better or worse the user’s model is than the meta model. MMC should of course increase with lower correlation to the meta model, but it should also decrease when the user’s predictions underperform the meta model; an original model shouldn’t be rewarded if it has very bad relative performance. The current MMC does not satisfy this, as during negative meta model rounds it punishes originality, i.e. it is not monotonically increasing in originality.
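
As a concrete illustration of points 2 and 4 (the sketch referenced in point 2), here is a small example using the approximate formula from earlier in the thread. The target corr and the two exposures are made-up numbers, chosen only to show the sign flip between positive and negative meta model rounds.

Code:
    # Two hypothetical models with identical corr on target but different
    # correlation to the meta model, scored with the approximation
    #   mmc ~ corr_on_target - corr_with_mm * mm_corr_on_target

    def approx_mmc(corr_target, corr_with_mm, mm_corr_on_target):
        return corr_target - corr_with_mm * mm_corr_on_target

    corr_on_target = 0.01
    low_exposure, high_exposure = 0.30, 0.90      # made-up exposures

    for mm_corr in (+0.03, -0.03):                # positive round vs burn round
        low = approx_mmc(corr_on_target, low_exposure, mm_corr)
        high = approx_mmc(corr_on_target, high_exposure, mm_corr)
        print(f"mm_corr={mm_corr:+.2f}: low-exposure mmc={low:+.4f}, high-exposure mmc={high:+.4f}")

    # Positive round: the low-exposure (more original) model scores higher, as desired.
    # Burn round: the ordering flips, which is exactly the behaviour criticised above.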

So this is not really what we want. Take a case where the metamodel is in a burn period, scoring -0.05. Model A submits random noise, and scores -0.01. Model B submits a model that is strictly better than the metamodel, but very similar to the metamodel, scoring -0.04.

What you’re saying is we should reward this model A random noise because it looks unique and it did better than the metamodel. But it gave 100% unique information, and the result was negative. It should be punished. It made the metamodel strictly worse than it was before. Model B however made the metamodel strictly better, so it should be rewarded.

Looking only at correlation_score and correlation_w_metamodel can be very misleading in this way. The current MMC approach corrects this though, and only rewards for real added unique information.

The reason we see some models with low correlation get hurt worse in burn rounds is because they are actually not helpful signals in most cases :frowning:

MMC is very difficult… you don’t play it just because you can submit a unique signal. Anyone can do that. You play it because you have something valuable that is currently either absent or underrepresented in the metamodel.

Still, your code example highlights a different part of the MMC design. It seems like this is really a case of it being a 1-P model, and that information gets removed by neutralization, just as submitting the metamodel itself would. You could also form a similar example where you submit exactly a negative version of the metamodel, and boom, you have something SUPER ORIGINAL (-1 corr with the metamodel) that also scored +0.05 instead of -0.05! In your code example, that unique model was actually worse than simply submitting a negated metamodel in its entirety. But that’s not really original; it’s just market timing with a signal we already have. I can see some argument for rewarding that, but it’s not what we wanted to accomplish with MMC. Corr will already reward that timing ability proportionally to its value.


I am a bit confused by this. How did model B make the metamodel strictly better? The current MMC formula doesn’t account for how the metamodel is affected by the user’s predictions. Also, why are you saying model A is random noise? How would you know that? Based on that premise we should punish model A even when the meta model is positive, but then why even care about originality? Also, this is an extreme case; realistically most models are within 0.3 to 0.95 corr with the meta model, so they are definitely not random noise. Yet in negative meta model rounds, the closer you move towards 0.3 corr with the meta model, the lower your MMC score becomes.

This is also confusing. The current MMC does not say anything about how the meta model behaves in relation to a user’s model. It only cares about how a user’s model behaves in relation to the meta model (a very different thing). Again, when the meta model is negative, the more unique you are, the lower your MMC score. You don’t have to be at 0 or negative corr with the meta model for this to happen.

My code example was just to illustrate the MMC formula and its behavior during negative meta model rounds. There are plenty of real model examples (not 1-p, etc.) on this forum and in the chat where simply having lower (not negative, just a bit lower) corr with the meta model but higher corr on target leads to a lower MMC score during negative meta model rounds.

What you’re saying is we should reward this model A random noise because it looks unique and it did better than the metamodel. But it gave 100% unique information, and the result was negative. It should be punished. It made the metamodel strictly worse than it was before. Model B however made the metamodel strictly better, so it should be rewarded.

The only metric you have for determining whether one model is strictly superior to another is the correlation to the resolved scores. So if someone bets their bank on random noise, it is because they believe that a strategy of random noise will be superior to any other strategy, especially in a context where we only report orderings and not magnitudes for our predictions; i.e., submitting random noise is the analogue of reducing the amplitude of predictions in the correlation-to-outcome context.

This just isn’t necessarily true. For examples where you can find users who have lower corr_w_metamodel and lower MMC, you can also find users with lower corr_w_metamodel and higher MMC. It just depends on which way that independent vector faces with respect to the target.

If a model is very unique, but it’s still burning at the same time as the metamodel, why would we reward that model? A key point of MMC for us is finding signals which burn at different times. If your model is uncorrelated but burns at the same time as the metamodel, then it means that all of your uniqueness was for nothing :frowning: . You just found a unique way to burn that no one else found. Not very helpful for us.

The problem with your analysis is you’re looking at the universe of users where
They are unique
AND there’s a burn period for the metamodel
AND the user still burned at the same time too.

The truth is that if you are unique, in general, you are more likely to be OK in a burn period. If you are burning at the same time while being unique, then it means you may have just managed to isolate a part of the metamodel responsible for the burning, and the rest of your signal didn’t save you.

Take this data for round 218 showing users who have roughly 30% corr with metamodel.

Then look at the data for round 218 showing users who have high corr with metamodel. Which group has better scores on average? Which group has higher MMC on average?

It should be clear that in this burn round, it was not necessarily better or worse to be unique. It only matters what type of uniqueness you have, whether it’s good uniqueness or bad uniqueness.

Being more unique definitely allows for higher magnitudes of MMC, in either direction. But your direction is still influenced by the quality of your signal with respect to the target.


This is not the problem here. The problem is that, during meta model burns, given two models with the same corr on target, one with lower corr_w_metamodel and one with higher corr_w_metamodel, the one with lower corr_w_metamodel will have the lower MMC.

There are numerous examples of models with positive corr on target and negative MMC during negative meta model rounds.

Again the user doesn’t have to burn for this to happen and they also don’t have to be unique, the problem is that MMC is an increasing function of corr with the meta model during negative meta model rounds.

The data you posted actually proves my point.

As shown by previous forum posts and by @lackofintelligence’s mathematical note, MMC = CORR_user - CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET. Looking at this, it is clear that MMC increases with CORR_user and, when the meta model’s corr on target is positive, decreases with CORR_WITH_METAMODEL; when the meta model is negative, the sign flips and MMC increases with CORR_WITH_METAMODEL.

From the data you posted for round 218:
user sunkay: -1% corr_target, 91% corr with meta model, 1.59% MMC
user souryuu: 0.25% corr_target, 31% corr with meta model, 1.33% MMC

As you can see, sunkay received a larger MMC even though souryuu had a much better, positive corr on target.

Again, it is mathematically proven that MMC = CORR_user - CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET. This means that the users with low CORR_WITH_METAMODEL in your example data got their MMC from higher CORR_user, which made up for the smaller contribution of the CORR_WITH_METAMODEL * CORR_META_MODEL_ON_TARGET term.
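
For what it’s worth, plugging the quoted round-218 numbers into that same approximation with a guessed meta model score of -0.03 (the actual meta model corr for the round isn’t shown here) reproduces the ordering, if not the exact values:

Code:
    # sunkay and souryuu, round 218, scored with the approximation above.
    # -0.03 is only a guess for the meta model's corr on target that round.

    def approx_mmc(corr_target, corr_with_mm, mm_corr_on_target):
        return corr_target - corr_with_mm * mm_corr_on_target

    mm_guess = -0.03
    print("sunkay ", round(approx_mmc(-0.0100, 0.91, mm_guess), 4))   # ~ +0.0173 (reported 1.59%)
    print("souryuu", round(approx_mmc(+0.0025, 0.31, mm_guess), 4))   # ~ +0.0118 (reported 1.33%)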

In any case, I think I’ve basically exhausted all possible arguments/examples I could give.

In the end it is what it is.

I really enjoy this tournament, and I have tried my best to improve it by identifying and explaining this problem and possible ways to fix it.

If other users want to contribute to this please feel free, but I don’t think there’s anything else I could say on the matter.
