MMC Payouts Adjustment Proposal

Right now, we are in a historically great time for standard models, such as integration_test.
Take a look at this chart of integration_test cumulative scores since Kazutsugi started:
[chart: integration_test cumulative scores since the start of Kazutsugi]

The goal of MMC is to encourage users to find unique models that perform well. Right now, though, even the models that we internally really like are preferring correlation payouts, even though they were exactly the models we set out to reward more with MMC.

Now, this trend is unlikely to continue, but even so, the existence of this type of scenario highlights a shortcoming of MMC: it makes users play a meta game of having to choose between two different tournaments. Users should simply be rewarded for having a more unique and better model, period.

The example I like to use is Nasdaqjockey. This model has very high correlation with the targets and very low correlation with the metamodel, but it is still being paid less than integration_test lately, since both are staking on CORR.

So my proposal is to not make users choose between CORR and MMC. Instead, users could simply opt in to being exposed to MMC at the same time as correlation. Here's what it looks like for Nasdaqjockey:

And when you show the same for integration_test:

Notice the difference in the Y axes. Nasdaqjockey, in this format, makes almost 2x as much as integration_test, despite having very similar correlation scores overall. This is exactly what we want to reward with MMC.

Note that in this scenario MMC would not have a 2x payout multiplier as it does now, since the purpose of that multiplier was to bring the two separate tournaments more in line in terms of risk. If they are combined, then we no longer need this adjustment, since you aren't choosing one over the other.

I will leave this up for some time before making anything official, so others can discuss and give feedback on whether this change would make them more interested in finding high-MMC models than the current structure does.

12 Likes

Makes perfect sense to me. I hope it’s rolled out.

2 Likes

It looks like serious progress is being made to fix MMC from a motivation perspective. I would like to reiterate my suggestion to fix MMC from a data scientist's perspective by providing the metamodel, because what I and others have found in practice is that having the example predictions does not provide enough information to train a model that is oblique to the metamodel. I have suggested that Numer.ai publish weights every week that only indicate which rows of the live data are relevant.

I think the main objection to the weights is that they seem to allow one to get extra information about the metamodel itself, and therefore this method could be gamed. Now I am sure that there is a way around this detail. Didn't the Numer.ai guys engineer a cryptocurrency? So they must know something about cryptography. What I am saying is: push that cryptographic technique further to give us a one-way cryptographic loss function for MMC. It takes predictions as input and, using encrypted truth, spits out the MMC Spearman correlation. Now why can't you do that?

The encrypted truth that I am talking about is only the best estimate of what the metamodel would have done, and I think you have more than enough data to estimate it.

2 Likes

I agree! But it's not as straightforward to productionize as my suggestion :laughing:

1 Like

OK, just giving it my best shot. Maybe when you get some more funding.

I am highly confident that the past few rounds are highly atypical of what we should expect going forward. My main model NO_FORMAL_TRAINING, which has been getting optimized for MMC since about round 200, is performing better with respect to CORR under the current regime than it performs on roughly half of the eras it was TRAINED on. I would not extrapolate from how MMC and CORR behave under this paradigm to how they will behave normally. I think we are falling for the streetlight effect.

Another thing to keep in mind is that low correlation to metamodel predictions does not necessarily imply low correlation to metamodel era performance, i.e., a model that has very low correlation to the metamodel may still burn on the same eras the metamodel burns. See my posts under Learning Two Uncorrelated Models.
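A toy demonstration of that point (synthetic data, nothing to do with the real tournament files): two models whose predictions are nearly uncorrelated can still rise and burn together era by era, because both pick up the same underlying signal whose strength varies across eras.

```python
import numpy as np

rng = np.random.default_rng(0)
n_eras, n_rows = 50, 1000

era_scores_a, era_scores_b, pred_corrs = [], [], []
for _ in range(n_eras):
    strength = rng.uniform(0.0, 1.0)                # era-varying signal strength
    signal = rng.normal(size=n_rows)                # driver both models pick up
    target = strength * signal + rng.normal(size=n_rows)
    preds_a = signal + 3 * rng.normal(size=n_rows)  # independent noise per model
    preds_b = signal + 3 * rng.normal(size=n_rows)
    pred_corrs.append(np.corrcoef(preds_a, preds_b)[0, 1])
    era_scores_a.append(np.corrcoef(preds_a, target)[0, 1])
    era_scores_b.append(np.corrcoef(preds_b, target)[0, 1])

print("mean prediction corr:", np.mean(pred_corrs))                          # low (~0.1)
print("era performance corr:", np.corrcoef(era_scores_a, era_scores_b)[0, 1])  # high
```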

Unless there is some information you have access to that the users don’t, I think we should wait for more live information before making major changes like this.

4 Likes

This suggestion, like the ones in the other thread from of_s and bor, does not require inside information to calculate the MMC figure itself (in which case they could only test it internally – note that in of_s's suggestion he is using the meta_corr number AS the MMC, but we have that number too). So I suggest we do the calculations and graphs and put all these suggestions side by side for a historical backtest (like Mike has done here for integration_test and nasdaqjockey), but with many different users, and see if there are scenarios where someone looks over-penalized or over-rewarded.
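A backtest harness along these lines could be as simple as the sketch below, assuming you have pulled per-round CORR and MMC scores per model into a DataFrame (the column names here are hypothetical; adapt them to however you export the leaderboard data):

```python
import pandas as pd

def backtest_schemes(df: pd.DataFrame) -> pd.DataFrame:
    """df columns (hypothetical): model, round, corr, mmc."""
    df = df.sort_values("round").copy()
    df["corr_only"] = df["corr"]                   # current CORR tournament
    df["mmc_2x"] = 2 * df["mmc"]                   # current MMC tournament (2x)
    df["corr_plus_mmc"] = df["corr"] + df["mmc"]   # proposed combined payout
    cols = ["corr_only", "mmc_2x", "corr_plus_mmc"]
    # Cumulative payout per model under each scheme, ready to plot side by side.
    df[cols] = df.groupby("model")[cols].cumsum()
    return df
```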

2 Likes

That brings us back to where it all began, homomorphic encryption. :stuck_out_tongue:
If you’d like to give it a shot, TFHE is pretty much the fastest HE implementation out there.

Come to think of it, I like your idea because of its inherent hardness. It would probably take on the order of a couple of hours to days to compute Spearman's rank correlation coefficient between two sequences with ~5000 data points (roughly the size of each week's live set) in a fully homomorphic system.
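For reference, the plaintext version of the computation is trivial; under FHE the cost comes almost entirely from the ranking (i.e. sorting) step, not the final Pearson step on the ranks. A sketch of what the circuit would have to reproduce:

```python
import numpy as np
from scipy.stats import spearmanr

preds = np.random.rand(5000)      # a live-set-sized submission
truth = np.random.rand(5000)      # this side would be encrypted under HE
rho, _ = spearmanr(preds, truth)  # rank both series, then Pearson on the ranks
print(rho)
```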

Edit: Message threading does not seem to work here. For context, this was in reply to @lackofintelligence’s “crypto” idea.

Obviously this looks great for my models. I like to pick models that are off the beaten path, and up until now, that has treated me quite well. I switched to the MMC tournament for 3 submissions and I'm significantly under-performing CORR. I am VERY interested in this new approach. Can you explain what you mean by "exposed to MMC at the same time as correlation"? Does it mean taking max(CORR, 2*MMC)? Probably not, so what are the cons? I see the pros!

3 Likes

It means simply CORR + MMC (both at 1x).

2 Likes

@master_key, would you be increasing the 1*MMC multiplier as MMC gets more difficult to find?

@jrb , we probably want the encrypted one-way MMC loss function for the training and validation sets, not for unlabeled eras, and we want to be able to make 100 passes across it per week. @master_key , having this function for the last week only would be enough. That is, publish only last week's encrypted metamodel loss function, not even an estimate of the future model. That would be a very solid way of creating unique models for the future, and it's a lot easier.

@lackofintelligence What is an "encrypted one-way MMC loss function"? That's not how an HE system works. What you're asking for could be trivially exploited if implemented.

@jrb , I think you understand. The idea is that there is some difficulty in obtaining the loss, maybe a minute to obtain the loss for the entire dataset. Even if somebody expends a week of computer power to obtain an estimate of last week's metamodel predictions for all of the rows, it is just last week's metamodel predictions on the training set, so what good is it? In fact, if you think about it, why wouldn't Numer.ai release it totally free? It does not give you predictions on any live data. In fact, I'll just say that right now: why not just release last week's metamodel predictions on the training and validation data sets? That would be the absolute best place to start looking for a unique model.

@wigglemuse is correct, it's literally MMC + CORR = Payout. So if your MMC is -0.1 and your CORR is 0.3, your payout would be 0.2.
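In code, my reading of the proposed scheme is just this (a sketch of the thread's description, not official payout code; stake size and any clipping are ignored):

```python
def combined_payout(corr: float, mmc: float) -> float:
    """Proposed combined scheme: CORR and MMC both at 1x, summed."""
    return corr + mmc

assert abs(combined_payout(0.3, -0.1) - 0.2) < 1e-12  # the worked example above
```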

Good question. I'm not sure how much harder it will be to get MMC over time! We'll stay committed to making sure we reward models that help us, though. I just don't know if it will be via a multiplier for MMC or what.

3 Likes

Seems like quite a usable system. Can't speak for anyone else, but it'd certainly push me more towards MMC.

2 Likes

For Numerai's long-term success, among other things, it is important that the users with the best and most unique models have the highest returns.

CORR alone only covers rewarding the "best" models and is therefore unfit. I would even go as far as saying that it is up for replacement as soon as something better comes along.

To increase model uniqueness, MMC was introduced earlier this year. But MMC is not just model uniqueness; it also covers relative model performance. Unfortunately, this proved to be its biggest flaw. Let me give an (unfortunately fictional) example: I have made the perfect model (for Numerai), an insanely consistent model with 0.03 CORR and a seemingly impossibly high Sharpe. With such a consistent model I would for sure like to stake on CORR; a stable ~3% return each week is both safe (low risk) and more than profitable enough. Given the uniqueness of this model, you would think MMC would be even more profitable. And it might be in the long term. But in the recent period this would result in losses far exceeding your CORR risks and drawdowns. Why? Because all the boosting models achieve >0.06 CORR in these tournament rounds, leaving you with negative MMC. And such a risk is not worth switching to MMC for, away from your stable CORR returns.
Relative performance is not a good metric. Integration_Test outperforms NasdaqJockey on more weeks than vice versa, but it is those weeks where Integration_Test does badly that NasdaqJockey shines, and that is the sole reason why it is rightfully praised as a model.
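To put rough numbers on the fictional example (all values made up purely for illustration):

```python
corr = 0.03   # the consistent model's weekly CORR
mmc = -0.02   # hypothetical negative MMC while boosting models print >0.06 CORR

print("CORR only:    ", corr)        #  0.03 -> steady, safe profit
print("MMC only (2x):", 2 * mmc)     # -0.04 -> losses under the current regime
print("CORR + MMC:   ", corr + mmc)  #  0.01 -> dampened but still positive
```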

Now MMC+CORR is proposed. I like it much more than just increasing the MMC payouts (e.g., MMC payouts getting 3*MMC values). It covers both unique and good models and could be seen as the best of both CORR and MMC. In theory, one could therefore suggest replacing both current tournaments with this new CORR+MMC tournament.

Although parts might be unclear or misunderstood, this message is not meant as critique. I applaud MikeP in his search for the "best" (in all ways) payout metric, and while it might not last much longer, MMC was a good step in this journey and we can learn much from it. I believe that CORR+MMC could be a useful next step. But I have the feeling we can come up with something better; CORR+MMC still has the "relative performance" drawback of MMC that I do not like.

I have some ideas, but nothing worth sharing yet. But these thoughts might help others:

  • Can we change MMC to cover only uniqueness? Just as CORR rewards raw performance?
  • Is uniqueness as simple as correlation with other models, or is it more complicated? If trading is done only on the largest sell/buy signals and/or Numerai first neutralizes our predictions, shouldn't we include this somewhere/somehow?
  • If NasdaqJockey is one of the ‘best’ models for Numerai - why is it not on top of either (or the combined) leaderboard? Does MMC not correlate that well with their in-house metrics of model usefulness?
  • To reward unique and consistently well-performing models like NasdaqJockey, we might need to move away from just correlation and towards Sharpe-based metrics? Sharpe between rounds? Sharpe within rounds (over the individual days of a round)? (See the sketch after this list.)
  • EDIT: If we had a metric for uniqueness, integration_test should be one of the worst models on this metric. OLD: Why does integration_test not have a negative MMC consistently? It is the least unique model there is.
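On the Sharpe idea above, one possible between-rounds formulation (a sketch of the idea, not an official Numerai metric):

```python
import numpy as np

def corr_sharpe(round_corrs):
    """Mean per-round correlation divided by its standard deviation."""
    round_corrs = np.asarray(round_corrs, dtype=float)
    return round_corrs.mean() / round_corrs.std(ddof=1)

# Two models with the same mean CORR (0.03) but very different consistency:
print(corr_sharpe([0.030, 0.029, 0.031, 0.030]))    # ~36.7 -- rewarded
print(corr_sharpe([0.090, -0.050, 0.120, -0.040]))  # ~0.34 -- penalized
```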
3 Likes

It makes sense for integration_test to have positive MMC, as it is still one of the best models outright. Any model can (theoretically) get copied, so uniqueness changes with time and trends. (The only reason integration_test is not unique is nothing intrinsic to it – it is because everybody copies it or just submits the exact duplicate predictions.) Negative MMC for a round is basically saying the metamodel would have been better off without this model in it (for this round). So lack of uniqueness itself can't be the thing that gets you negative MMC – it always has to be tied up with performance somehow.
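A schematic of why that is, following the published idea of neutralizing against the metamodel before scoring (this is not Numerai's exact MMC code):

```python
import numpy as np

def neutralize(preds, meta):
    """Remove the component of preds that lies along the metamodel."""
    preds = preds - preds.mean()
    meta = meta - meta.mean()
    return preds - (preds @ meta) / (meta @ meta) * meta

def mmc_like(preds, meta, target):
    # Covariance is used so that a pure copy of the metamodel
    # (residual ~= 0) scores ~0 rather than negative.
    residual = neutralize(preds, meta)
    return np.cov(residual, target)[0, 1]
```

An exact duplicate of the metamodel leaves a zero residual and scores 0 on this; to go negative, the part of your model that is orthogonal to the metamodel has to actively hurt, which is a performance statement, not a uniqueness one.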

2 Likes

Thanks for your response, Wigglemuse. I understand what you mean and fully agree. My last sentence was meant slightly differently, more like: "If we had a metric for uniqueness, integration_test should be one of the worst models on this metric." I have edited this in my original post.

If there were a payout scheme purely based on performance and uniqueness, integration_test should be rewarded positively for its performance, and negatively because it is probably one of the least unique models around.

1 Like