Aren’t you glossing over the part where your predictions are neutralized against the metamodel and the residuals are then compared to the targets? You make it sound as if only your single-number correlation score matters to this process, and not the actual interactions between your specific predictions, the metamodel and the targets. (That is, not everybody with the same overall correlation to the targets and the same correlation to the metamodel will end up with the same MMC, right? The actual predictions do matter.)
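To make the "neutralize, then compare the residuals to the targets" step concrete, here is a minimal sketch on simulated data. It is not Numerai's production MMC code: the helper names `to_uniform`, `neutralize` and `mmc_sketch` are hypothetical, and it assumes predictions and metamodel are rank-transformed to uniform, the metamodel is removed by a linear fit with an intercept, and the score is the covariance of the residual with the targets (the real formula may add further scaling).

```python
import numpy as np

def to_uniform(x):
    # Rank-transform to (0, 1); assumes no ties (argsort breaks ties arbitrarily).
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def neutralize(preds, metamodel):
    # Remove the part of preds explained by a linear fit (with intercept) on the metamodel.
    p = preds - preds.mean()
    m = metamodel - metamodel.mean()
    beta = (m @ p) / (m @ m)
    return p - beta * m

def mmc_sketch(preds, metamodel, targets):
    # Hypothetical MMC-like score: covariance of the neutralized residual with the targets.
    residual = neutralize(to_uniform(preds), to_uniform(metamodel))
    return float(np.mean(residual * (targets - targets.mean())))

# Simulated round (made-up data): targets in five buckets, a noisy metamodel,
# and a model that is mostly metamodel plus a little extra target signal.
rng = np.random.default_rng(218)
n = 5000
targets = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=n)
metamodel = targets + rng.normal(0.0, 2.0, n)
preds = 0.7 * metamodel + 0.3 * targets + rng.normal(0.0, 1.0, n)

print(mmc_sketch(preds, metamodel, targets))
```

Under these assumptions only the residual (the part of your predictions the metamodel does not already explain) contributes to the score; a pure metamodel clone neutralizes to zero.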
Anyway, here’s an interesting case. For round 218 I submitted a couple of unstaked (1-p) versions of my normal models to see whether the MMC results would be exactly symmetrical. (They are.) The effect was interesting with one of them, though. (218 was probably a metamodel burn, right?)
The result for “wigglemuse” (one of my normal models) was a burn:
So I’ve got a slightly positive MMC for a model that is fairly correlated with the metamodel (but nothing close to a clone) and did slightly better than the metamodel (judging by my rank and by the scores of integration_test, etc.), while still being negative against the targets. So that’s fine.
And so the (1-p) model (“smoghovian”) ended up with the exact opposites:
So the opposite model, at least if you looked at it in isolation and weren’t thinking of it as flipped, might leave you scratching your head. Here I was highly negatively correlated with the metamodel, did WAY better on correlation to the targets, and yet got a slightly negative MMC. But of course it is completely symmetrical with the other, opposite model (as expected), although some people object to any model with a positive CORR getting a negative MMC.
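The exact symmetry of a (1-p) model can be checked with a small simulation. This is a sketch under stated assumptions, not Numerai's scoring code: `to_uniform`, `neutralize` and `mmc_sketch` are hypothetical helpers that rank-transform to uniform, neutralize against the metamodel with a linear fit that includes an intercept, and take the covariance of the residual with the targets. Because ranking 1-p gives 1 minus the original ranks (assuming no ties), and centering removes the constant, the residual simply changes sign, so the MMC-like score, CORR and correlation to the metamodel all flip together.

```python
import numpy as np

def to_uniform(x):
    # Rank-transform to (0, 1); assumes no ties.
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def neutralize(preds, metamodel):
    # Linear neutralization with intercept: residual of preds after regressing on metamodel.
    p = preds - preds.mean()
    m = metamodel - metamodel.mean()
    return p - ((m @ p) / (m @ m)) * m

def mmc_sketch(preds, metamodel, targets):
    # Hypothetical MMC-like score: covariance of the neutralized residual with the targets.
    residual = neutralize(to_uniform(preds), to_uniform(metamodel))
    return float(np.mean(residual * (targets - targets.mean())))

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Made-up data: a model fairly correlated with the metamodel, and its (1-p) flip.
rng = np.random.default_rng(0)
n = 5000
targets = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=n)
metamodel = targets + rng.normal(0.0, 2.0, n)
preds = to_uniform(0.8 * metamodel + rng.normal(0.0, 1.0, n))
flipped = 1.0 - preds

for name, p in [("original", preds), ("1-p", flipped)]:
    print(name, mmc_sketch(p, metamodel, targets), corr(p, targets), corr(p, metamodel))
```

Each number for the flipped model should be the negation of the original's, which matches the "exact opposites" seen in the round above.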
I think the disconnect comes from many people wanting to be additionally rewarded or punished for essentially the same thing CORR measures, depending on whether they do it better or worse than others. (In that case we’d just make MMC = percentile CORR rank.) But MMC is actually on a totally different axis from your CORR score (or can be), and is its own number. That doesn’t necessarily make it a good or useful number (nor does it necessarily make it a bad one), but we’ve seen lots of attempts to judge it by its correlation to other things rather than by what it is actually trying to measure.