As the title indicates, when the meta-model has a negative score, MMC "punishes" originality. I've included toy example code and an alternative MMC formula for negative periods.

Basically, when the meta-model has negative stock-market corr, the more meta-model exposure/corr a model has, the higher its MMC, which I believe shouldn't be the case.

I might be completely off in my MMC calculations/formula, but if I'm not, this seems like something that should be addressed.

The intuition for the current problem is something like this: a user model has 0.99 corr with the meta-model and 1% corr on the stock market, while the meta-model has -1% corr on the stock market; MMC will then be something like 2% (1% user corr - 0.99 * (-1%) meta-model raw corr). If the user model had 0 corr with the meta-model, MMC would be something like 1% (1% - 0 * (-1%)). That seems to go against the incentive plan of rewarding originality. Obviously, this doesn't hold when the meta-model is positive.
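The arithmetic above can be sketched with a simplified stand-in for MMC (the function name and the formula `corr - exposure * meta_corr` are my illustration, not the official computation):

```python
# Simplified stand-in for MMC: user corr minus meta-model exposure times
# meta-model corr. This is my illustration, not the official formula.
def simplified_mmc(user_corr, meta_model_exposure, meta_model_corr):
    return user_corr - meta_model_exposure * meta_model_corr

# Burning period: meta-model has -1% corr with the target
copy_mmc = simplified_mmc(0.01, 0.99, -0.01)     # near-copy of the meta-model
original_mmc = simplified_mmc(0.01, 0.0, -0.01)  # fully original model

print(copy_mmc)      # ~0.0199: the copy scores about 2%
print(original_mmc)  # 0.01: the original model scores only 1%
```

Subtracting a negative meta-model corr adds to the score, so the higher the exposure, the bigger the bonus during a burn.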

The code below uses the actual MMC formula and proper computations.

Basically, during negative periods, the lower the meta-model corr a model has, the lower its MMC score, assuming equal raw corr on the stock market.

I’ve proposed an alternative payment scheme for negative periods in the code.

Basically, when the meta-model is negative, neutralize with proportion = -1 (note the minus!), then apply some "manual" clipping to the MMC score, so that if a user model has 0.99 corr with the meta-model but a much higher raw corr on the stock market, it still gets a good MMC score, just lower than a model with 0 corr with the meta-model. Obviously, this needs to be further checked by the Numerai team so that there are no cases where it falls apart.
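The clipping idea can be sketched as follows (a proposal only, with illustrative names; all inputs are Spearman corrs, and the exact rule would need vetting by the Numerai team):

```python
# Sketch of the proposed burn-period clipping (illustrative names).
def clipped_burn_mmc(neg_neutralized_mmc, meta_corr, raw_perf, meta_raw_perf):
    # Reward originality: scale outperformance of the meta-model by
    # (1 - meta-model corr), so low or negative exposure boosts the score,
    # while a copy still falls back to its own raw performance.
    return max((1 - meta_corr) * (raw_perf - meta_raw_perf),
               neg_neutralized_mmc,
               raw_perf)

# Burn period: meta-model at -5.5% raw corr, both models at 2.45% raw corr
copy_score = clipped_burn_mmc(-0.05, 0.95, 0.0245, -0.055)
original_score = clipped_burn_mmc(0.10, -0.63, 0.0245, -0.055)
print(copy_score < original_score)  # True: the original model is paid more
```

With these numbers, the copy falls back to its 2.45% raw perf, while the original model's outperformance is amplified by its negative exposure.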

```
import numpy as np
import pandas
import scipy.stats


def spearmanr(target, pred):
    return np.corrcoef(
        target,
        pred.rank(pct=True, method="first")
    )[0, 1]


def neutralize_series(series, by, proportion=1.0):
    scores = series.values.reshape(-1, 1)
    exposures = by.values.reshape(-1, 1)
    # this line makes series neutral to a constant column so that it's
    # centered and for sure gets corr 0 with exposures
    exposures = np.hstack(
        (exposures, np.array([np.mean(series)] * len(exposures)).reshape(-1, 1)))
    correction = proportion * (exposures.dot(
        np.linalg.lstsq(exposures, scores, rcond=None)[0]))
    corrected_scores = scores - correction
    neutralized = pandas.Series(corrected_scores.ravel(), index=series.index)
    return neutralized


def _normalize_unif(df):
    X = (df.rank(method="first") - 0.5) / len(df)
    return scipy.stats.uniform.ppf(X)
target = pandas.Series([0, 0, 0, 0,
                        0.25, 0.25, 0.25, 0.25,
                        0.5, 0.5, 0.5, 0.5,
                        0.75, 0.75, 0.75, 0.75,
                        1, 1, 1, 1])
meta_model = pandas.Series([1, 0, 0.25,
                            1, 0.5, 0.75,
                            0.5, 0.5, 1,
                            0.75, 0.75, 0,
                            0, 0.25, 0.25,
                            1, 0, 0.25, 0.5, 0.75])
high_meta_corr_model = pandas.Series([1, 0, 0.25,
                                      1, 0.25, 0.75,
                                      0.5, 0.5, 1,
                                      0.75, 0.75, 0,
                                      0, 0.25, 0.25,
                                      1, 0, 0.5, 0.5, 0.75])
low_meta_corr_model = pandas.Series([1, 1, 0.75,
                                     0, 0.5, 0.25,
                                     0.5, 0.25, 0,
                                     0.5, 0.25, 1,
                                     1, 0.75, 0.75,
                                     0, 0, 0.75, 0.5, 0.25])
# Meta model has raw corr with target of -5.5% (burning period)
meta_model_raw_perf = spearmanr(target, meta_model)
print(f"Meta_model performance: {meta_model_raw_perf}")
# Model highly correlated with meta model (i.e. non-original model)
# 95% meta-model correlation
# 2.45% raw corr with target
# Overall a good model, but not very original
high_meta_corr_raw_perf = spearmanr(target, high_meta_corr_model)
high_meta_corr = spearmanr(meta_model, high_meta_corr_model)
print(f"High_meta_corr model cross-corr: {high_meta_corr}")
print(f"High_meta_corr model performance: {high_meta_corr_raw_perf}")
# Model uncorrelated with meta model (i.e. original model)
# -63% meta-model correlation
# 2.45% raw corr with target
# Overall a good model and also very original
low_meta_corr_raw_perf = spearmanr(target, low_meta_corr_model)
low_meta_corr = spearmanr(meta_model, low_meta_corr_model)
print(f"Low_meta_corr model cross-corr: {low_meta_corr}")
print(f"Low_meta_corr model performance: {low_meta_corr_raw_perf}")
# MMC Computation
# All series are already uniform
# Neutralize (using forum post code for neutralization)
neutralized_high_corr = neutralize_series(high_meta_corr_model, meta_model, proportion=1.0)
neutralized_low_corr = neutralize_series(low_meta_corr_model, meta_model, proportion=1.0)
# Compute MMC
mmc_high_corr = np.cov(target,
                       neutralized_high_corr)[0, 1] / (0.29 ** 2)
mmc_low_corr = np.cov(target,
                      neutralized_low_corr)[0, 1] / (0.29 ** 2)
print(f"MMC for non-original model: {mmc_high_corr}")
print(f"MMC for original model: {mmc_low_corr}")
# I assume there is some clipping in order to not actually give negative mmc to the original model in reality
# Still 0.108 MMC for meta-model copy (non-original model) vs -0.43 MMC for completely different model seems bad
# CORR+MMC performance is 0.0245 + 0.108 for non-original model vs 0.0245 - 0.43 for original model
# Most likely this will hold in any burning period, and the more original a model is, the more punishing MMC will be in a burning period!
# Counter proposals:
# Keep same MMC in good periods
# In burn periods (i.e. meta-model is negative), make proportion = -1.0 in neutralize_series, and then use manual clipping techniques
# E.g. proposal
# Neutralize (using forum post code for neutralization) with -1
neutralized_high_corr = neutralize_series(high_meta_corr_model, meta_model, proportion=-1)
neutralized_low_corr = neutralize_series(low_meta_corr_model, meta_model, proportion=-1)
# Compute MMC
mmc_high_corr = np.cov(target,
                       _normalize_unif(neutralized_high_corr))[0, 1] / (0.29 ** 2)
mmc_low_corr = np.cov(target,
                      _normalize_unif(neutralized_low_corr))[0, 1] / (0.29 ** 2)
print(f"New MMC before clipping for non-original model: {mmc_high_corr}")
print(f"New MMC before clipping for original model: {mmc_low_corr}")
# Now mmc_high_corr is negative
# Clipping
new_mmc_high_corr = max([(1 - high_meta_corr) * (high_meta_corr_raw_perf - meta_model_raw_perf),
                         mmc_high_corr,
                         high_meta_corr_raw_perf])
new_mmc_low_corr = max([(1 - low_meta_corr) * (low_meta_corr_raw_perf - meta_model_raw_perf),
                        mmc_low_corr,
                        low_meta_corr_raw_perf])
print(f"New MMC for high correlation is still positive but quite low as the model is a copy of the meta model: {new_mmc_high_corr}")
print(f"New MMC for low correlation is very high as the model is almost opposite to the meta model: {new_mmc_low_corr}")
```