As the title indicates, when the meta-model has a negative score, MMC “punishes” originality. I’ve included toy example code and an alternative MMC formula for negative periods.
Basically, when the meta-model has negative stock market corr, the more meta-model exposure/corr a model has, the higher its MMC, which I believe shouldn’t be the case.
I might be completely wrong/off in my calculations/formula for MMC, but if I’m not, this seems like something that should be addressed.
The intuition for the current problem is something like this: the user model has 0.99 corr with the meta-model, the user model has 1% corr on the stock market, and the meta-model has -1% corr on the stock market. MMC will then be something like 2% (1% user corr - 0.99 × (-1%) meta-model raw corr). If the user model had 0 corr with the meta-model, MMC would instead be something like 1% (1% - 0 × (-1%)). This seems to go against the incentive plan of rewarding originality. When the meta-model is positive, this obviously doesn’t hold.
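To make that arithmetic concrete, here is the same back-of-envelope approximation in a few lines of Python (this is only the intuition, not the exact neutralization-based MMC, and the 1%/-1% numbers are the hypothetical ones from above):

# Hypothetical burn-period numbers from the paragraph above
meta_raw = -0.01  # meta-model raw corr with the stock market
user_raw = 0.01   # user model raw corr with the stock market
for rho in (0.99, 0.0):  # user model's corr with the meta-model
    approx_mmc = user_raw - rho * meta_raw
    print(f"meta-model corr {rho:.2f} -> approx MMC {approx_mmc:.4f}")
# meta-model corr 0.99 -> approx MMC 0.0199 (the copy is rewarded)
# meta-model corr 0.00 -> approx MMC 0.0100 (the original model gets less)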
The code below uses the actual MMC formula/proper computations.
Basically, during negative periods, the lower a model’s meta-model corr, the lower its MMC score, assuming equal raw corr on the stock market.
I’ve proposed an alternative payment scheme for negative periods in the code.
Basically, when the meta-model is negative, neutralize with proportion = -1 (note the minus!), then do some “manual” cleaning of the MMC score, so that if the user model has 0.99 corr with the meta-model but a much higher raw corr on the stock market, it still gets a good MMC score, but lower than a model with 0 corr with the meta-model. Obviously this needs to be further checked by the Numerai team so that there are no cases where it falls apart.
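Under the same crude approximation as before, flipping the sign of the neutralization turns the formula into user corr + rho × meta corr, which restores the ordering (again just a sketch with the hypothetical numbers; the script below does the real computation and clipping):

meta_raw = -0.01
user_raw = 0.01
for rho in (0.99, 0.0):
    approx_mmc = user_raw + rho * meta_raw  # proportion = -1 flips the sign
    print(f"meta-model corr {rho:.2f} -> approx MMC {approx_mmc:.4f}")
# meta-model corr 0.99 -> approx MMC 0.0001 (the copy gets almost nothing)
# meta-model corr 0.00 -> approx MMC 0.0100 (the original keeps its raw corr)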
import numpy as np
import pandas
import scipy.stats
def spearmanr(target, pred):
    return np.corrcoef(
        target,
        pred.rank(pct=True, method="first")
    )[0, 1]
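# Note: despite the name, this ranks the predictions (breaking ties in order of
# appearance) and takes a Pearson correlation with the raw target, matching the
# corr computation in Numerai's example scripts.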
def neutralize_series(series, by, proportion=1.0):
    scores = series.values.reshape(-1, 1)
    exposures = by.values.reshape(-1, 1)
    # This makes the series neutral to a constant column too, so that it's
    # centered and is guaranteed to get corr 0 with the exposures.
    exposures = np.hstack(
        (exposures, np.array([np.mean(series)] * len(exposures)).reshape(-1, 1)))
    correction = proportion * (exposures.dot(
        np.linalg.lstsq(exposures, scores, rcond=None)[0]))
    corrected_scores = scores - correction
    neutralized = pandas.Series(corrected_scores.ravel(), index=series.index)
    return neutralized
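# Quick sanity check (not part of the original argument): after full
# neutralization, the result should have ~0 correlation with the exposures.
_check_by = pandas.Series([0.2, 0.3, 0.5, 0.7])
_check = neutralize_series(pandas.Series([0.1, 0.4, 0.2, 0.9]), _check_by)
assert abs(np.corrcoef(_check, _check_by)[0, 1]) < 1e-8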
def _normalize_unif(df):
    X = (df.rank(method="first") - 0.5) / len(df)
    return scipy.stats.uniform.ppf(X)
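# _normalize_unif maps a series back onto a uniform [0, 1] distribution via
# its ranks; it's used below to re-standardize the series after the
# proportion=-1 neutralization distorts its distribution.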
target = pandas.Series([0, 0, 0, 0,
                        0.25, 0.25, 0.25, 0.25,
                        0.5, 0.5, 0.5, 0.5,
                        0.75, 0.75, 0.75, 0.75,
                        1, 1, 1, 1])
meta_model = pandas.Series([1, 0, 0.25,
                            1, 0.5, 0.75,
                            0.5, 0.5, 1,
                            0.75, 0.75, 0,
                            0, 0.25, 0.25,
                            1, 0, 0.25, 0.5, 0.75])
high_meta_corr_model = pandas.Series([1, 0, 0.25,
                                      1, 0.25, 0.75,
                                      0.5, 0.5, 1,
                                      0.75, 0.75, 0,
                                      0, 0.25, 0.25,
                                      1, 0, 0.5, 0.5, 0.75])
low_meta_corr_model = pandas.Series([1, 1, 0.75,
                                     0, 0.5, 0.25,
                                     0.5, 0.25, 0,
                                     0.5, 0.25, 1,
                                     1, 0.75, 0.75,
                                     0, 0, 0.75, 0.5, 0.25])
# Meta model has raw corr with target of -5.5% (burning period)
meta_model_raw_perf = spearmanr(target, meta_model)
print(f"Meta_model performance: {meta_model_raw_perf}")
# Model highly correlated with meta model (i.e. non-original model)
# 95% meta-model correlation
# 2.45% raw corr with target
# Overall a good model, but not very original
high_meta_corr_raw_perf = spearmanr(target, high_meta_corr_model)
high_meta_corr = spearmanr(meta_model, high_meta_corr_model)
print(f"High_meta_corr model cross-corr: {spearmanr(meta_model, high_meta_corr_model)}")
print(f"High_meta_corr model performance: {high_meta_corr_raw_perf}")
# Model uncorrelated with meta model (i.e. original model)
# -63% meta-model correlation
# 2.45% raw corr with target
# Overall a good model and also very original
low_meta_corr_raw_perf = spearmanr(target, low_meta_corr_model)
low_meta_corr = spearmanr(meta_model, low_meta_corr_model)
print(f"Low_meta_corr model cross-corr: {spearmanr(meta_model, low_meta_corr_model)}")
print(f"Low_meta_corr model performance: {low_meta_corr_raw_perf}")
# MMC Computation
# All series are already uniform
# Neutralize (using forum post code for neutralization)
neutralized_high_corr = neutralize_series(high_meta_corr_model, meta_model, proportion=1.0)
neutralized_low_corr = neutralize_series(low_meta_corr_model, meta_model, proportion=1.0)
# Compute MMC
mmc_high_corr = np.cov(target,
                       neutralized_high_corr)[0, 1] / (0.29 ** 2)
mmc_low_corr = np.cov(target,
                      neutralized_low_corr)[0, 1] / (0.29 ** 2)
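# 0.29 is roughly the standard deviation of a uniform [0, 1] distribution
# (1 / sqrt(12) ~ 0.2887), so cov / 0.29**2 rescales the covariance into a
# correlation-like score for uniformly distributed series.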
print(f"MMC for non-original model: {mmc_high_corr}")
print(f"MMC for original model: {mmc_low_corr}")
# I assume there is some clipping so that the original model doesn't actually get negative MMC in reality.
# Still, 0.108 MMC for the meta-model copy (non-original model) vs -0.43 MMC for a completely different model seems bad.
# CORR+MMC performance is 0.0245 + 0.108 for the non-original model vs 0.0245 - 0.43 for the original model.
# Most likely this will hold in any burning period, and the more original a model is, the more punishing MMC will be!
# Counter proposals:
# Keep same MMC in good periods
# In burn periods (i.e. meta-model is negative), make proportion = -1.0 in neutralize_series, and then use manual clipping techniques
# E.g. proposal
# Neutralize (using forum post code for neutralization) with proportion = -1
neutralized_high_corr = neutralize_series(high_meta_corr_model, meta_model, proportion=-1)
neutralized_low_corr = neutralize_series(low_meta_corr_model, meta_model, proportion=-1)
# Compute MMC
mmc_high_corr = np.cov(target,
                       _normalize_unif(neutralized_high_corr))[0, 1] / (0.29 ** 2)
mmc_low_corr = np.cov(target,
                      _normalize_unif(neutralized_low_corr))[0, 1] / (0.29 ** 2)
print(f"New MMC before clipping for non-original model: {mmc_high_corr}")
print(f"New MMC before clipping for original model: {mmc_low_corr}")
# Now mmc_high_corr is negative
# Clipping
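# The max() below combines three terms (my proposal, to be sanity-checked):
#   1) (1 - meta-model corr) * (raw perf - meta-model raw perf): outperformance
#      of the meta-model, scaled up the more original the model is
#   2) the proportion=-1 neutralized MMC itself
#   3) the model's own raw corr, as a floor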
new_mmc_high_corr = max([(1 - high_meta_corr) * (high_meta_corr_raw_perf - meta_model_raw_perf),
                         mmc_high_corr,
                         high_meta_corr_raw_perf])
new_mmc_low_corr = max([(1 - low_meta_corr) * (low_meta_corr_raw_perf - meta_model_raw_perf),
                        mmc_low_corr,
                        low_meta_corr_raw_perf])
print(f"New MMC for high correlation is still positive but quite low as the model is a copy of the meta model: {new_mmc_high_corr}")
print(f"New MMC for low correlation is very high as the model is almost opposite to the meta model: {new_mmc_low_corr}")