Well, that’s the question – “negatively” doesn’t seem right, but whatever uniqueness bonus is given, it certainly shouldn’t get much relative to others. And simple CORR+MMC accomplishes that. I’d still be interested in comparing the numbers for the different ideas though, so I’m gonna see if I can pull some data, since nobody is taking my hints to do it for me. (Somebody else will have to make graphs though.)
Another proposal to merge CORR & MMC: A Dynamic Payout Scheme
Motivation:
If models with MMC>0 have Mean(CORR)=0.0318 and Mean(MMC)=0.015, and you want to encourage models with high MMC, then the multiplier on MMC should be at least twice the multiplier on CORR. That is to say:
Payout = w CORR + (2-w) MMC ; such that w<0.667
Why? Because improving CORR by +2d is easier than improving MMC by +d, so (2-w) should be at least 2w, which gives w <= 2/3. This holds on average; at the margin the ratio may differ a little.
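As a quick check of the arithmetic: requiring the MMC multiplier (2-w) to be at least k times the CORR multiplier w gives w <= 2/(1+k). The sketch below evaluates that bound for k=2 and for the ratio of the two means quoted above. The helper name `max_w` is made up for this illustration.

```python
def max_w(k):
    """Largest w for which (2 - w) is at least k times w."""
    return 2.0 / (1.0 + k)

mean_corr = 0.0318  # Mean(CORR) quoted above for models with MMC > 0
mean_mmc = 0.015    # Mean(MMC) quoted above for the same models

print(round(max_w(2.0), 3))                   # 0.667, the "at least twice" bound
print(round(max_w(mean_corr / mean_mmc), 3))  # 0.641, if the ratio of the means (2.12) is used
```

So a value such as w=0.65 satisfies the "at least twice" bound but not the slightly stricter one implied by the ratio of the means.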
Proposal:
1.- Start with this initial scheme:
Payout = w CORR + (2-w) MMC ; such that w=0.65
2.- Adjust “w” over time, depending on the marginal improvement of the average CORR and MMC.
3.- In this way the payout scheme can be changed every round in order to give an incentive to submit high-MMC models.
An example is included in a “Tournament category” post.
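The proposal does not spell out how “w” would be adjusted, so the following is only a sketch under an assumed rule: if the average MMC improved less than the average CORR since the previous round, shift a fixed step of weight toward MMC, otherwise shift it back, and keep w within [0, 2/3]. The function names, the step size and the round-over-round deltas are all hypothetical.

```python
def payout(corr, mmc, w=0.65):
    """Payout = w*CORR + (2-w)*MMC, as in the scheme above."""
    return w * corr + (2.0 - w) * mmc

def adjust_w(w, d_corr, d_mmc, step=0.05, w_min=0.0, w_max=2.0 / 3.0):
    """Assumed update rule (not part of the proposal).

    d_corr, d_mmc: change in the average CORR / MMC since the previous round.
    """
    if d_mmc < d_corr:
        w -= step   # MMC is lagging: lower w, which raises the MMC multiplier (2-w)
    elif d_mmc > d_corr:
        w += step   # MMC is catching up: move weight back toward CORR
    return min(max(w, w_min), w_max)

w = 0.65
print(payout(0.0318, 0.015, w))               # payout for the average model at the initial w
w = adjust_w(w, d_corr=0.002, d_mmc=0.001)    # made-up deltas
print(w, payout(0.0318, 0.015, w))
```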
@master_key would you consider keeping 2*MMC as an option, for people who want to only stake on MMC?
Some code to check your historical payout using only CORR or MMC or CORR+MMC:
#!/usr/bin/env python3
# Usage: pass one argument: corr, mmc, comb or all
import sys

import matplotlib.pyplot as plt
import numerapi
import numpy as np
import pandas as pd

api = numerapi.NumerAPI()

# Which payout curve(s) to plot: 'corr', 'mmc', 'comb' or 'all'
metrictoplot = sys.argv[1]

username_list = ['integration_test', 'sugaku']

fig1 = plt.figure()
cmap = plt.get_cmap('tab20b', len(username_list)*3)

for i, user in enumerate(username_list):
    print("Collecting data for: ", user)
    # Keep the last daily score reported for each round
    user_df = pd.DataFrame(api.daily_submissions_performances(user)).sort_values(by="date").groupby("roundNumber").last()
    start_round = np.min(user_df.index)
    end_round = np.max(user_df.index)  # latest round in the data (excluded from the loop below)

    stake_corr = 1.0  # initial stake
    stake_mmc = 1.0
    stake_comb = 1.0

    for r in range(start_round, end_round):
        if r in user_df.index:
            corr_score = user_df.loc[r, "correlation"]
            mmc_score = user_df.loc[r, "mmc"]
        else:
            corr_score = 0.0
            mmc_score = 0.0
        # Treat missing scores as zero. The check uses the local values so that
        # rounds absent from the index do not raise a KeyError.
        if np.isnan(corr_score) or np.isnan(mmc_score):
            corr_score = 0.0
            mmc_score = 0.0

        # Compound the stake round by round under each payout scheme
        stake_corr *= 1.0 + corr_score*1.0
        stake_mmc *= 1.0 + mmc_score*2.0  # 2x leverage for mmc
        stake_comb *= 1.0 + corr_score + mmc_score

        user_df.loc[r, "weekly_stakes_corr"] = stake_corr
        user_df.loc[r, "weekly_stakes_mmc"] = stake_mmc
        user_df.loc[r, "weekly_stakes_comb"] = stake_comb

    color = cmap(float(i)/len(username_list))

    # The labels are placed at the final compounded stake of each curve
    if metrictoplot == "corr":
        plt.title('Expected CORR payout for models', fontsize=17)
        user_df.weekly_stakes_corr.plot(label=user, color=color)
        plt.text(end_round-0.75, stake_corr, user, color=color, fontweight="bold")

    if metrictoplot == "mmc":
        plt.title('Expected MMC payout for models', fontsize=17)
        user_df.weekly_stakes_mmc.plot(label=user, color=color)
        plt.text(end_round-0.75, stake_mmc, user, color=color, fontweight="bold")

    if metrictoplot == "comb":
        plt.title('Expected CORR+MMC payout for models', fontsize=17)
        user_df.weekly_stakes_comb.plot(label=user+'_comb', color=color)
        plt.text(end_round-0.75, stake_comb, user+'_COMB', color=color, fontweight="bold")

    if metrictoplot == "all":
        plt.title('Expected CORR and MMC payout for models', fontsize=17)
        user_df.weekly_stakes_corr.plot(label=user+'_corr', color=color)
        plt.text(end_round-0.75, stake_corr, user+'_CORR', color=color, fontweight="bold")
        user_df.weekly_stakes_mmc.plot(label=user+'_mmc', color=color)
        plt.text(end_round-0.75, stake_mmc, user+'_MMC', color=color, fontweight="bold")
        user_df.weekly_stakes_comb.plot(label=user+'_comb', color=color)
        plt.text(end_round-0.75, stake_comb, user+'_COMB', color=color, fontweight="bold")

# Axes are set up once, using the round range of the last user in the list
plt.grid(linestyle='--', linewidth=0.5, color="black")
plt.xlabel('Round number')
plt.ylabel('Expected payout factor')
plt.xticks(np.arange(start_round, end_round, 1), rotation=60)
ax = plt.gca()
ax.set_facecolor((0.9, 0.9, 0.9))
plt.show()
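To sanity-check the compounding without calling the API, the same three rules can be applied to a hand-made list of scores. This is just a sketch: the function name `compound` and the scores are made up, and the multipliers mirror the ones in the script above (1x CORR, 2x MMC, CORR+MMC).

```python
def compound(scores, corr_mult=1.0, mmc_mult=2.0):
    """scores: list of (corr, mmc) pairs, one per round.

    Returns the final stake factors for the CORR-only, MMC-only
    and CORR+MMC schemes, each starting from a stake of 1.0.
    """
    stake_corr = stake_mmc = stake_comb = 1.0
    for corr, mmc in scores:
        stake_corr *= 1.0 + corr_mult * corr
        stake_mmc *= 1.0 + mmc_mult * mmc
        stake_comb *= 1.0 + corr + mmc
    return stake_corr, stake_mmc, stake_comb

rounds = [(0.03, 0.01), (-0.02, 0.005), (0.04, 0.02)]  # made-up (corr, mmc) scores
print(compound(rounds))
```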
Similar to the above, but using a rolling window over the rounds:
(update 12:01:31 25 July 2020: reduce repeated code, accept rolling window size as input)
#!/usr/bin/env python3
# Usage: pass two arguments: the metric (correlation, mmc, comb or all)
# and the rolling window size in rounds
import sys

import matplotlib.pyplot as plt
import numerapi
import numpy as np
import pandas as pd

api = numerapi.NumerAPI()

metrics = ["correlation", "mmc", "comb"]
metrictoplot = sys.argv[1]
if metrictoplot not in metrics + ["all"]:
    raise Exception("Valid metric values are %s" % (metrics + ["all"]))

username_list = ['integration_test', 'nasdaqjockey']

plt.figure(figsize=(18, 6))
cmap = plt.get_cmap('cubehelix', len(username_list)*3)

rolling_window_size = int(sys.argv[2])
metric_to_style = {"correlation": "-", "mmc": "--", "comb": ":"}

for i, user in enumerate(username_list):
    print("Collecting data for:", user, flush=True)
    # Keep the last daily score reported for each round
    user_df = pd.DataFrame(api.daily_submissions_performances(user))\
        .sort_values(by="date")\
        .groupby("roundNumber")\
        .last()
    start_round = user_df.index.min()
    end_round = user_df.index.max()

    # "comb" must be computed before mmc is doubled
    user_df["comb"] = user_df["correlation"] + user_df["mmc"]
    user_df["mmc"] *= 2  # 2x leverage for mmc

    # Payout factor over each window: product of (1 + score), missing scores count as 0
    rolling_series = (user_df[metrics].fillna(0) + 1) \
        .rolling(rolling_window_size)\
        .apply(np.prod, raw=True)

    color = cmap(float(i)/len(username_list))

    if metrictoplot == "all":
        plt.title('Expected correlation, mmc and comb payout for models', fontsize=17)
        for metric in metrics:
            rolling_series[metric].plot(label=user, color=color, ls=metric_to_style[metric])
            plt.text(end_round, rolling_series[metric].values[-1], "%s - %s" % (user, metric), color=color, fontweight="bold")
    else:
        plt.title('Expected %s payout for models' % metrictoplot, fontsize=17)
        rolling_series[metrictoplot].plot(label=user, color=color, ls=metric_to_style[metrictoplot])
        plt.text(end_round, rolling_series[metrictoplot].values[-1], "%s - %s" % (user, metrictoplot), color=color, fontweight="bold")

# Axes are set up once, using the round range of the last user in the list
plt.grid(linestyle='--', linewidth=0.2, color="black")
plt.xlabel('Round number')
plt.ylabel('Expected payout factor')
plt.xticks(np.arange(start_round, end_round, 1), rotation=60)
plt.show()
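For anyone unsure what the rolling step computes, here is the same idea in plain Python, as a sketch only: for each round, multiply (1 + score) over the last `window` rounds, counting missing scores as 0. The function name `rolling_factor` is made up, and `None` stands in for the NaN that pandas reports until the window is full.

```python
import math

def rolling_factor(scores, window):
    """Product of (1 + score) over the last `window` rounds.

    NaN scores count as 0 (like fillna(0) in the script); entries are
    None until a full window of rounds is available.
    """
    factors = [1.0 + (0.0 if math.isnan(s) else s) for s in scores]
    out = []
    for end in range(1, len(factors) + 1):
        if end < window:
            out.append(None)
        else:
            out.append(math.prod(factors[end - window:end]))
    return out

scores = [0.03, float("nan"), -0.02, 0.04]  # made-up per-round scores
print(rolling_factor(scores, window=2))
```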
It is an improvement compared to the previous situation. In particular, it is fairer to the best MMC user.
However, I think it is not enough of an incentive for users to focus on MMC rather than CORR. You will see it in a couple of months.
Why do you think that such incentives should exist? In my opinion, developing high-MMC models is good for your CORR by itself. If your high-MMC model has low CORR but is unique enough, you’ll just combine your model with the example predictions and get a model with positive MMC and high CORR. If your high-MMC model has high CORR, you will already be happy.
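One simple way to do that combination, sketched here under my own assumptions (it is not an official recipe), is to rank-normalise both prediction vectors and take a weighted average. The helper names, the 50/50 weight and the numbers are made up; whether the blend actually ends up with positive MMC and high CORR has to be checked on real scores.

```python
def rank01(values):
    """Map values to ranks scaled into (0, 1]; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position / len(values)
    return ranks

def blend(model_preds, example_preds, weight=0.5):
    """Weighted average of the two rank-normalised prediction vectors.

    weight is the share given to your own model; the rest goes to the
    example predictions.
    """
    m, e = rank01(model_preds), rank01(example_preds)
    return [weight * a + (1.0 - weight) * b for a, b in zip(m, e)]

model_preds = [0.2, 0.9, 0.4, 0.6]    # made-up predictions from a unique model
example_preds = [0.5, 0.7, 0.1, 0.8]  # made-up stand-in for the example predictions
print(blend(model_preds, example_preds, weight=0.5))
```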
jackerparker, take a look at this topic:
Discussion on incentives: MM clones vs MM improvement