Correlation between Meta Model Predictions and Targets

unsentient · March 4, 2023, 12:24am

I took a look at the correlation between historical meta-model predictions and the actual targets. The results should interest anyone looking to have more control over their model’s correlation with the meta-model.

For the 150 eras where we have labeled meta-model data and no NaNs in the targets (eras 888 to 1038), I computed the Pearson correlation between each target and the meta-model prediction. I did this both for the entire period and era-wise. I then sorted the results by overall correlation and plotted them in the heatmap below. I also included the standard deviation of the era-wise correlations. I checked the overall p-values and they were (unsurprisingly) far lower than 0.05.
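As a rough illustration of the two calculations described above (pooled versus era-wise Pearson correlation, plus a p-value for the pooled one), here is a minimal sketch on synthetic data. The frame `toy` and its columns are made up for the example and are not real Numerai data; `scipy.stats.pearsonr` is used here only as one convenient way to get a p-value, not necessarily what was used for the post.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in data: 5 eras of 200 rows each (not real Numerai data).
n_eras, n_rows = 5, 200
era = np.repeat(np.arange(n_eras), n_rows)
meta_model = rng.normal(size=n_eras * n_rows)
# A toy "target" that is weakly related to the meta-model by construction.
target = 0.1 * meta_model + rng.normal(size=n_eras * n_rows)
toy = pd.DataFrame({"era": era, "meta_model": meta_model, "target": target})

# Pooled (overall) Pearson correlation and its two-sided p-value.
overall_r, overall_p = stats.pearsonr(toy["target"], toy["meta_model"])

# Era-wise Pearson correlation: one coefficient per era.
erawise_r = toy.groupby("era")[["target", "meta_model"]].apply(
    lambda g: g["target"].corr(g["meta_model"])
)

print(f"overall r = {overall_r:.4f}, p = {overall_p:.3g}")
print(erawise_r)
print(f"std of era-wise r = {erawise_r.std():.4f}")
```

With many rows, even a small pooled correlation tends to produce a very small p-value, so the era-wise spread is usually the more informative number.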

If you’re like me then the outliers will pique your interest.

Targets Arthur, Alan, and Janet - 20d and 60d - all fall on the low side of correlation. They also fall on the low side of standard deviation, i.e. they are consistently less correlated with the meta-model than the other targets. Anyone looking to reduce their correlation with the meta-model may want to start by training on Arthur, Alan, or Janet.

At the other end of the spectrum, targets seem to fall into two camps. Targets thomas_60, ben_60, and george_60 are all high-corr/high-std, while targets william_60, nomi_60, waldo_60, jerome_60, ralph_60, and tyler_60 are all more in the high-corr/low-std camp. If your strategy is to stay consistently in the pack and close to the meta-model, maybe you want to train on the latter group; if you want more of a wild ride, try Thomas, Ben, or George.
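One way to make the "camps" explicit is to summarise each target by the mean and standard deviation of its era-wise correlation and bucket on both. The sketch below uses invented numbers and invented column names. The cut-offs `CORR_CUT` and `STD_CUT` are arbitrary illustrative thresholds, and it buckets on the mean of the era-wise correlations rather than the pooled correlation used for sorting in the post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical era-wise correlations (rows = eras, columns = toy target names).
toy_erawise = pd.DataFrame({
    "toy_low_corr": rng.normal(0.20, 0.02, size=50),
    "toy_steady": rng.normal(0.60, 0.02, size=50),
    "toy_wild": rng.normal(0.60, 0.10, size=50),
})

# Per-target summary of the era-wise correlations.
summary = pd.DataFrame({
    "mean_corr": toy_erawise.mean(),
    "std_corr": toy_erawise.std(),
})

# Arbitrary, illustrative cut-offs (assumptions, not values from the post).
CORR_CUT = 0.40
STD_CUT = 0.05

corr_level = (summary["mean_corr"] > CORR_CUT).map({True: "high", False: "low"})
std_level = (summary["std_corr"] > STD_CUT).map({True: "high", False: "low"})
summary["camp"] = corr_level + "-corr/" + std_level + "-std"

print(summary)
```

To try this on the real data, the `erawise_corr` frame built by the code further down could be passed in place of `toy_erawise`, with cut-offs chosen by looking at the actual distribution.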

… all very interesting… For me, it raises a lot more questions and paths to investigate.

Code to make the heatmap; just paste it into a Colab notebook!

!pip install --upgrade numerapi
import numerapi
napi = numerapi.NumerAPI()
napi.download_dataset("v4.1/validation.parquet", "validation.parquet")
napi.download_dataset("v4.1/meta_model.parquet", "meta_model.parquet")
import pyarrow.parquet as pq
import pandas as pd
md = pq.read_metadata('validation.parquet')
tgt_cols = [i for i in md.schema.names if i.startswith('target')]
val_df = pq.read_table('validation.parquet',columns=(['id','era']+tgt_cols),).to_pandas().astype({'era': 'int32'})
val_df = val_df.drop(columns=['target'])
val_df = val_df[(val_df.era>=888)&(val_df.era<=1038)] # eras 888 to 1038 have both labeled meta-model data and targets
mm_df = pq.read_table('meta_model.parquet').to_pandas().astype({'era': 'int32'})
mm_df = mm_df[(mm_df.era>=888)&(mm_df.era<=1038)]
mm_df = mm_df.drop(columns=['data_type','era']) # assign the result; drop() does not modify mm_df in place
df = pd.concat([val_df,mm_df.numerai_meta_model], axis=1)
del val_df, mm_df
# Shorten column names: strip the 'target_' prefix and the 'v4_' version tag
df = df.rename(columns=lambda c: c.replace('target_','').replace('v4_',''))
df = df.rename({'numerai_meta_model':'meta_model'}, axis='columns')
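# Pooled correlation of every target with the meta-model over all selected eras
overall_corr = df.corrwith(df.meta_model).to_frame().transpose().drop(columns=['era','meta_model'])
# The same correlation computed separately within each era
erawise_corr = df.groupby('era').corrwith(df.meta_model).drop(columns=['meta_model'])
# Order targets by overall correlation, lowest to highest
overall_corr = overall_corr.sort_values(by=0,axis=1).rename(index={0: 'Row_1'})
erawise_corr = erawise_corr[overall_corr.columns.tolist()]
# Standard deviation of the era-wise correlations for each target
ew_std = erawise_corr.std().to_frame().transpose()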
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(3, 1, figsize=(20,10), gridspec_kw={'height_ratios':[1,1,20]}, sharex=True)
axes[0].set_title('Overall Target/Meta Model Correlation (with values)')
axes[1].set_title('Standard Deviation of Erawise Correlation (with rank)')
axes[2].set_title('Erawise Target/Meta Model Correlation')
sns.heatmap(overall_corr, ax=axes[0], cmap='rocket_r', xticklabels=False, annot=overall_corr)
sns.heatmap(ew_std,       ax=axes[1], cmap='rocket_r', xticklabels=False, annot=ew_std.rank(axis='columns',).astype('int'))
sns.heatmap(erawise_corr, ax=axes[2], cmap='rocket_r')
pass

Comments and criticisms are welcome!

andralienware · March 4, 2023, 2:47am

Thanks, this will be good for ensemble development.
