Submission core metrics

I wanted to clarify some doubts about the basic metrics shown after a submission is done:

Validation correlation: The mean of your per-era correlations.
Is this computed using the predictions on the validation dataset and the real targets of the validation dataset?
The code should look like this (taken from the GitHub examples):

import numpy as np

def score(df):
    # Rank predictions within the era (percentile ranks), then take the
    # Pearson correlation of those ranks with the true targets.
    pct_ranks = df[PREDICTION_NAME].rank(pct=True, method="first")
    targets = df[TARGET_NAME]
    return np.corrcoef(targets, pct_ranks)[0, 1]

validation_data = tournament_data[tournament_data.data_type == "validation"]
validation_correlations = validation_data.groupby("era").apply(score)
validation_correlations_web = validation_correlations.mean()

Validation Sharpe: This is the mean of your per-era correlations divided by the standard deviation of your per-era correlations.
Based on what I mentioned above regarding Validation Correlation, Validation Sharpe should look like:

validation_sharpe_web = validation_correlations.mean() / validation_correlations.std()
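To sanity-check the two formulas, here is a self-contained sketch on synthetic data. The column names and era labels are made up for illustration; the scoring logic mirrors the snippets above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: four eras with random predictions and targets.
# (The real tournament data has many more rows and era labels.)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "era": np.repeat([f"era{i}" for i in range(1, 5)], n // 4),
    "prediction": rng.random(n),
    "target": rng.random(n),
})

def score(sub):
    # Rank predictions within the era, then correlate the ranks with targets.
    pct_ranks = sub["prediction"].rank(pct=True, method="first")
    return np.corrcoef(sub["target"], pct_ranks)[0, 1]

per_era = df.groupby("era").apply(score)

# Validation correlation: mean of the per-era correlations.
validation_correlation = per_era.mean()

# Validation sharpe: mean divided by standard deviation of the same series.
validation_sharpe = per_era.mean() / per_era.std()

print(validation_correlation, validation_sharpe)
```

With random data the per-era correlations hover around zero, so the printed numbers are not meaningful in themselves; the point is only the shape of the computation.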

Is that right?

Corr With Example Preds: This is the correlation between your model and the example predictions.
What are the example predictions, and against which dataset are they calculated?

Thanks in advance.


Your understanding of val corr and val sharpe is correct. Regarding the example preds: the example predictions are already included in the downloaded Numerai data as a CSV file, so you can compare them with your preds using the corr metric. These example preds are generated by a script that is also included in the downloaded data.
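The comparison itself is just a correlation between two prediction series aligned on the same rows. A minimal sketch, using made-up series in place of your predictions and the bundled example-predictions CSV (the construction of `example_preds` here is purely illustrative):

```python
import numpy as np
import pandas as pd

# Stand-ins for your predictions and the example predictions.
# In practice you would load both from CSVs and align them on the row ids.
rng = np.random.default_rng(42)
n = 1000
my_preds = pd.Series(rng.random(n))
# Fake "example preds" that partially agree with my_preds, for demonstration.
example_preds = pd.Series(0.5 * my_preds.to_numpy() + 0.5 * rng.random(n))

# Rank both series, then take the Pearson correlation of the ranks,
# analogous to the per-era scoring shown earlier in the thread.
corr_with_example = np.corrcoef(
    my_preds.rank(pct=True, method="first"),
    example_preds.rank(pct=True, method="first"),
)[0, 1]

print(corr_with_example)
```

Because the fake example preds are built half from `my_preds`, the printed correlation comes out clearly positive; a model identical to the example predictions would score near 1, an unrelated one near 0.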


What is considered a good correlation between my model’s predictions and the example predictions? Is a high or a low correlation more appropriate?

There is no real good answer to that question. If your predictions correlate closely with the example predictions, you will most probably not have good MMC. If the correlation between the two is very low, you might have high MMC but very bad CORR on the live data. Or you could have very low correlation to the example predictions (be completely orthogonal to them) and still get good MMC and good CORR. There really isn’t any way to know for sure: some models with high correlation to the example predictions have done really well, and some with very low correlation have done really well too.