Hi,
I would like to clear up some doubts about the basic metrics shown after a submission is made:
Validation correlation: The mean of your per-era correlations.
Is this computed using my predictions for the validation dataset and the real targets of that same dataset?
The code should look like this (taken from the GitHub examples):
import numpy as np

def score(df):
    # Percentile-rank the predictions, then correlate the ranks with the targets
    pct_ranks = df[PREDICTION_NAME].rank(pct=True, method="first")
    targets = df[TARGET_NAME]
    return np.corrcoef(targets, pct_ranks)[0, 1]

# Keep only the validation rows, then score each era separately
validation_data = tournament_data[tournament_data.data_type == "validation"]
validation_correlations = validation_data.groupby("era").apply(score)
validation_correlations_web = validation_correlations.mean()
Validation Sharpe: This is the mean of your per-era correlations divided by the standard deviation of your per-era correlations.
Following the same approach as for Validation Correlation, Validation Sharpe should look like:
validation_sharpe_web = validation_correlations.mean() / validation_correlations.std()
Is that right?
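To make the question concrete, here is a small self-contained sketch of how I understand both metrics, run on made-up data. The column names "prediction" and "target", the helper validation_metrics, and the toy values are my own placeholders, not the real tournament names or data.

```python
import numpy as np
import pandas as pd

# Placeholder column names for this toy example (assumption, not the real ones).
PREDICTION_NAME = "prediction"
TARGET_NAME = "target"


def score(df):
    # Percentile-rank the predictions, then take the Pearson correlation
    # between those ranks and the targets.
    pct_ranks = df[PREDICTION_NAME].rank(pct=True, method="first")
    return np.corrcoef(df[TARGET_NAME], pct_ranks)[0, 1]


def validation_metrics(validation_data):
    # Hypothetical helper: one correlation per era, then the mean and
    # the mean divided by the standard deviation across eras.
    per_era = (
        validation_data
        .groupby("era")[[TARGET_NAME, PREDICTION_NAME]]
        .apply(score)
    )
    return per_era, per_era.mean(), per_era.mean() / per_era.std()


# Made-up data: 3 eras with 5 rows each.
toy = pd.DataFrame({
    "era": ["era1"] * 5 + ["era2"] * 5 + ["era3"] * 5,
    TARGET_NAME: [0.0, 0.25, 0.5, 0.75, 1.0] * 3,
    PREDICTION_NAME: [
        0.1, 0.2, 0.3, 0.4, 0.5,  # same order as the targets
        0.5, 0.4, 0.3, 0.2, 0.1,  # reversed order
        0.1, 0.3, 0.2, 0.4, 0.5,  # one adjacent pair swapped
    ],
})

per_era, mean_corr, sharpe = validation_metrics(toy)
print(per_era)
print("mean:", mean_corr, "sharpe:", sharpe)
```

On this toy data the per-era correlations are 1.0, -1.0 and 0.9, so the mean is 0.3. Note that pandas' std() is the sample standard deviation (ddof=1), which gives a Sharpe of roughly 0.27 here; I do not know whether the website uses the same convention.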
Corr With Example Preds: This is the correlation between your model and the example predictions.
Which predictions are the example predictions, and against which dataset is this correlation calculated?
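My guess is that it is something like the sketch below, but this is only an assumption on my side: I do not know which correlation measure is used, or whether it is computed per era or over all rows at once. Both series are made-up values for the same five rows.

```python
import pandas as pd

# Made-up predictions for the same five rows (hypothetical values).
my_preds = pd.Series([0.1, 0.4, 0.35, 0.8, 0.7])
example_preds = pd.Series([0.2, 0.3, 0.5, 0.9, 0.6])

# Option 1 (assumption): plain Pearson correlation between the two series.
corr_with_example = my_preds.corr(example_preds)

# Option 2 (assumption): rank both series first, then correlate the ranks,
# which is the Spearman rank correlation.
rank_corr_with_example = my_preds.rank().corr(example_preds.rank())

print(corr_with_example, rank_corr_with_example)
```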
Thanks in advance.