More metrics for ya

Where does this magic number come from?
Is this the correlation from some basic estimator?

Richard said that this number approximates their average trading costs.


I just realized that the max_feature_exposure implementation in my previous post is incorrect (it’s computing the max of each era’s feature correlation, instead of the max of each feature’s correlation with the predictions). Here’s the same code block with the correct implementation.

import csv
import numpy as np
import pandas as pd

TOURNAMENT_NAME = "kazutsugi"
PREDICTION_NAME = f"prediction_{TOURNAMENT_NAME}"


def feature_exposures(df):
    # Correlation of each feature with the predictions, on validation rows only.
    df = df[df.data_type == 'validation']
    feature_columns = [x for x in df.columns if x.startswith('feature_')]
    pred = df[PREDICTION_NAME]
    correlations = []
    for col in feature_columns:
        correlations.append(np.corrcoef(pred, df[col])[0, 1])
    return np.array(correlations)


def feature_exposure(df):
    # Spread (standard deviation) of the per-feature correlations.
    return np.std(feature_exposures(df))


def max_feature_exposure(df):
    # Largest per-feature correlation (signed, not the absolute value).
    return np.max(feature_exposures(df))


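def read_csv(file_path):
    # Read only the header first so we know which columns to downcast.
    with open(file_path, 'r') as f:
        column_names = next(csv.reader(f))

    # Load feature and target columns as float16 to keep memory usage down.
    dtypes = {x: np.float16 for x in column_names if
              x.startswith(('feature', 'target'))}
    df = pd.read_csv(file_path, dtype=dtypes, index_col=0)

    return df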


if __name__ == '__main__':
    tournament_data = read_csv(
        "numerai_tournament_data.csv")
    example_predictions = read_csv(
        "example_predictions_target_kazutsugi.csv")
    merged = pd.merge(tournament_data, example_predictions,
                      left_index=True, right_index=True)
    fe = feature_exposure(merged)
    max_fe = max_feature_exposure(merged)
    print(f"Feature exposure: {fe:.4f} "
          f"Max feature exposure: {max_fe:.4f}")

Update 3rd September, 2020: The feature exposure metrics have changed slightly since I posted this. I’m leaving the code in this post intact, as I’d posted it earlier. Please refer to this post to find out more about the new feature exposure and max feature exposure metrics.
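
For anyone who wants to sanity-check the per-feature logic without downloading the tournament data, here is a minimal sketch on synthetic data. The toy frame, its three `feature_*` columns, and the `make_toy_frame` helper are made up for illustration, and the `data_type == 'validation'` filter is left out because the toy frame has no such column. The prediction is deliberately built to lean on `feature_a`, so that feature should dominate the max.

```python
import numpy as np
import pandas as pd

PREDICTION_NAME = "prediction_kazutsugi"


def feature_exposures(df):
    # Correlation of each feature column with the predictions.
    feature_columns = [c for c in df.columns if c.startswith("feature_")]
    pred = df[PREDICTION_NAME]
    return np.array([np.corrcoef(pred, df[c])[0, 1] for c in feature_columns])


def make_toy_frame(n_rows=5000, seed=0):
    # Hypothetical helper: three independent random features (assumption,
    # not real tournament data).
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "feature_a": rng.normal(size=n_rows),
        "feature_b": rng.normal(size=n_rows),
        "feature_c": rng.normal(size=n_rows),
    })
    # The prediction is mostly feature_a plus a little noise.
    df[PREDICTION_NAME] = 0.9 * df["feature_a"] + 0.1 * rng.normal(size=n_rows)
    return df


toy = make_toy_frame()
exposures = feature_exposures(toy)
print("per-feature correlations:", np.round(exposures, 3))
print("feature exposure (std):", round(float(np.std(exposures)), 4))
print("max feature exposure:", round(float(np.max(exposures)), 4))
```

The correlation for `feature_a` should come out close to 1 and the other two close to 0, so the max picks out the single feature the predictions depend on most.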


I am new here and still trying to understand the whole Numerai project, but to me it looks like a possible risk-free interest rate. At least, that makes sense to me. The idea would be to avoid counting profits that can be obtained in the market without risk. For instance, this helps keep your own Sharpe ratios comparable over time when the risk-free rate changes.
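
To make that reading concrete, here is a rough sketch of a Sharpe-style ratio with a constant subtracted from each per-era score before dividing by the standard deviation. The `sharpe` function, the example scores, and the `offset=0.01` value are all assumptions for illustration; this is not taken from Numerai's code, and the offset is not the number discussed above.

```python
import numpy as np


def sharpe(per_era_scores, offset=0.0):
    # Mean of (score - offset) divided by the sample standard deviation.
    # `offset` plays the role of a risk-free rate or cost hurdle (assumption).
    excess = np.asarray(per_era_scores, dtype=float) - offset
    return excess.mean() / excess.std(ddof=1)


scores = [0.03, 0.01, 0.04, -0.01, 0.02]  # made-up per-era scores
raw = sharpe(scores)
adjusted = sharpe(scores, offset=0.01)
print(f"raw: {raw:.4f}  adjusted: {adjusted:.4f}")
```

Subtracting a constant leaves the standard deviation unchanged and only lowers the mean, so the adjusted ratio is always smaller than the raw one when the offset is positive.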