More metrics for ya

koerrie · August 6, 2020, 4:54pm

Where does this magic number come from?
Is this the correlation from some basic estimator?

arbitrage · August 7, 2020, 4:18pm

Richard said that this number approximates their average trading costs.

jrb · August 28, 2020, 4:49pm

I just realized that the max_feature_exposure implementation in my previous post is incorrect (it’s computing the max of each era’s feature correlation, instead of the max of each feature’s correlation with the predictions). Here’s the same code block with the correct implementation.

import csv
import numpy as np
import pandas as pd

TOURNAMENT_NAME = "kazutsugi"
PREDICTION_NAME = f"prediction_{TOURNAMENT_NAME}"


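def feature_exposures(df):
    # Correlation of the predictions with each feature, computed over all
    # validation rows (one value per feature, not one per era).
    df = df[df.data_type == 'validation']
    feature_columns = [x for x in df.columns if x.startswith('feature_')]
    pred = df[PREDICTION_NAME]
    correlations = []
    for col in feature_columns:
        correlations.append(np.corrcoef(pred, df[col])[0, 1])
    return np.array(correlations)


def feature_exposure(df):
    # Spread (standard deviation) of the per-feature correlations.
    return np.std(feature_exposures(df))


def max_feature_exposure(df):
    # Largest per-feature correlation with the predictions.
    return np.max(feature_exposures(df))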


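def read_csv(file_path):
    # Read only the header row first to get the column names.
    with open(file_path, 'r') as f:
        column_names = next(csv.reader(f))

    # Load feature and target columns as float16 to reduce memory use.
    dtypes = {x: np.float16 for x in column_names if
              x.startswith(('feature', 'target'))}
    df = pd.read_csv(file_path, dtype=dtypes, index_col=0)

    return df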


if __name__ == '__main__':
    tournament_data = read_csv(
        "numerai_tournament_data.csv")
    example_predictions = read_csv(
        "example_predictions_target_kazutsugi.csv")
    merged = pd.merge(tournament_data, example_predictions,
                      left_index=True, right_index=True)
    fe = feature_exposure(merged)
    max_fe = max_feature_exposure(merged)
    print(f"Feature exposure: {fe:.4f} "
          f"Max feature exposure: {max_fe:.4f}")
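
For anyone who wants to sanity-check the per-feature calculation without downloading the tournament files, here is a minimal sketch on synthetic data. The data frame `toy` and the columns `feature_a`, `feature_b`, `feature_c` are made up for illustration, and pandas' `corrwith` is used as a vectorised stand-in for the `np.corrcoef` loop above. It is not the official metric code.

```python
import numpy as np
import pandas as pd

PREDICTION_NAME = "prediction_kazutsugi"

# Synthetic stand-in for the validation rows (made-up data, not Numerai's).
rng = np.random.default_rng(0)
n_rows = 2000
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
toy = pd.DataFrame({
    "data_type": "validation",
    "feature_a": rng.choice(levels, n_rows),
    "feature_b": rng.choice(levels, n_rows),
    "feature_c": rng.choice(levels, n_rows),
})
# Predictions lean heavily on feature_a, so its exposure should dominate.
toy[PREDICTION_NAME] = 0.8 * toy["feature_a"] + 0.2 * rng.random(n_rows)

feature_columns = [c for c in toy.columns if c.startswith("feature_")]
# One correlation per feature, taken over all rows at once.
exposures = toy[feature_columns].corrwith(toy[PREDICTION_NAME])

print(exposures)
print("std of exposures:", exposures.std(ddof=0))  # ddof=0 matches np.std
print("max exposure:", exposures.max())
```

Because the toy predictions are built mostly from `feature_a`, that feature should show by far the largest correlation, while the other two stay near zero.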

Update 3rd September, 2020: The feature exposure metrics have changed slightly since I posted this. I’m leaving the code in this post intact, as I’d posted it earlier. Please refer to this post to find out more about the new feature exposure and max feature exposure metrics.

kecol · April 4, 2021, 7:31pm

I am new here and still trying to understand the whole Numerai project, but to me it looks like a possible risk-free interest rate. At least that makes sense to me. The idea would be to avoid counting profits that can be obtained in the market without risk. For instance, this helps keep your own Sharpe ratios comparable over time when the risk-free rate changes.
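
To make that reading concrete, here is a minimal sketch of a Sharpe ratio with a constant subtracted from each period's return. It assumes the constant enters as a plain subtraction before dividing by the standard deviation. The function name `sharpe`, the sample returns and the `risk_free` value are all hypothetical placeholders, not the number discussed in this thread.

```python
import numpy as np


def sharpe(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return excess.mean() / excess.std(ddof=1)


# Made-up per-period returns, purely for illustration.
returns = [0.03, 0.01, 0.04, 0.02]

raw = sharpe(returns)
adjusted = sharpe(returns, risk_free=0.01)  # hypothetical constant

print(f"Raw Sharpe: {raw:.4f}")
print(f"Adjusted Sharpe: {adjusted:.4f}")
```

Subtracting a constant leaves the standard deviation unchanged, so the adjustment lowers the ratio by exactly the constant divided by the standard deviation. The arithmetic is the same whether the constant stands for a risk-free rate or for trading costs.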
