Some info on multiple targets?

taori · July 28, 2022, 6:30am

I didn’t find much information (nothing at all) regarding the multiple targets in the official documentation. I would like to know in which way they differ (except the 20 vs 60 days difference, which is clear). Does anybody know something on this topic? Thanks.

wigglemuse · July 28, 2022, 1:21pm

Nobody knows! They are neutralized to different things, and that is about all the explanation we’ve gotten. Testing shows that at least several of them are pretty useful though.

autratec · July 29, 2022, 3:27am

You might come up with your own assumption and test whether it aligns with the targets they provide. Let’s say the market moving up more than 1% maps to 0.75 and moving down 1% maps to 0.25.
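A minimal sketch of that kind of check, assuming you have a return series of your own aligned row by row with a provided target (the thread does not say where such returns would come from). The function name `bucket_returns`, the 0.5 middle bucket, and all the numbers are hypothetical and only illustrate the mechanics.

```python
import numpy as np
import pandas as pd


def bucket_returns(returns: pd.Series, threshold: float = 0.01) -> pd.Series:
    """Map returns to buckets using the hypothesised +/-1% rule.

    Above +threshold -> 0.75, below -threshold -> 0.25, otherwise 0.5.
    The 0.5 middle bucket is an assumption made for this sketch.
    """
    values = np.select(
        [returns > threshold, returns < -threshold],
        [0.75, 0.25],
        default=0.5,
    )
    return pd.Series(values, index=returns.index)


# Made-up returns and a made-up "provided" target, purely for illustration
returns = pd.Series([0.03, -0.02, 0.004, 0.015, -0.011, 0.0])
provided_target = pd.Series([0.75, 0.25, 0.5, 0.75, 0.5, 0.5])

guess = bucket_returns(returns)

# Fraction of rows where the hypothesised rule reproduces the provided target
agreement = (guess == provided_target).mean()
print(f"agreement: {agreement:.3f}")
```

A low agreement rate would suggest the assumed rule is not how the target was built; a high one is only weak evidence, since many rules can produce similar buckets.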

taori · August 25, 2022, 9:37pm

This post made me curious and so here is the correlation matrix between targets and the dendrogram to better highlight relationships.

Here is the code.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


targets = [
    'target_nomi_v4_20',
    'target_nomi_v4_60',
    'target_jerome_v4_20',
    'target_jerome_v4_60',
    'target_janet_v4_20',
    'target_janet_v4_60',
    'target_ben_v4_20',
    'target_ben_v4_60',
    'target_alan_v4_20',
    'target_alan_v4_60',
    'target_paul_v4_20',
    'target_paul_v4_60',
    'target_george_v4_20',
    'target_george_v4_60',
    'target_william_v4_20',
    'target_william_v4_60',
    'target_arthur_v4_20',
    'target_arthur_v4_60',
    'target_thomas_v4_20',
    'target_thomas_v4_60']

# analyse the validation data, but we could do the same on the training data
df = pd.read_parquet('v4/validation.parquet', columns=targets + ['era'])

# compute the Spearman correlation matrix between targets within each era,
# then average those matrices across eras
corr = df.groupby('era')[targets].corr(method='spearman').groupby(level=1).mean()

# reorder rows and columns (for visualization) so that both are sorted
# by correlation with the target 'target_nomi_v4_20'
corr = (
    corr
    .sort_values('target_nomi_v4_20', axis=0, ascending=False)
    .sort_values('target_nomi_v4_20', axis=1, ascending=False)
)

sns.heatmap(corr, annot=True)
plt.show()

sns.clustermap(corr)
plt.show()

ryo_matsuzaka · September 19, 2022, 10:57pm

Sorry, it could be a basic question but I don’t understand why there are multiple targets…
When doing training and prediction, we need to choose a target.

For example, the example code uses “target_nomi_v4_20”.

github.com/numerai/example-scripts/blob/838bfd1788feaf40362d6bedb3e4683832a9dbb1/utils.py#L10

import numpy as np
import pandas as pd
import scipy
from halo import Halo
from pathlib import Path
import json
from scipy.stats import skew


ERA_COL = "era"
TARGET_COL = "target_nomi_v4_20"
DATA_TYPE_COL = "data_type"
EXAMPLE_PREDS_COL = "example_preds"


spinner = Halo(text='', spinner='dots')


MODEL_FOLDER = "models"
MODEL_CONFIGS_FOLDER = "model_configs"
PREDICTION_FILES_FOLDER = "prediction_files"

But I don’t understand why “target_nomi_v4_20” is chosen.
How is it selected? Is there a rule or criterion?

thedarklord · September 20, 2022, 1:38am

As far as I understand, we are trying to predict the ‘target’. However, the other targets provided can do a better job at predicting the ‘target’ than the ‘target’ itself. Sounds confusing, right? Basically, it means we can use one of the other targets, say ‘target_nomi_v4_20’, to train the model, and then use that model, which was trained with ‘target_nomi_v4_20’ as Y, to predict ‘target’.
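A minimal sketch of the idea on synthetic data: fit one model on an auxiliary target, fit another directly on the scored target, and compare both against the scored target on held-out rows. The column names `target_scored` and `target_aux`, the feature names, the linear model, and the noise levels are all assumptions for illustration; nothing here is the real tournament data or pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows, n_features = 4000, 5
features = [f"feature_{i}" for i in range(n_features)]  # hypothetical names

# Synthetic stand-in for the real data: both targets share one underlying
# signal, and the scored target is noisier than the auxiliary one (assumption).
X = rng.normal(size=(n_rows, n_features))
signal = X @ np.array([0.4, -0.3, 0.2, 0.1, -0.2])
data = pd.DataFrame(X, columns=features)
data["target_scored"] = signal + rng.normal(scale=2.0, size=n_rows)
data["target_aux"] = signal + rng.normal(scale=0.5, size=n_rows)

train, valid = data.iloc[:2000], data.iloc[2000:]


def fit_and_predict(train_target: str) -> pd.Series:
    """Least-squares linear fit on train_target, predictions for the validation rows."""
    a_train = np.column_stack([train[features].to_numpy(), np.ones(len(train))])
    a_valid = np.column_stack([valid[features].to_numpy(), np.ones(len(valid))])
    coef, *_ = np.linalg.lstsq(a_train, train[train_target].to_numpy(), rcond=None)
    return pd.Series(a_valid @ coef, index=valid.index)


def spearman(a: pd.Series, b: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return a.rank().corr(b.rank())


# Both models are evaluated against the same scored target
score_aux = spearman(fit_and_predict("target_aux"), valid["target_scored"])
score_direct = spearman(fit_and_predict("target_scored"), valid["target_scored"])
print(f"trained on target_aux:    {score_aux:.4f}")
print(f"trained on target_scored: {score_direct:.4f}")
```

Which of the two scores comes out higher depends on the data; the point is only that the training target and the evaluation target do not have to be the same column.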

ryo_matsuzaka · September 20, 2022, 2:02am

Yes, it is very confusing…
But now I understand it. I will play with those targets.
Thank you very much.

ryo_matsuzaka · September 20, 2022, 2:11am

Sorry, but I still don’t understand the targets.
Targets consist of a limited set of discrete values.
So I would use a classifier, not a regressor, for predictions.
But then some targets (target_brabra) don’t have the value in the target.

wigglemuse · September 20, 2022, 2:15am

target_nomi_v4_20 actually is the “target” that we are scored on (for correlation). But it sounds like they may have their sights on jerome_60 for the future. (20 = 20-day targets, 60 = 60-day targets.)

And the targets are in buckets just like the features (except the distributions are different). In most cases you’ll find a regression approach working better – at least for getting correlation. (Your submission should be real-valued – not just the 5 buckets. We are scored on ranking/ordering, not actual values.)
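To illustrate the point about ranking, here is a small sketch on synthetic data. It shows that a rank-based score (Spearman correlation, computed here as the Pearson correlation of ranks) does not change when real-valued predictions go through an order-preserving transform. The bucket values and the noise level are assumptions, and this is not the official scoring code.

```python
import numpy as np
import pandas as pd


def spearman(a: pd.Series, b: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return a.rank().corr(b.rank())


rng = np.random.default_rng(0)

# Hypothetical bucketed target with five levels
target = pd.Series(rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1000))

# Hypothetical real-valued predictions, loosely related to the target
raw_preds = target + rng.normal(scale=0.5, size=1000)

# Two order-preserving transforms of the same predictions
ranked_preds = raw_preds.rank(pct=True)      # percentile ranks in (0, 1]
exp_preds = pd.Series(np.exp(raw_preds))     # strictly increasing function

score_raw = spearman(raw_preds, target)
score_ranked = spearman(ranked_preds, target)
score_exp = spearman(exp_preds, target)
print(score_raw, score_ranked, score_exp)
```

Since only the ordering matters for this kind of score, a regressor's real-valued output can be submitted after rank-normalising it, with no need to snap it back to the five bucket values.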

It should be pointed out that TC evaluation is based on portfolio returns of your model vs others and so kinda doesn’t have a fixed target. Another reason to mix it up.

ryo_matsuzaka · September 20, 2022, 2:24am

@wigglemuse
Thank you very much for the information and advice.
I completely understand it.

In most cases you’ll find a regression approach working better – at least for getting correlation. (Your submission should be real-valued – not just the 5 buckets. We are scored on ranking/ordering, not actual values.)

I agree with you.

taori · September 20, 2022, 8:31am

@ryo_matsuzaka Just to make your life easier, train your model multiple times, each time with a different target. Then create one model slot in your numer.ai account for each target, so that you can submit the predictions from models trained on different targets to different slots. Then observe which target gives you better results and stake on it.
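The same comparison can be roughed out offline before using any slots. The sketch below trains one model per candidate target on synthetic data and summarises the per-era Spearman correlation of each model against a single scored target. The target names, feature names, linear model, and era split are all hypothetical; with the real files you would substitute your own model and the actual column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_eras, rows_per_era = 12, 150
features = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

# Synthetic stand-in for the training and validation files
df = pd.DataFrame(rng.normal(size=(n_eras * rows_per_era, 3)), columns=features)
df["era"] = np.repeat(np.arange(n_eras), rows_per_era)
signal = df[features].to_numpy() @ np.array([0.5, -0.3, 0.2])

candidate_targets = ["target_one", "target_two", "target_three"]  # hypothetical
for name, noise in zip(candidate_targets, [1.0, 1.5, 2.0]):
    df[name] = signal + rng.normal(scale=noise, size=len(df))

scored_target = "target_one"  # assumed to be the target used for scoring
train, valid = df[df["era"] < 6], df[df["era"] >= 6]


def design(frame: pd.DataFrame) -> np.ndarray:
    """Feature matrix with an intercept column."""
    return np.column_stack([frame[features].to_numpy(), np.ones(len(frame))])


summary = {}
for target in candidate_targets:
    # One model per training target (a plain least-squares fit in this sketch)
    coef, *_ = np.linalg.lstsq(design(train), train[target].to_numpy(), rcond=None)
    preds = pd.Series(design(valid) @ coef, index=valid.index)

    # Spearman correlation (Pearson of ranks) against the scored target, per era
    era_scores = pd.Series({
        era: preds.loc[group.index].rank().corr(group[scored_target].rank())
        for era, group in valid.groupby("era")
    })
    summary[f"model_{target}"] = {"mean": era_scores.mean(), "std": era_scores.std()}

results = pd.DataFrame(summary).T.sort_values("mean", ascending=False)
print(results)
```

Each row of `results` corresponds to what would be one model slot. An offline validation score is only a rough guide, so live results per slot remain the thing to watch.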
