About the target discretization

Another view of the same thing: we can have really good predictions, but because they are floats (almost never exactly equal to the target values), what gets scored is the order (rank) of the predictions rather than their raw values, which is how I have always understood Numerai computes corr:

import numpy as np
import pandas as pd
from numerai_tools.scoring import numerai_corr

def compare(pred, target):
    pred = pd.DataFrame({'predictions': pred})
    target = pd.DataFrame({'target': target})['target']
    return numerai_corr(pred, target)

target = [0] + [.5]*100 + [1]
pred = [0] + list(np.linspace(0.45,0.55,len(target)-2)) + [1]
out = compare(pred, target)
print(f'output: {out.iloc[0]:.3f}')

for _ in range(10):
    pred = [0] + list(np.random.uniform(low=0.45, high=0.55, size=100)) + [1]
    out = compare(pred, target)
    print(f'output: {out.iloc[0]:.3f}')

Output

output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468
output: 0.468

So the order of the predictions around that 0.5 target value makes no difference (each iteration generated a fresh random cloud of points near the tied value). This makes sense: within a block of rows whose target value is identical, permuting the predictions leaves the correlation unchanged, because the target contributes the same constant to every pairing in that block.
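That invariance is easy to check directly without numerai_tools. Below is a minimal sketch where rank_gauss_corr is my own simplified stand-in for numerai_corr (rank, gaussianize, Pearson; not the official implementation): shuffling the predictions inside the tied-target block leaves the score unchanged.

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_gauss_corr(pred, target):
    # Hypothetical stand-in for numerai_corr (NOT the official code):
    # rank the predictions, map the ranks to gaussian quantiles,
    # then take a plain Pearson correlation with the target.
    gauss_pred = norm.ppf(rankdata(pred) / (len(pred) + 1))
    return np.corrcoef(gauss_pred, target)[0, 1]

rng = np.random.default_rng(0)
target = np.array([0.0] + [0.5] * 100 + [1.0])
base = np.array([0.0] + list(np.linspace(0.45, 0.55, 100)) + [1.0])

# Shuffle the predictions only inside the tied-target block (rows 1..100).
shuffled = base.copy()
shuffled[1:-1] = rng.permutation(shuffled[1:-1])

# The target is constant on that block, so every pairing contributes
# the same amount and the correlation cannot change.
print(np.isclose(rank_gauss_corr(base, target),
                 rank_gauss_corr(shuffled, target)))  # True
```

The gaussianized predictions are the same multiset before and after the shuffle, and the target is constant where the shuffle happens, so both the means/stds and the cross-term are identical.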

I must be misunderstanding something, I suspect:

  1. That the actual score is computed against a non-discretized version of the target, which would mean we are facing an artificially difficult target (bad for everyone, isn’t it?)
  2. Maybe I’m oversimplifying the example? The real target has 5 buckets and I’m working around just one, but I think that even if the numeric "output" changes, the effect is the same.
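On suspicion 1, here is a toy illustration of the gap I mean, with entirely made-up data and the same simplified rank-then-gaussianize stand-in for numerai_corr as above (so a sketch, not how scoring actually works): the same decent prediction gets a different score against a hypothetical continuous raw target than against its 5-bucket discretization.

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_gauss_corr(pred, target):
    # Simplified stand-in for numerai_corr (an assumption, not official):
    # gaussianize the prediction ranks, then Pearson corr with the target.
    gauss_pred = norm.ppf(rankdata(pred) / (len(pred) + 1))
    return np.corrcoef(gauss_pred, target)[0, 1]

rng = np.random.default_rng(1)
n = 1000
raw = rng.uniform(size=n)                    # hypothetical continuous target
buckets = np.floor(raw * 5) / 4              # 5-bucket version: {0, .25, .5, .75, 1}
pred = raw + rng.normal(scale=0.05, size=n)  # a good-but-noisy prediction

print(f'vs raw target:      {rank_gauss_corr(pred, raw):.3f}')
print(f'vs bucketed target: {rank_gauss_corr(pred, buckets):.3f}')
```

The two numbers differ because bucketing throws away the within-bucket ordering, which is exactly the information the scoring either does or does not reward depending on which version of the target it uses.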

If it is none of those, I wonder if this message still applies:

Because according to my analysis, breaking the ties drops the corr from those 0.9-ish values down to 0.5-ish (again, in this example).

Any idea of what I’m missing? :sweat: