Participant-centric model benchmark

Hello again everyone,

When using the diagnostics page you get some insights about your model, provided you trained it only on the vanilla train set. While these metrics certainly can give good feedback about your model's quality (apart from TC, but let's not dive into that), what I have always found lacking is that they are useful from the viewpoint of Numerai, but not from the viewpoint of a tournament participant.

Since the only metric that we are able to back-test AND stake on right now is CORR, I will assume that the participant will stake on CORR only.

A Numerai participant staking on CORR will obviously be burned if the correlation of their predictions is less than zero, and rewarded if it is greater than zero. The question then is: how can the participant minimize the probability of being burned and maximize the probability of getting a reward in any given round?

For that reason, the metric I use for my models is the following: evaluate the ranked correlation on all validation eras, assume that the per-era correlation follows a Gaussian distribution, and calculate the probability of the per-era correlation being greater than zero.
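A minimal sketch of that computation, assuming a validation DataFrame with `era`, `prediction`, and `target` columns (the column names, and Spearman correlation as the "ranked correlation", are my assumptions):

```python
import pandas as pd
from scipy.stats import norm, spearmanr

def p_positive_corr(df: pd.DataFrame) -> float:
    """P(per-era corr > 0), assuming the per-era correlation is Gaussian."""
    # Ranked (Spearman) correlation between predictions and target, per era
    per_era_corr = df.groupby("era").apply(
        lambda era: spearmanr(era["prediction"], era["target"]).correlation
    )
    mu, sigma = per_era_corr.mean(), per_era_corr.std()
    # Probability mass of the fitted Gaussian above zero
    return 1.0 - norm.cdf(0.0, loc=mu, scale=sigma)
```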

As a benchmark, I compare it to the example predictions over the same period, and also to random predictions (a sketch of this comparison follows below). Here is such a result from one of my latest models:

This result tells me that, under the assumption that future eras behave similarly to the ones in the validation set, my model should receive positive correlation in ~85% of the weekly eras, which is also comparable to the example predictions.
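As referenced above, here is one way to run the benchmark comparison, reusing `p_positive_corr` and `df` from the sketch above; `df_example` (a frame holding the example predictions with the same columns) is a placeholder name:

```python
import numpy as np

# Same metric for the example predictions over the same validation eras
p_example = p_positive_corr(df_example)

# Random baseline: permuting the predictions destroys any real signal,
# so this should land near 50%
rng = np.random.default_rng(42)
df_random = df.assign(prediction=rng.permutation(df["prediction"].to_numpy()))
p_random = p_positive_corr(df_random)

print(f"model: {p_positive_corr(df):.1%}, example: {p_example:.1%}, random: {p_random:.1%}")
```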

If you're assuming your correlation follows a Gaussian, then your negative-corr probability is a function of your Sharpe ratio.
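To make that relation explicit: with $\mu$ and $\sigma$ the mean and standard deviation of the per-era correlation and $\Phi$ the standard normal CDF,

$$P(\mathrm{corr} < 0) = \Phi\!\left(-\frac{\mu}{\sigma}\right) = \Phi(-\mathrm{Sharpe}), \qquad P(\mathrm{corr} > 0) = \Phi(\mathrm{Sharpe}),$$

so ranking models by this probability is equivalent to ranking them by per-era Sharpe.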

Yes, but at least for me the Sharpe ratio is less intuitive than the probability of receiving a positive result per round.