Pearson vs. Spearman scoring confusion

Hey all, newbie here.

I keep seeing posts which state that Numerai scores are based on the Spearman correlation coefficient. The comment in the example seems to agree with this. However, the same example and the documentation state that the correlation for scoring is calculated using:

ranked_preds = predictions.rank(pct=True, method="first")
return np.corrcoef(ranked_preds, targets)[0, 1]

which, according to numpy’s documentation, returns the Pearson correlation coefficient.

Which one does Numerai actually use to evaluate correlation for scoring, Spearman’s or Pearson’s? Is ranking the predictions enough for np.corrcoef() to return the Spearman correlation?

The second line in the code above does indeed calculate Pearson. However, the first line ranks the predictions beforehand, which means that Spearman is in fact calculated. You can think of it like this:
Spearman = Ranking + Pearson
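
For anyone who wants to check that identity numerically, here is a minimal sketch. The sample data is made up purely for illustration, and note that both series are ranked here, not just the predictions. With no ties in either series, Pearson on the ranks should match scipy.stats.spearmanr up to floating-point error.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Made-up continuous data (an assumption for illustration, so no ties occur)
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200))
y = pd.Series(0.3 * x + rng.normal(size=200))

# Rank BOTH series, then compute Pearson on the ranks
pearson_on_ranks = np.corrcoef(x.rank(), y.rank())[0, 1]

# Reference value from scipy
spearman, _ = spearmanr(x, y)

print(pearson_on_ranks, spearman)
```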

Not quite – only the predictions are ranked (and ties broken), not the targets (ties remain). So it isn’t fully Spearman.

So is this code accurate for how Numerai calculates corr for scoring? Or should I rank the targets as well?

scipy also offers scipy.stats.spearmanr, I’m wondering if that’s the easier option.

It’s accurate, yes. The difference from actual Spearman is slight, though.
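
If you want to see how slight, here is a rough sketch comparing the two on synthetic data. The function name thread_corr and the five-bucket targets are assumptions made for this example, not Numerai's real data or API. The function simply wraps the two lines quoted in the first post, and the result is compared with scipy.stats.spearmanr, which ranks both inputs and averages ties.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def thread_corr(predictions: pd.Series, targets: pd.Series) -> float:
    # Same two steps as the snippet quoted in the first post:
    # rank only the predictions (ties broken by order), then Pearson
    ranked_preds = predictions.rank(pct=True, method="first")
    return np.corrcoef(ranked_preds, targets)[0, 1]

# Synthetic data (assumption): targets fall into five tied buckets
rng = np.random.default_rng(42)
targets = pd.Series(rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1000))
predictions = pd.Series(targets + rng.normal(size=1000))

score = thread_corr(predictions, targets)

# Full Spearman: both inputs ranked, ties get average ranks
full_spearman, _ = spearmanr(predictions, targets)

print(score, full_spearman)
```

The two numbers will generally not be identical, because the targets keep their ties in the first calculation and are rank-transformed in the second.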
