Pearson vs. Spearman scoring confusion

oiboy · March 27, 2021, 6:23pm

Hey all, newbie here.

I keep seeing posts stating that Numerai scores are based on the Spearman correlation coefficient, and the comment in the example code seems to agree. However, the same example and the documentation compute the scoring correlation with:

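import numpy as np
def score(predictions, targets):  # hypothetical wrapper; predictions is a pandas Series
    ranked_preds = predictions.rank(pct=True, method="first")  # ties broken by order of appearance
    return np.corrcoef(ranked_preds, targets)[0, 1]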

which returns the Pearson’s correlation coefficient, according to numpy’s documentation.
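For reference, a small sketch that computes Pearson's r straight from its definition (covariance divided by the product of the standard deviations; the 1/n factors cancel, so plain sums suffice) and compares it with np.corrcoef. The helper name `pearson` and the sample values are hypothetical.

```python
import numpy as np

def pearson(x, y):
    # Hypothetical helper: Pearson's r from its definition.
    # Centre both variables, then divide the summed cross-products by the
    # square root of the product of the summed squares.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the x-y entry
print(pearson(x, y), np.corrcoef(x, y)[0, 1])
```

Both values agree: np.corrcoef on its own only ever gives Pearson, so any "Spearman-ness" has to come from how the inputs are ranked beforehand.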

Which one does Numerai actually use to score correlation, Spearman's or Pearson's? Is ranking the predictions enough for np.corrcoef() to return the Spearman correlation?

quantized · March 28, 2021, 11:34am

The second line in the code above does indeed calculate Pearson. However, the first line ranks the predictions, which means that Spearman is in fact being calculated. You can think of it as:
Spearman = Ranking + Pearson
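A minimal sketch of that identity in its textbook form, assuming both variables are ranked (tied values get their average rank). It checks Pearson-on-ranks against the classic shortcut rho = 1 - 6·Σd²/(n(n²-1)), which is only valid when neither variable has ties. Function names and sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def spearman_via_ranks(x, y):
    # Spearman's rho = Pearson's r computed on the ranks of BOTH variables.
    # method="average" gives tied values their mean rank (the usual convention).
    rx = pd.Series(x, dtype=float).rank(method="average")
    ry = pd.Series(y, dtype=float).rank(method="average")
    return float(np.corrcoef(rx, ry)[0, 1])

def spearman_no_ties(x, y):
    # Classic closed form, valid only when there are no ties:
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference of item i
    rx = pd.Series(x, dtype=float).rank()
    ry = pd.Series(y, dtype=float).rank()
    n = len(rx)
    d2 = float(((rx - ry) ** 2).sum())
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Made-up data without ties; only the ordering matters, not the raw values
x = [0.1, 0.7, 0.3, 0.9, 0.5]
y = [1.2, 3.4, 0.8, 5.0, 2.2]
print(spearman_via_ranks(x, y), spearman_no_ties(x, y))
```

Both functions give the same value here, which is the sense in which Spearman is "ranking followed by Pearson".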

wigglemuse · March 28, 2021, 2:41pm

Not quite – only the predictions are ranked (and ties broken), not the targets (ties remain). So it isn’t fully Spearman.
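To see the gap, a sketch comparing the recipe quoted at the top (rank the predictions with method="first", then Pearson against the raw targets) with textbook Spearman (both sides ranked, ties averaged). The helper names and the toy data, including targets with heavy ties, are hypothetical.

```python
import numpy as np
import pandas as pd

def rank_preds_corr(predictions, targets):
    # Hypothetical helper mirroring the quoted two lines: only the predictions
    # are ranked (ties broken by order of appearance); targets stay unranked.
    ranked_preds = pd.Series(predictions, dtype=float).rank(pct=True, method="first")
    return float(np.corrcoef(ranked_preds, np.asarray(targets, dtype=float))[0, 1])

def full_spearman(predictions, targets):
    # Textbook Spearman: both variables ranked, ties given their average rank.
    rp = pd.Series(predictions, dtype=float).rank(method="average")
    rt = pd.Series(targets, dtype=float).rank(method="average")
    return float(np.corrcoef(rp, rt)[0, 1])

# Toy data: distinct predictions, targets with tied values
predictions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
targets = [0.0, 0.0, 0.5, 1.0, 1.0, 1.0]

print(rank_preds_corr(predictions, targets), full_spearman(predictions, targets))
```

On this toy example the two numbers differ only in the third decimal place, but they are not identical: the raw target values (not just their order) still enter the Pearson step.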

oiboy · March 28, 2021, 3:42pm

So is this code accurate for how Numerai calculates corr for scoring? Or should I rank the targets as well?

scipy also offers scipy.stats.spearmanr; I’m wondering whether that’s the easier option.

wigglemuse · March 28, 2021, 3:50pm

It’s accurate, yes. The difference from actual Spearman is slight, though.
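If you want to gauge the gap on your own data, here is a sketch that puts scipy.stats.spearmanr (which ranks both inputs and averages ties) next to the prediction-only ranking recipe. The synthetic data is hypothetical and assumes targets take a handful of discrete values (0, 0.25, 0.5, 0.75, 1), so they contain many ties; the helper name is also hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def rank_preds_corr(predictions, targets):
    # Hypothetical helper: rank predictions only (ties broken by order),
    # then Pearson against the unranked targets, as in the quoted code.
    ranked = pd.Series(predictions, dtype=float).rank(pct=True, method="first")
    return float(np.corrcoef(ranked, np.asarray(targets, dtype=float))[0, 1])

rng = np.random.default_rng(0)
# Assumed setup: targets on a coarse 5-level grid, noisy continuous predictions
targets = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1000)
predictions = targets + rng.normal(scale=0.5, size=1000)

rho, _ = spearmanr(predictions, targets)  # textbook Spearman (average ranks)
score = rank_preds_corr(predictions, targets)
print(f"spearmanr: {rho:.4f}  prediction-rank corr: {score:.4f}  gap: {rho - score:+.4f}")
```

spearmanr is convenient for full Spearman, but to reproduce the score as described in this thread, the two-line recipe is the one to use.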
