@mdo previously showed how to use a custom loss function which involved taking the gradient of the Sharpe ratio of the Pearson correlations over different eras. Although Pearson and Spearman often return similar values, it could be rewarding to optimize for Spearman directly (or for the Sharpe of Spearman). Since the rank-based Spearman correlation requires a sort operation (which is not differentiable), it has not been possible to compute its gradient with respect to the predictions, which has ruled out using Spearman as a loss function for GBMs or neural nets.
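For context, that era-wise objective computes a Pearson correlation per era and then takes the Sharpe ratio (mean over standard deviation) of those correlations. A minimal sketch of the idea (not @mdo's exact code; era_slices is just a placeholder for however you index the rows of each era):

import torch

def pearson(pred, target):
    # Pearson correlation between two 1-D tensors (fully differentiable)
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    return (pred_n * target_n).sum() / (pred_n.norm() * target_n.norm())

def era_sharpe(pred, target, era_slices):
    # Sharpe ratio (mean / std) of the per-era Pearson correlations;
    # negate this to use it as a loss to minimise
    corrs = torch.stack([pearson(pred[s], target[s]) for s in era_slices])
    return corrs.mean() / corrs.std()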
A recent paper, Fast Differentiable Sorting and Ranking, introduced a novel method for differentiable sorting and ranking, with the added bonus of O(n log n) complexity (I would encourage reading the paper to learn more). We can leverage their open-source code google-research/fast-soft-sort to implement a differentiable version of the Spearman metric used by Numerai:
from fast_soft_sort.pytorch_ops import soft_rank


def corrcoef(target, pred):
    # np.corrcoef in torch from @mdo
    # https://forum.numer.ai/t/custom-loss-functions-for-xgboost-using-pytorch/960
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()


def spearman(
    target,
    pred,
    regularization="l2",
    regularization_strength=1.0,
):
    # fast_soft_sort uses 1-based indexing, divide by len to compute percentage of rank
    pred = soft_rank(
        pred,
        regularization=regularization,
        regularization_strength=regularization_strength,
    )
    return corrcoef(target, pred / pred.shape[-1])
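To build some intuition for what regularization_strength does inside spearman, it can help to look at what soft_rank returns on a tiny example (an illustrative sketch; the comments describe the qualitative behaviour of the l2-regularized soft ranks rather than exact values):

import torch
from fast_soft_sort.pytorch_ops import soft_rank

x = torch.tensor([[0.3, 0.1, 0.9, 0.5]])  # soft_rank expects a batch dimension first

# weak regularization: output is close to the hard 1-based ranks [2, 1, 4, 3]
print(soft_rank(x, regularization_strength=1e-3))

# strong regularization: the soft ranks get pulled towards the mean rank (2.5),
# which is what makes the operation smooth enough to have useful gradients
print(soft_rank(x, regularization_strength=1.0))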
We can then use this function to compute the gradient of the correlation with respect to a set of predictions, and compare it to the scoring metric introduced in the scoring section of the docs:
import numpy as np
import pandas as pd
import torch


def numerai_spearman(target, pred):
    # spearman used for numerai CORR
    return np.corrcoef(target, pred.rank(pct=True, method="first"))[0, 1]


# my spearman requires the batch dimension to come first
pred = torch.rand(1, 10, requires_grad=True)
target = torch.rand(1, 10)

print("Numerai CORR", numerai_spearman(
    pd.Series(target[0].detach().numpy()),
    pd.Series(pred[0].detach().numpy()),
))

s = spearman(target, pred, regularization_strength=1e-3)
gradient = torch.autograd.grad(s, pred)[0]
print("Differentiable CORR", s.item())
print("Gradient", gradient)
Numerai CORR 0.7355864488990377
Differentiable CORR 0.735586404800415
Gradient tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
With a very small regularization_strength, you will obtain a very accurate correlation, but likely no gradients. To obtain proper gradients you will need to increase regularization_strength, which will also lead to slightly less accurate correlation measures:
s = spearman(target, pred, regularization_strength=1e-2)
Numerai CORR 0.7355864488990377
Differentiable CORR 0.7345704436302185
Gradient tensor([[-2.9164, 0.0000, 0.0000, 0.0000, 0.0000, 1.7082, 2.9164, 0.0000,
0.0000, -1.7082]])
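To pick a reasonable value for your own data, a quick sweep makes the trade-off easier to see. This continues with the pred, target, and spearman defined above (the printed numbers will depend on the random inputs):

# illustrative sweep over regularization_strength, reusing pred/target from above
for strength in [1e-3, 1e-2, 1e-1, 1.0]:
    s = spearman(target, pred, regularization_strength=strength)
    grad = torch.autograd.grad(s, pred)[0]
    print(f"strength={strength:g}  corr={s.item():.4f}  grad_norm={grad.norm().item():.4f}")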
Ultimately it seems something like this could be useful for neural network or gradient boosting models. I will update this post with model examples, but I am curious whether anyone else has had success using something like this.
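As a starting point, here is a rough sketch of how the differentiable spearman above could be negated and used as a training objective for a small PyTorch model (untested against the actual Numerai data; n_features, layer sizes, learning rate, and regularization_strength are placeholders):

import torch
from torch import nn

n_features = 310  # placeholder; set to the width of your feature matrix
model = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(features, target):
    # features: (n_rows, n_features) float tensor, target: (n_rows,) float tensor
    optimizer.zero_grad()
    pred = model(features).squeeze(-1)
    # soft_rank expects a leading batch dimension, and the loss is negated
    # because the optimizer minimises while we want to maximise correlation
    loss = -spearman(target.unsqueeze(0), pred.unsqueeze(0),
                     regularization_strength=1e-2)
    loss.backward()
    optimizer.step()
    return loss.item()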