Differentiable Spearman in PyTorch (Optimize for CORR directly)

@mdo previously showed how to use a custom loss function which involved taking the gradient of the sharpe ratio of the Pearson correlations over different eras. Although Pearson and Spearman might return similar values, it could be rewarding to optimize for Spearman directly (or Sharpe of Spearman). Since the ranked Spearman correlation needs a sort operation (which is not differentiable), it has not been possible to compute the gradient with respect to predictions, which eliminated the possibility of using Spearman as a loss function for GBM or neural nets.

A recent paper, Fast Differentiable Sorting and Ranking, introduced a novel method for differentiable sorting and ranking, with the added bonus of O(n \log n) complexity (I would encourage reading the paper to learn more). We can leverage their open sourced code google-research/fast-soft-sort in order to implement a differentiable version of the Spearman metric used by Numerai:

from fast_soft_sort.pytorch_ops import soft_rank

def corrcoef(target, pred):
    # np.corrcoef in torch from @mdo
    # https://forum.numer.ai/t/custom-loss-functions-for-xgboost-using-pytorch/960
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()


def spearman(
    target,
    pred,
    regularization="l2",
    regularization_strength=1.0,
):
    # fast_soft_sort uses 1-based indexing, divide by len to compute percentage of rank
    pred = soft_rank(
        pred,
        regularization=regularization,
        regularization_strength=regularization_strength,
    )
    return corrcoef(target, pred / pred.shape[-1])

We can then use this function to find the gradients of a set of predictions with respect to the correlation and compare to the scoring metric introduced in the scoring section of the docs:

def numerai_spearman(target, pred):
    # spearman used for numerai CORR
    return np.corrcoef(target, pred.rank(pct=True, method="first"))[0, 1]

# my spearman requires having batch dimension as first.
pred = torch.rand(1, 10, requires_grad=True)
target = torch.rand(1, 10)

print("Numerai CORR", numerai_spearman(
    pd.Series(target[0].detach().numpy()),
    pd.Series(pred[0].detach().numpy()),
))

s = spearman(target, pred, regularization_strength=1e-3)
gradient = torch.autograd.grad(s, pred)[0]
print("Differentiable CORR", s.item())
Numerai CORR 0.7355864488990377
Differentiable CORR 0.735586404800415
Gradient tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

With a very small regularization_strength, you will obtain a very accurate correlation, but likely no gradients. To obtain proper gradients you will need to increase regularization_strength, which will also lead to slightly inaccurate correlation measures:

s = spearman(target, pred, regularization_strength=1e-2)
Numerai CORR 0.7355864488990377
Differentiable CORR 0.7345704436302185
Gradient tensor([[-2.9164,  0.0000,  0.0000,  0.0000,  0.0000,  1.7082,  2.9164,  0.0000,
          0.0000, -1.7082]])

Ultimately it seems something like this could be useful for neural network or gradient boosting models; I will update this model examples, but I am curious if anyone else has had success using something like this.

13 Likes

I used just pearson in pytorch with success. By success I mean in optuna it chose to use it. Still waiting on live before I continue with overfitting NN and p hacking. Want to use spearman but thought the sort might be slower. Also want to try FNC loss directly when I go back to this.

But as @robbo_the_fossil pointed out in chat you can just neutralize your targets as pre processing. This is also what I did on NN models.

Also using @mdo feature reversal and dropout functions since they improved validation for me.

3 Likes

I wrote a post discussing the same topic.
I also tried differentiable Spearman’s on my PyTorch MLP and it worked horrible. I was also thinking of trying an approach like the one you have proposed using fast-soft-sort, but I am stuck writing the customized loss function for boosted models.I don’t know how I can define the gradient and the hessian from fast-soft-sort code.

This function should work (at least in theory) similar to @mdo’s method, using 1’s for the hessian:

train_df = ...
era_idx = [np.where(eras == e)[0] for e in np.unique(train_df.eras)]

def loss_fn(target, pred):
    pred = torch.tensor(pred, requires_grad=True)
    target = torch.tensor(target)
    corrs = torch.stack([spearman(target[e], pred[e]) for e in era_idx])
    sharpe = adj_sharpe(corrs)
    gradient = torch.autograd.grad(sharpe, pred)[0].detach().numpy()
    hessian = np.ones_like(gradient)  # ones for hessian should be ~okay~
    return gradient, hessian

model = XGBRegressor(objective=loss_fn)

I believe the fast_soft_sort code is currently incompatible with higher order derivatives (since it uses a conversion to numpy internally, I believe their JAX implementation might work however).

I think in order to get good results with something like this it will be necessary to pretrain a model to optimize MSE and then use it as a base margin/weight initialization for xgboost/neural nets.

3 Likes

This did really amazing things to my sharpe - perhaps expected. However todays CORR, MMC and FNC are some of the worst I have ever gotten. Today was the first day I had a model deployed with this type of neutralization with live testing. Glad I didn’t stake. Is this working for you at all on live data, @greenprophet ?

2 Likes

I am new and have very little concrete to go on. Only have stable submissions for 253 and 254. My Feature neutralized training models have the best validation by a bit so they are weighted a bit more but it is not all of the models. And I also still had FE so I have post process neutralized some of them as well. Of 3 models and 2 rounds all are positive overall and on the day if that answers your question.

they are basically all the same though so I can see nuances of live. 1 is optimized blend with .5 post process neutralization, 1 is optimized blend with no post process neutralization, 1 is even blend with .5 post process neutralization,

Today was a different day for sure though. 254 is very positive for me but still completely crushed by integration_test and linear models. Also my non post processed model is out performing the other 2 which it is not on 253.

Really I have very little to go on so far. And today is the first day of 254 so going to be volatile.

2 Likes

Thanks for the reply. Sure, beginning of round is very volatile, but I dont think any of my models ever started so far away from 0 CORR and my FN did not help in that regard (opposite actually i=on the first day) - well there is just random things happening all the time I guess. We will see how much things move towards a happier mean (or not!) :wink:

Can I use this loss function on GPU??

D:\DL\numerai\fast_soft_sort\pytorch_ops.py in forward(ctx, values)
 36       #values = values.to(device2)
 37       #
---> 38       obj = cls(values.detach().numpy(), **kwargs)
 39       ctx.numpy_obj = obj
 40       return torch.from_numpy(obj.compute())

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I have run it on my CUDA 11 GPU without issue.

1 Like

Thank you for your reply.
I also ran my code on CUDA 11 GPU on google colaboratory, but I had a problem about GPU.

!nvcc --version
!git clone https://github.com/google-research/fast-soft-sort/

import os
path = './fast-soft-sort'
os.chdir(path)

import torch
from fast_soft_sort.pytorch_ops import soft_rank, soft_sort

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
values = torch.tensor([[5., 1., 2.], [2., 1., 5.]], dtype=torch.float64).to(device)
soft_sort(values, regularization_strength=1.0)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
Cloning into 'fast-soft-sort'...
remote: Enumerating objects: 76, done.
remote: Counting objects: 100% (76/76), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 76 (delta 44), reused 64 (delta 33), pack-reused 0
Unpacking objects: 100% (76/76), done.
cuda:0
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-d7a59b351a2b> in <module>()
     12 print(device)
     13 values = torch.tensor([[5., 1., 2.], [2., 1., 5.]], dtype=torch.float64).to(device)
---> 14 soft_sort(values, regularization_strength=1.0)

3 frames
/content/fast-soft-sort/fast_soft_sort/pytorch_ops.py in forward(ctx, values)
     32     @staticmethod
     33     def forward(ctx, values):
---> 34       obj = cls(values.detach().numpy(), **kwargs)
     35       ctx.numpy_obj = obj
     36       return torch.from_numpy(obj.compute())

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I had that issue locally as well. I’m running inside of a docker container, so I just rebooted it and it worked. I’m not 100% sure what caused that issue.

How did you include fast_soft_sort? If you copied the files into your directory you could always modify that line of code to explicitly convert off the GPU as the error message mentions.

According to a github issue:

At the moment we do not have a GPU implementation of the projection operators, which is the cause for the error. We decided not to do this conversion implicitly as we want the user to be aware that a device copy is necessary. If you want that behavior, can you write a small util function like

def soft_sort(array):
   return pytorch_ops.soft_sort(array.cpu()).cuda()

This solution worked fine for me. Having to perform the operation on CPU is not ideal, but it didn’t seem to penalize training times too poorly.

1 Like

Update: I am currently working on a pure PyTorch implementation, GitHub - teddykoker/torchsort: Fast, differentiable sorting and ranking in PyTorch, of the differentiable sorting and ranking algorithm, which will allow for faster computation on GPU directly. It currently works, but is slower than the original (I believe my C++ isotonic regression solver needs some optimization as it is slower than the Numba JIT’d python) Edit: got an optimized version working, should be out soon. (GPU kernels are a pain :roll_eyes:)

3 Likes

Has anyone pulled this off for TensorFlow? I would assume it shouldn’t be much work converting the existing PyTorch example.

I am just starting (submitted my first today) so I have a lot of work to do with my model.

Torchsort now has a speed on par with PyTorch’s built in sort (but is differentiable). Numerai spearman loss can be implemented like:

import torchsort

def corrcoef(target, pred):
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()


def spearman(
    target,
    pred,
    regularization="l2",
    regularization_strength=1.0,
):
    pred = torchsort.soft_rank(
        pred,
        regularization=regularization,
        regularization_strength=regularization_strength,
    )
    return corrcoef(target, pred / pred.shape[-1])

I implemented the CUDA kernel as well, it scales pretty well with batch size and sequence length:

5 Likes

Thanks for sharing @teddykoker. I have been willing to implement Fast-Soft_Sort for a few months now so that should help. I will play around with it and try to optimize on Corr/Sharpe/Custom functions over the weekend see what comes out of it. Did you experience any memory limitation implementing it?

Thank you for your reply. I could run soft_sort on my Colab.