Differentiable Spearman in PyTorch (Optimize for CORR directly)

javiermoral · June 21, 2021, 7:40am

Which code did you try?

hiromuhana · June 21, 2021, 12:46pm

I mixed @paulito's and @teddykoker's code for the LightGBM objective, as below.

import numpy as np
import torch
import torchsort

def spearman_loss_lgb(ytrue, ypred):
    
    def corrcoef(target, pred):
        pred_n = pred - pred.mean()
        target_n = target - target.mean()
        pred_n = pred_n / pred_n.norm()
        target_n = target_n / target_n.norm()
        return (pred_n * target_n).sum()

    def differentiable_spearman(target, pred, regularization="l2", regularization_strength=1.0,):
        pred = torchsort.soft_rank(
            pred,
            regularization=regularization,
            regularization_strength=regularization_strength,
        )
        return corrcoef(target, pred / pred.shape[-1])
    
    lenypred = ypred.shape[0]
    lenytrue = ytrue.shape[0]

    ypred_th = torch.tensor(ypred.reshape(1, lenypred), requires_grad=True)
    ytrue_th = torch.tensor(ytrue.reshape(1, lenytrue))

    loss = differentiable_spearman(ytrue_th, ypred_th, regularization_strength=1e-2)
    # print(f'Current loss:{loss}')

    # calculate gradient and convert to numpy
    loss_grads = torch.autograd.grad(loss, ypred_th)[0]
    loss_grads = loss_grads.to('cpu').detach().numpy()

    # return gradient and ones instead of Hessian diagonal
    return loss_grads[0], np.ones(loss_grads.shape)[0]

bigbertha · January 3, 2022, 7:55pm

And by disaster you mean the huge swings in the score?
Maybe it is just me but it looks like it is converging … have you tried simply running 300 more trials?

bigbertha · January 3, 2022, 7:58pm

Terribly stupid question: is this a loss, or do we have to return -corrcoef(…) when we want to use this as a loss function?
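As a rough illustration of the sign question, here is a sketch that uses plain Pearson correlation on raw values instead of torchsort soft ranks, with made-up tensors. It suggests that an optimizer which minimizes its objective pushes the correlation down when handed corr itself, and up when handed -corr. The helper name `corr_after_one_step` and the step size 0.1 are assumptions for the demo only.

```python
import torch

def corrcoef(target, pred):
    # Pearson correlation, same formula as in the posts above
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()

target = torch.tensor([0.1, 0.4, 0.3, 0.9])
start = torch.tensor([0.5, 0.2, 0.6, 0.4])

def corr_after_one_step(sign, lr=0.1):
    # Take one plain gradient-descent step on sign * corr and
    # report the correlation of the updated predictions.
    pred = start.clone().requires_grad_(True)
    objective = sign * corrcoef(target, pred)
    (grad,) = torch.autograd.grad(objective, pred)
    return corrcoef(target, pred.detach() - lr * grad).item()

before = corrcoef(target, start).item()
after_minimizing_corr = corr_after_one_step(+1.0)      # descends on corr
after_minimizing_neg_corr = corr_after_one_step(-1.0)  # descends on -corr

print(before, after_minimizing_corr, after_minimizing_neg_corr)
```

Note that -corr and 1 - corr differ only by a constant, so they produce identical gradients; the choice between them only changes the reported loss value.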

pumplerod · January 5, 2022, 8:30pm

At the risk of exposing a complete lack of understanding, I hope someone here can clear up a little confusion I have regarding torchsort.soft_rank().

I have been using this in PyTorch, with some success, but I’m wondering if I may still be implementing the loss incorrectly and simply getting lucky.

I notice in the example docs that torch.autograd.grad() is used to compute the gradient. What I do not understand is whether this is needed for a full torch implementation or if this is being done for people to extract the gradient and use in another tool set, such as XGB.

In a fully torch module, if I calculate the correlation Loss and then apply loss.backward(), is there really any need for me to extract the gradient myself?

Here is my code:

import torchsort
def t_corrcoef(target, pred):
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()

#
# FUNCTION: t_spearman()
#   - to calculate differentiable spearman corr for torch training
#
def t_spearman( target, pred, regularization="l2", regularization_strength=1.0):
    # fast_soft_sort uses 1-based indexing, divide by len to compute percentage of rank

    pred = torchsort.soft_rank( pred.cpu(),
                                regularization=regularization,
                                regularization_strength=regularization_strength )
    return t_corrcoef(target, pred.to( target.device) / pred.to( target.device).shape[-1])

So my understanding is that calculating corr_loss would go as follows:

corr = t_spearman( batch[ 'Y'].unsqueeze(0), preds.unsqueeze(0), regularization="l2", regularization_strength=1.0)
loss = 1.0 - corr
loss.backward()

Is this correct? If so, might there be a way to apply sample-based weights to the loss, similar to using torch.nn.MSELoss(reduction='none'), in order to give a greater penalty to some samples based on the non-uniform distribution?
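On the two questions above, a hedged sketch: in a pure PyTorch training loop, loss.backward() fills in the gradients on its own, so a separate torch.autograd.grad() call is only needed when the gradient has to be handed to another tool. For sample weights, one possible approach (an assumption, not something from torchsort or from the posts above) is a weighted Pearson correlation, where each sample contributes to the means, variances and covariance in proportion to its weight. The name `weighted_corrcoef` and the example tensors are hypothetical; in real use `pred` would be the soft ranks returned by torchsort.soft_rank().

```python
import torch

def weighted_corrcoef(target, pred, weights):
    # Normalize weights so they sum to 1
    w = weights / weights.sum()
    # Weighted means are removed before computing second moments
    pred_c = pred - (w * pred).sum()
    target_c = target - (w * target).sum()
    cov = (w * pred_c * target_c).sum()
    pred_std = (w * pred_c ** 2).sum().sqrt()
    target_std = (w * target_c ** 2).sum().sqrt()
    return cov / (pred_std * target_std)

target = torch.tensor([0.1, 0.4, 0.3, 0.9, 0.5])
pred = torch.tensor([0.2, 0.1, 0.5, 0.7, 0.4], requires_grad=True)
weights = torch.tensor([1.0, 1.0, 2.0, 4.0, 1.0])  # made-up sample weights

loss = 1.0 - weighted_corrcoef(target, pred, weights)
loss.backward()      # populates pred.grad; no manual gradient extraction
print(loss.item(), pred.grad)
```

With uniform weights this reduces to the unweighted t_corrcoef() above, which gives a quick way to check the implementation.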

adalseno · January 20, 2022, 12:01am

Hi @teddykoker, thank you very much for your code. I was trying to use it as a custom loss function with TabNet, but I had an issue that took me some time to debug. Your function expects a tensor of shape (1, X), while TabNet passes it with shape (X, 1), so I had to reshape them. The function uses the default regularization (that is, “l2”) and seems to work fine.

def spearman(pred, target):

    x = 1e-2
    pred = torchsort.soft_rank(pred.reshape(1,-1),regularization_strength=x)
    target = torchsort.soft_rank(target.reshape(1,-1),regularization_strength=x)
    pred = pred - pred.mean()
    pred = pred / pred.norm()
    target = target - target.mean()
    target = target / target.norm()

    return (pred * target).sum()
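The shape issue can be seen without torchsort. The sketch below uses a hard rank built from argsort as a stand-in for soft_rank (an assumption made so the example is self-contained; `hard_rank` is a hypothetical helper). Ranking along the last dimension of an (X, 1) tensor ranks X rows of one element each, so every rank is 1; reshaping to (1, X) ranks the X values against each other.

```python
import torch

def hard_rank(x):
    # 1-based ranks along the last dimension (non-differentiable stand-in)
    return x.argsort(dim=-1).argsort(dim=-1) + 1

column = torch.tensor([[0.3], [0.1], [0.2]])  # shape (3, 1), as TabNet passes it
row = column.reshape(1, -1)                   # shape (1, 3), as the loss expects

print(hard_rank(column).flatten().tolist())   # [1, 1, 1]
print(hard_rank(row).flatten().tolist())      # [3, 1, 2]
```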

In case someone also needs a metric, this one should work:

from pytorch_tabnet.metrics import Metric

class Sprme_Metric(Metric):
    """
    sprme.
    """

    def __init__(self):
        self._name = "sprme" # write an understandable name here
        self._maximize = True

    def __call__(self, y_true, y_score):
        """
        Compute Spearman Correlation of predictions.

        Parameters
        ----------
        y_true: np.ndarray
            Target matrix or vector
        y_score: np.ndarray
            Score matrix or vector

        Returns
        -------
            float
            Spearman of predictions vs targets.
        """
        return spearman(torch.from_numpy(y_score), torch.from_numpy(y_true)).item()

adalseno · January 24, 2022, 8:33pm

I have a kind of Hamlet doubt: in regression we should minimise the loss, so the loss function should return 1 - ret instead of just ret. Since the Spearman index ranges from -1 to +1, 1 - ret is zero when the index is +1 (perfect positive correlation, which is what we want) and reaches its maximum of 2 when the index is -1 (perfect negative correlation, which we don’t want). So minimising the loss drives the index towards 1.
Am I wrong?
If the loss is changed this way, the metric has to be adjusted too. One option is for the metric to return 1 minus the new loss, which we still want to maximise: it is largest when the loss is zero (index 1, what we want) and smallest when the loss is two (index -1, what we don’t want). Otherwise the metric can simply return the amended loss (1 - ret) with _maximize = False.
What do you think?

Here is the revised code:

def spearman(pred, target):

    x = 1e-3
    pred = torchsort.soft_rank(pred.reshape(1,-1),regularization_strength=x)
    target = torchsort.soft_rank(target.reshape(1,-1),regularization_strength=x)
    pred = pred - pred.mean()
    pred = pred / pred.norm()
    target = target - target.mean()
    target = target / target.norm()
    ret = 1 - (pred * target).sum()
    return ret

In my case x = 1e-3 gave better results. And for the metric (the simplest form):

class Sprme_Metric(Metric):
    """
    sprme.
    """

    def __init__(self):
        self._name = "sprme" # write an understandable name here
        self._maximize = False

    def __call__(self, y_true, y_score):
        """
        Compute 1 - Spearman correlation of predictions (lower is better).

        Parameters
        ----------
        y_true: np.ndarray
            Target matrix or vector
        y_score: np.ndarray
            Score matrix or vector

        Returns
        -------
            float
            1 - Spearman of predictions vs targets.
        """
        return spearman(torch.from_numpy(y_score), torch.from_numpy(y_true)).item()
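Because a metric is only evaluated and never differentiated, an exact Spearman based on hard ranks can serve as a sanity check on the soft-rank value. The sketch below is plain Python with no torchsort; `average_ranks` and `spearman_exact` are hypothetical names, and tied values receive their average rank, which is the usual convention. It returns the correlation itself, so subtract it from 1 to compare with the revised metric above.

```python
import math

def average_ranks(values):
    # 1-based ranks; tied values share the mean of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_exact(y_true, y_score):
    # Pearson correlation computed on the ranks
    rt, rs = average_ranks(y_true), average_ranks(y_score)
    mt, ms = sum(rt) / len(rt), sum(rs) / len(rs)
    num = sum((a - mt) * (b - ms) for a, b in zip(rt, rs))
    den = math.sqrt(sum((a - mt) ** 2 for a in rt)) * math.sqrt(
        sum((b - ms) ** 2 for b in rs)
    )
    return num / den

y_true = [0.1, 0.4, 0.3, 0.9]
y_score = [1.0, 3.0, 2.0, 10.0]  # same ordering as y_true
print(spearman_exact(y_true, y_score))
```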

ervinjason · May 30, 2022, 6:57pm

I am new and have very little concrete to go on. I only have stable submissions for 253 and 254. My feature-neutralized training models have the best validation by a bit, so they are weighted a bit more, but they are not all of the models. I also still had FE, so I have post-process neutralized some of them as well. Of 3 models and 2 rounds, all are positive overall and on the day, if that answers your question.

They are basically all the same, though, so I can see the nuances of live: one is an optimized blend with 0.5 post-process neutralization, one is an optimized blend with no post-process neutralization, and one is an even blend with 0.5 post-process neutralization.

Today was a different day for sure, though. 254 is very positive for me but still completely crushed by integration_test and linear models. Also, my non-post-processed model is outperforming the other two, which it is not on 253.

Really, I have very little to go on so far, and today is the first day of 254, so it is going to be volatile.

bigbertha · January 30, 2023, 2:24pm

Is there feature neutralization code for PyTorch?
I did some trials but never got it to work. The furthest I got was finding that there is no GPU implementation for least squares.

oraculum · February 1, 2023, 9:12pm

There is code for neutralization with PyTorch in this thread by @mdo (in the training loop, using PyTorch’s pinverse):
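Independently of that thread's code, here is a minimal sketch of the general idea (not @mdo's implementation): neutralization subtracts from the predictions their least-squares projection onto the feature columns, and the pseudoinverse gives that projection directly. The function name `neutralize`, the `proportion` parameter and the random data are assumptions for illustration.

```python
import torch

def neutralize(preds, features, proportion=1.0):
    # preds: (n,), features: (n, k)
    # pinv(features) @ preds are the least-squares coefficients;
    # multiplying by features gives the part of preds explained by them.
    exposure = features @ (torch.linalg.pinv(features) @ preds)
    return preds - proportion * exposure

torch.manual_seed(0)
features = torch.randn(200, 5, dtype=torch.float64)
preds = features @ torch.randn(5, dtype=torch.float64)
preds = preds + 0.1 * torch.randn(200, dtype=torch.float64)

neutral = neutralize(preds, features)

# With proportion=1.0 the result is orthogonal to every feature column
print((features.T @ neutral).abs().max().item())
```

Since torch.linalg.pinv is differentiable, a step like this can sit inside a training loop before the correlation loss is computed.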

f58c · November 7, 2023, 11:21pm

Hi Teddy, training models for corr is interesting, but have you compared model performance vs. training for the cyrus_v4_20 target? I’d think that a model that can predict cyrus_v4_20 should also rank well for corr20v2. The hiccup I’ve been running up against is that sometimes high corr20v2 performance results in negative TC. I’d love to hear your thoughts on this.