At the risk of exposing a complete lack of understanding, I hope someone here can clear up a little confusion I have regarding torchsort.soft_rank().
I have been using this in PyTorch with some success, but I'm wondering whether I may still be implementing the loss incorrectly and simply getting lucky.
I notice in the example docs that torch.autograd.grad() is used to compute the gradient. What I do not understand is whether this is needed for a fully-torch implementation, or whether it is done so people can extract the gradient and use it in another toolset, such as XGBoost.
In a fully-torch module, if I calculate the correlation loss and then call loss.backward(), is there really any need for me to extract the gradient myself?
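To make the question concrete, here is a toy sketch of what I believe the pure-torch path would look like (the tensors, the rank/correlation math, and the optimizer are placeholders I made up for illustration, not anything from the torchsort docs):

import torch
import torchsort

# toy data, just for illustration
preds = torch.randn(1, 16, requires_grad=True)
target = torch.randn(1, 16)
optimizer = torch.optim.Adam([preds], lr=1e-3)

rank = torchsort.soft_rank(preds, regularization="l2", regularization_strength=1.0)
rank_n = rank - rank.mean()
target_n = target - target.mean()
corr = (rank_n / rank_n.norm() * (target_n / target_n.norm())).sum()
loss = 1.0 - corr

# Case 1 (pure torch): backward() populates .grad and the optimizer consumes it,
# so I would not extract the gradient myself.
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Case 2 (exporting the gradient to another toolset, e.g. XGBoost): this is what
# I assume the torch.autograd.grad() call in the example docs is for.
# grad = torch.autograd.grad(loss, preds)[0]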
Here is my code:
import torchsort

#
# FUNCTION: t_corrcoef()
# - Pearson correlation of two tensors (applied to the soft ranks below)
#
def t_corrcoef(target, pred):
    pred_n = pred - pred.mean()
    target_n = target - target.mean()
    pred_n = pred_n / pred_n.norm()
    target_n = target_n / target_n.norm()
    return (pred_n * target_n).sum()

#
# FUNCTION: t_spearman()
# - to calculate differentiable Spearman corr for torch training
#
def t_spearman(target, pred, regularization="l2", regularization_strength=1.0):
    # fast_soft_sort uses 1-based indexing; divide by len to compute percentage of rank
    pred = torchsort.soft_rank(pred.cpu(),
                               regularization=regularization,
                               regularization_strength=regularization_strength)
    pred = pred.to(target.device)
    return t_corrcoef(target, pred / pred.shape[-1])
So my understanding is that calculating the correlation loss would go like this:
corr = t_spearman(batch['Y'].unsqueeze(0), preds.unsqueeze(0), regularization="l2", regularization_strength=1.0)
loss = 1.0 - corr
loss.backward()
Is this correct? If so, is there a way to apply sample-based weights to the loss, similar to using torch.nn.MSELoss(reduction='none'), so that certain samples are penalized more heavily according to a non-uniform distribution?
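Something along these lines is what I have in mind (just a sketch; the weighted Pearson formula and the w tensor are my own guess, not anything provided by torchsort):

def t_weighted_corrcoef(target, pred, w):
    # w: per-sample weights, same shape as target/pred, normalized to sum to 1
    w = w / w.sum()
    pred_m = (w * pred).sum()
    target_m = (w * target).sum()
    cov = (w * (pred - pred_m) * (target - target_m)).sum()
    pred_var = (w * (pred - pred_m) ** 2).sum()
    target_var = (w * (target - target_m) ** 2).sum()
    return cov / (pred_var.sqrt() * target_var.sqrt())

# then, inside t_spearman(), I imagine replacing the t_corrcoef() call with e.g.
# corr = t_weighted_corrcoef(target, pred / pred.shape[-1], w)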