I’m writing this to ask which target functions you use the most when training your models. I was thinking of writing a custom objective function for boosted models to try to beat the common, already-developed methods. I know Spearman’s correlation is non-differentiable because of the sort and rank steps, but I found some references that try to deal with this problem:

I’ve tried using SoDeep loss functions when training my MLPs, and it was a complete disaster. So it would be nice to hear some tips from you all. Do you stick with RMSE, MSE, MAE, MAPE, LOGLOSS…?

I’ve tried using KL Divergence for learning to rank (see here: https://theiconic.tech/learning-to-rank-is-good-for-your-ml-career-part-2-lets-implement-listnet-11af69d1704). I ended up getting slightly worse results than with plain MSE, so I didn’t explore it much. I might come back to it eventually; it seemed cool at the time.

I just used pearsonr since it seemed close enough without the sort. With a NN and PyTorch era batches, I got better validation with pearsonr + MSELoss than with just MSELoss. I only have a couple of rounds started on live, though.

Would like to eventually get some ranking and feature neutralization directly in the loss.

Did you code the Pearson correlation loss function yourself? Or is it implemented elsewhere?

I used this code. First, define the loss functions:

```
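import torch.nn as nn
criterion = nn.MSELoss()
# pearsonr: needs a differentiable torch implementation (its source is not shown in this post)
corr_loss_fn = pearsonr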
```

Then, in the PyTorch training loop, I called the loss functions like this. Depending on your model's outputs, the indexing might be different. I'm also not sure whether the constant weights on the losses matter; I get confused about this.

```
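preds = model(x)
loss = criterion(preds[0], y)  # MSE term; the preds[0] indexing depends on the model's output
corr_loss = 1 - corr_loss_fn(preds[0].squeeze(), y.squeeze())  # 0 when perfectly correlated
if USE_CORR_LOSS:
    loss += corr_loss * 0.05  # small weight on the correlation term
if phase == 'train':
    loss.backward()
    optimizer.step()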
```
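
For anyone following along, here is a minimal sketch of what a differentiable `pearsonr` could look like in PyTorch. The names `pearson_corr` and `mse_plus_corr_loss` are hypothetical and this is not necessarily the code used above; the 0.05 default just mirrors the weight in the snippet above.

```python
import torch
import torch.nn.functional as F


def pearson_corr(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Pearson correlation between two tensors, built only from differentiable ops."""
    pred = pred.flatten().float()
    target = target.flatten().float()
    pc = pred - pred.mean()      # centered predictions
    tc = target - target.mean()  # centered targets
    # eps guards against division by zero when either input is constant
    return (pc * tc).sum() / (pc.norm() * tc.norm() + eps)


def mse_plus_corr_loss(pred: torch.Tensor, target: torch.Tensor, corr_weight: float = 0.05) -> torch.Tensor:
    """MSE plus a weighted (1 - correlation) penalty; corr_weight is an assumed value."""
    mse = F.mse_loss(pred.flatten().float(), target.flatten().float())
    return mse + corr_weight * (1.0 - pearson_corr(pred, target))
```

Calling `loss.backward()` on the result lets autograd propagate through both terms, so no hand-written derivative is needed on the PyTorch side.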

I’m using fast-soft-sort for my neural nets. It works better than MSE for me, but still worse than a simple XGBoost. I must be doing something wrong.

Same for me; it does not work as well as expected.

I can’t see how the gradient and the hessian are computed in your code.
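
On the gradient and hessian question: with PyTorch, autograd computes the gradient from the loss, so no explicit derivative code is needed. A boosted-model objective is different, since libraries such as XGBoost ask a custom objective to return a per-sample gradient and hessian. Below is a rough NumPy sketch for a `1 - Pearson` loss, assuming the `(y_true, y_pred) -> (grad, hess)` callable shape that XGBoost's scikit-learn wrapper accepts. The function name `pearson_objective` is hypothetical, and the constant hessian of ones is a simplifying assumption (the true hessian is neither diagonal nor guaranteed positive), not a derived value.

```python
import numpy as np


def pearson_objective(y_true, y_pred, eps=1e-12):
    """Gradient and (assumed) hessian of loss = 1 - pearson(y_pred, y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tc = y_true - y_true.mean()  # centered targets
    pc = y_pred - y_pred.mean()  # centered predictions
    norm_t = np.sqrt((tc ** 2).sum()) + eps
    norm_p = np.sqrt((pc ** 2).sum())

    if norm_p < eps:
        # Constant predictions (e.g. the first boosting round): correlation is
        # undefined, so as an assumption we just push toward the centered target.
        grad = -tc / norm_t
    else:
        r = (pc * tc).sum() / (norm_p * norm_t)
        # d(1 - r)/d(pred_i) = -(tc_i / (|pc| |tc|) - r * pc_i / |pc|^2)
        grad = -(tc / (norm_p * norm_t) - r * pc / norm_p ** 2)

    # Assumption: constant hessian, which turns each boosting step into a
    # plain gradient step scaled by the learning rate.
    hess = np.ones_like(grad)
    return grad, hess
```

The gradient entries are small (they scale with one over the norm of the centered predictions), so the learning rate or the hessian constant may need tuning before this trains at a useful speed.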