Learning to Rank

The default objective for LGBMRanker is lambdarank, which I believe is also pairwise. I tried changing the objective to rank_xendcg and got a slightly better result of 0.017. However, it is still significantly worse than XGBRanker.
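For reference, switching the objective is a one-line change in the scikit-learn wrapper. This is only a minimal sketch with made-up toy data (the feature matrix, labels, and group sizes below are placeholders, not from my experiments):

import numpy as np
from lightgbm import LGBMRanker

# toy data: six query groups of 100 rows each (placeholders)
X = np.random.rand(600, 10)
y = np.random.randint(0, 5, size=600)   # ranking labels as integer relevance grades
group_sizes = [100] * 6

# the default objective is "lambdarank"; switching to "rank_xendcg" is just this keyword
ranker = LGBMRanker(objective="rank_xendcg", n_estimators=100, learning_rate=0.05)
ranker.fit(X, y, group=group_sizes)
preds = ranker.predict(X)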

The algorithms behind LGBMRanker and XGBRanker are similar but still different, so there is no guarantee that using the same hyperparameters results in the same predictions from both rankers. I wouldn’t be too suspicious about that difference in the results. You might try several sets of hyperparameters and find that a particular set gives better results on LGBMRanker than on XGBRanker.
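To make that concrete, here is a hedged sketch of training both rankers with the same shared hyperparameters on the same placeholder data; even then the two libraries will generally not produce identical predictions, because the underlying algorithms and defaults differ:

import numpy as np
from lightgbm import LGBMRanker
from xgboost import XGBRanker

# placeholder data: six groups of 100 rows
X = np.random.rand(600, 10)
y = np.random.randint(0, 5, size=600)
group_sizes = [100] * 6

shared = dict(n_estimators=100, learning_rate=0.05, max_depth=6)

lgbm_ranker = LGBMRanker(objective="rank_xendcg", **shared)
lgbm_ranker.fit(X, y, group=group_sizes)

xgb_ranker = XGBRanker(objective="rank:ndcg", **shared)
xgb_ranker.fit(X, y, group=group_sizes)

# same hyperparameters, different libraries, different predictions
print(lgbm_ranker.predict(X)[:5])
print(xgb_ranker.predict(X)[:5])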


I did try Optuna but never got a correlation close to XGBRanker’s. The only thing I didn’t try was changing the number of estimators, because I usually keep that constant and treat it as a secondary step during evaluation. Thank you for letting me know that the algorithms are similar; perhaps it is just a case of not having the right parameters (will keep trying).
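For anyone trying the same thing, a rough sketch of what an Optuna search over LGBMRanker could look like, scored by mean per-group Spearman correlation. The data, search ranges, and in-sample scoring here are placeholders, not the setup used above:

import numpy as np
import optuna
from lightgbm import LGBMRanker
from scipy.stats import spearmanr

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.randint(0, 5, size=1000)
group_sizes = [200] * 5

def mean_group_spearman(preds, target, sizes):
    # average Spearman correlation over the groups
    corrs, start = [], 0
    for g in sizes:
        corr, _ = spearmanr(preds[start:start + g], target[start:start + g])
        corrs.append(corr)
        start += g
    return float(np.mean(corrs))

def objective(trial):
    params = {
        "objective": "rank_xendcg",
        "n_estimators": 200,  # kept fixed, as described above
        "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    ranker = LGBMRanker(**params)
    ranker.fit(X, y, group=group_sizes)
    # in practice, score on a held-out validation set instead of the training data
    return mean_group_spearman(ranker.predict(X), y, group_sizes)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)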

I am using LightGBM with custom objective functions. I tried LGBMRanker, but it wasn’t flexible enough.
This one is a mean group rank correlation objective:

import numpy as np
import torch
import torchsort
from torch.autograd import grad


def torch_corr(x, y):
    # Pearson correlation between two 1-D tensors
    # (helper not shown in the original post; minimal assumed implementation)
    vx = x - x.mean()
    vy = y - y.mean()
    return (vx * vy).sum() / (torch.sqrt((vx ** 2).sum()) * torch.sqrt((vy ** 2).sum()))


def fobj_mean_group_corr(ypred, ytruex):
    # LightGBM custom objective: ytruex is the lgb.Dataset,
    # which carries the group sizes and the labels
    groups = ytruex.get_group()
    ytrue = ytruex.get_label()
    return mean_group_corr(ypred, ytrue, groups)


def mean_group_corr(ypred, ytrue, groups):
    # convert to PyTorch tensors
    ypred_th = torch.tensor(ypred, requires_grad=True).float()
    # add a little noise so constant predictions still get a unique soft ranking
    ypred_th = ypred_th + torch.normal(0, 0.0001, ypred_th.size())
    ytrue_th = torch.tensor(ytrue).float()

    all_corrs = []

    pivot = 0
    for g in groups:
        idxes = list(range(pivot, pivot + g))

        # differentiable ranks within the group, scaled by the group size
        pred = torch.flatten(
            torchsort.soft_rank(
                torch.reshape(ypred_th[idxes], (1, g)),
                regularization="l2",
                regularization_strength=1,
            )
        ) / torch.tensor(g)

        # sqrt(1 - corr): small when the group correlation is high
        corr_res = torch.sqrt(-torch_corr(pred, ytrue_th[idxes]) + torch.tensor(1.0))
        all_corrs.append(corr_res)
        pivot += g

    all_corrs = torch.stack(all_corrs)

    # the loss is the mean over groups
    loss = all_corrs.mean()
    print(f'MGC Current loss:{loss}')

    # gradient of the loss w.r.t. the (noisy) predictions, converted to numpy
    loss_grads = grad(loss, ypred_th, create_graph=True)[0]
    loss_grads = loss_grads.detach().numpy()

    # return gradient and ones instead of the Hessian diagonal
    return loss_grads, np.ones(loss_grads.shape)
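For completeness, this is roughly how such a custom objective plugs into LightGBM’s native training API. The data here is a placeholder, and note that older LightGBM releases (3.x) take the callable via fobj= while 4.x expects it in params["objective"] instead:

import lightgbm as lgb
import numpy as np

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = lgb.Dataset(X, label=y, group=[200] * 5)

booster = lgb.train(
    {"learning_rate": 0.05, "verbosity": -1},
    dtrain,
    num_boost_round=100,
    fobj=fobj_mean_group_corr,  # the custom objective defined above (3.x style)
)
preds = booster.predict(X)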



Thank you very much for sharing this idea.
Is there any reason you are adding noise to the predictions? Is it needed to get a unique sorting?

If I remember correctly, if the predictions are all zeros/ones/…, then soft_rank also returns the same number everywhere and autograd returns NaNs. So this was my hacky fix. It is not pretty, but it works.
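A minimal reproduction of that failure mode, assuming torchsort is installed (the tensors here are made up): with constant predictions the soft ranks are constant too, so a correlation-based loss divides by a zero standard deviation and everything downstream becomes NaN.

import torch
import torchsort

ypred = torch.zeros(8, requires_grad=True)           # all-identical predictions
ranks = torchsort.soft_rank(
    ypred.reshape(1, -1), regularization="l2", regularization_strength=1
).flatten()                                          # constant soft ranks
target = torch.arange(8).float()

vx = ranks - ranks.mean()                            # all zeros
vy = target - target.mean()
corr = (vx * vy).sum() / (vx.norm() * vy.norm())     # 0 / 0 -> nan
corr.backward()
print(corr.item(), ypred.grad)                       # nan loss, nan gradients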

I had this issue as well. I think the reason is that the custom objective does not work right at the start, when the predictions are still all identical.

With your method you are introducing a (very small) amount of randomness all the time. I had success with the following two hacks:

  • completely randomize the predictions in the first round
  • boost one tree with a normal objective

Both of these only affect the start (see the sketch below).
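A rough sketch of what these hacks can look like with LightGBM’s native API, reusing the custom objective from above. The data is a placeholder; the fobj= keyword is 3.x style (4.x expects the callable in params["objective"]), and the init-score randomization is only one possible reading of the first hack:

import lightgbm as lgb
import numpy as np

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = lgb.Dataset(X, label=y, group=[200] * 5)

# Hack 1 (one interpretation): random init scores so the first round
# does not see identical predictions everywhere
# dtrain.set_init_score(np.random.normal(0, 1e-3, size=len(y)))

# Hack 2: boost a single tree with a normal, built-in objective ...
warmup = lgb.train({"objective": "regression", "verbosity": -1}, dtrain, num_boost_round=1)

# ... then continue from it with the custom objective
booster = lgb.train(
    {"verbosity": -1},
    dtrain,
    num_boost_round=100,
    fobj=fobj_mean_group_corr,
    init_model=warmup,
)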


The randomness should not matter in the end, right? But it is better to get rid of it, because it is an unnecessary step that the model computes all the time for nothing.

Yes, you are right.


I tried your solution and the results were sadly underwhelming. Maybe I did something wrong. One epoch wasn’t enough, and often even with 100 pretrained epochs the models were failing to learn. I measured the time the function saves by dropping the noise line and it is about 0.001 seconds, so with 30,000 epochs that is about half a second. So I am going to stick with my implementation for now. But thank you for the input!