Learning to Rank

The default objective for LGBMRanker is lambdarank, which I believe is also pairwise. I tried changing the objective to rank_xendcg and got a slightly better result of 0.017. However, it is still significantly worse than XGBRanker.
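For reference, switching the objective is a one-line change in the scikit-learn wrapper. This is only a minimal sketch with made-up toy data (the feature matrix, labels, and group sizes below are placeholders, not from my experiments):

import numpy as np
from lightgbm import LGBMRanker

# toy data: six query groups of 100 rows each (placeholders)
X = np.random.rand(600, 10)
y = np.random.randint(0, 5, size=600)   # ranking labels as integer relevance grades
group_sizes = [100] * 6

# the default objective is "lambdarank"; switching to "rank_xendcg" is just this keyword
ranker = LGBMRanker(objective="rank_xendcg", n_estimators=100, learning_rate=0.05)
ranker.fit(X, y, group=group_sizes)
preds = ranker.predict(X)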

The algorithms behind LGBMRanker and XGBRanker are similar but still different, so there is no guarantee that using the same hyperparameters results in the same predictions from both rankers. I wouldn’t be too suspicious about that difference in the results. You might try several sets of hyperparameters and find that a particular set gives better results on LGBMRanker than on XGBRanker.
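To make that concrete, here is a hedged sketch of training both rankers with the same shared hyperparameters on the same placeholder data; even then the two libraries will generally not produce identical predictions, because the underlying algorithms and defaults differ:

import numpy as np
from lightgbm import LGBMRanker
from xgboost import XGBRanker

# placeholder data: six groups of 100 rows
X = np.random.rand(600, 10)
y = np.random.randint(0, 5, size=600)
group_sizes = [100] * 6

shared = dict(n_estimators=100, learning_rate=0.05, max_depth=6)

lgbm_ranker = LGBMRanker(objective="rank_xendcg", **shared)
lgbm_ranker.fit(X, y, group=group_sizes)

xgb_ranker = XGBRanker(objective="rank:ndcg", **shared)
xgb_ranker.fit(X, y, group=group_sizes)

# same hyperparameters, different libraries, different predictions
print(lgbm_ranker.predict(X)[:5])
print(xgb_ranker.predict(X)[:5])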


I did try Optuna but never got a correlation close to XGBRanker’s. The only thing I didn’t try was changing the number of estimators, because I usually keep that constant and treat it as a secondary step during evaluation. Thank you for letting me know that the algorithms are similar; perhaps it is just a case of not having the right parameters (will keep trying).
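For anyone trying the same thing, a rough sketch of what an Optuna search over LGBMRanker could look like, scored by mean per-group Spearman correlation. The data, search ranges, and in-sample scoring here are placeholders, not the setup used above:

import numpy as np
import optuna
from lightgbm import LGBMRanker
from scipy.stats import spearmanr

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.randint(0, 5, size=1000)
group_sizes = [200] * 5

def mean_group_spearman(preds, target, sizes):
    # average Spearman correlation over the groups
    corrs, start = [], 0
    for g in sizes:
        corr, _ = spearmanr(preds[start:start + g], target[start:start + g])
        corrs.append(corr)
        start += g
    return float(np.mean(corrs))

def objective(trial):
    params = {
        "objective": "rank_xendcg",
        "n_estimators": 200,  # kept fixed, as described above
        "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    ranker = LGBMRanker(**params)
    ranker.fit(X, y, group=group_sizes)
    # in practice, score on a held-out validation set instead of the training data
    return mean_group_spearman(ranker.predict(X), y, group_sizes)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)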

I am using LightGBM with custom objective functions. I tried LGBMRanker, but it wasn’t flexible enough.
This one is a mean group rank correlation objective:

import numpy as np
import torch
import torchsort
from torch.autograd import grad


def torch_corr(x, y):
    # Pearson correlation between two 1-D tensors
    # (helper not shown in the original post; minimal assumed implementation)
    vx = x - x.mean()
    vy = y - y.mean()
    return (vx * vy).sum() / (torch.sqrt((vx ** 2).sum()) * torch.sqrt((vy ** 2).sum()))


def fobj_mean_group_corr(ypred, ytruex):
    # LightGBM custom objective: ytruex is the lgb.Dataset,
    # which carries the group sizes and the labels
    groups = ytruex.get_group()
    ytrue = ytruex.get_label()
    return mean_group_corr(ypred, ytrue, groups)


def mean_group_corr(ypred, ytrue, groups):
    # convert to PyTorch tensors
    ypred_th = torch.tensor(ypred, requires_grad=True).float()
    # add a little noise so constant predictions still get a unique soft ranking
    ypred_th = ypred_th + torch.normal(0, 0.0001, ypred_th.size())
    ytrue_th = torch.tensor(ytrue).float()

    all_corrs = []

    pivot = 0
    for g in groups:
        idxes = list(range(pivot, pivot + g))

        # differentiable ranks within the group, scaled by the group size
        pred = torch.flatten(
            torchsort.soft_rank(
                torch.reshape(ypred_th[idxes], (1, g)),
                regularization="l2",
                regularization_strength=1,
            )
        ) / torch.tensor(g)

        # sqrt(1 - corr): small when the group correlation is high
        corr_res = torch.sqrt(-torch_corr(pred, ytrue_th[idxes]) + torch.tensor(1.0))
        all_corrs.append(corr_res)
        pivot += g

    all_corrs = torch.stack(all_corrs)

    # the loss is the mean over groups
    loss = all_corrs.mean()
    print(f'MGC Current loss:{loss}')

    # gradient of the loss w.r.t. the (noisy) predictions, converted to numpy
    loss_grads = grad(loss, ypred_th, create_graph=True)[0]
    loss_grads = loss_grads.detach().numpy()

    # return gradient and ones instead of the Hessian diagonal
    return loss_grads, np.ones(loss_grads.shape)
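For completeness, this is roughly how such a custom objective plugs into LightGBM’s native training API. The data here is a placeholder, and note that older LightGBM releases (3.x) take the callable via fobj= while 4.x expects it in params["objective"] instead:

import lightgbm as lgb
import numpy as np

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = lgb.Dataset(X, label=y, group=[200] * 5)

booster = lgb.train(
    {"learning_rate": 0.05, "verbosity": -1},
    dtrain,
    num_boost_round=100,
    fobj=fobj_mean_group_corr,  # the custom objective defined above (3.x style)
)
preds = booster.predict(X)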



Thank you very much for sharing this idea.
Is there any reason you are adding noise to the predictions? Is it needed to get a unique sorting?

If I remember correctly, if the predictions are all zeros/ones/…, then soft_rank also returns the same number everywhere and autograd returns NaNs. So this was my hacky fix. It is not pretty, but it works.
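A minimal reproduction of that failure mode, assuming torchsort is installed (the tensors here are made up): with constant predictions the soft ranks are constant too, so a correlation-based loss divides by a zero standard deviation and everything downstream becomes NaN.

import torch
import torchsort

ypred = torch.zeros(8, requires_grad=True)           # all-identical predictions
ranks = torchsort.soft_rank(
    ypred.reshape(1, -1), regularization="l2", regularization_strength=1
).flatten()                                          # constant soft ranks
target = torch.arange(8).float()

vx = ranks - ranks.mean()                            # all zeros
vy = target - target.mean()
corr = (vx * vy).sum() / (vx.norm() * vy.norm())     # 0 / 0 -> nan
corr.backward()
print(corr.item(), ypred.grad)                       # nan loss, nan gradients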

I had this issue as well. I think the reason is that the custom objective does not work right at the start, when the predictions are still all identical.

With your method you are introducing a (very small) amount of randomness all the time. I had success with the following two hacks:

  • completely randomize the predictions in the first round
  • boost one tree with a normal objective

Both of these only affect the start (see the sketch below).
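A rough sketch of what these hacks can look like with LightGBM’s native API, reusing the custom objective from above. The data is a placeholder; the fobj= keyword is 3.x style (4.x expects the callable in params["objective"]), and the init-score randomization is only one possible reading of the first hack:

import lightgbm as lgb
import numpy as np

# placeholder data: five groups of 200 rows
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = lgb.Dataset(X, label=y, group=[200] * 5)

# Hack 1 (one interpretation): random init scores so the first round
# does not see identical predictions everywhere
# dtrain.set_init_score(np.random.normal(0, 1e-3, size=len(y)))

# Hack 2: boost a single tree with a normal, built-in objective ...
warmup = lgb.train({"objective": "regression", "verbosity": -1}, dtrain, num_boost_round=1)

# ... then continue from it with the custom objective
booster = lgb.train(
    {"verbosity": -1},
    dtrain,
    num_boost_round=100,
    fobj=fobj_mean_group_corr,
    init_model=warmup,
)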


The randomness should not matter in the end, right? But it is better to get rid of it, because it is an unnecessary step that the model computes all the time for nothing.

Yes, you are right.


I tried your solution and the results were sadly underwhelming. Maybe I did something wrong. One epoch wasn’t enough, and often even with 100 pretrained epochs the models were failing to learn. I measured the time the function saves by dropping the noise line and it is about 0.001 seconds, so with 30,000 epochs that is about half a second. So I am going to stick with my implementation for now. But thank you for the input!