I’d mentioned this on OHWA #12 yesterday, and @arbitrage suggested that I post the idea here. The idea is as follows: it is perhaps worth taking a step back and rethinking the tournament as a learning-to-rank (LTR) problem rather than a regression problem.
The metric we’re trying to optimize for is a ranking metric, which is scale invariant; the only constraint is that the predicted targets lie within the interval [0, 1]. With that in mind, we don’t need to optimize for squared error, absolute error, or something in between the two. The only requirement is that the predicted targets are co-monotonic with the labels. This, in my opinion, opens up the design space for loss functions and learning algorithms in general.
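To make the scale-invariance point concrete, here’s a tiny sketch using scipy’s spearmanr (assuming, as I do throughout, that the scoring metric behaves like a rank correlation): any increasing monotonic transformation of the predictions leaves the score untouched, because only the ordering matters.

import numpy as np
from scipy.stats import spearmanr

targets = np.array([0.00, 0.25, 0.25, 0.50, 0.75, 1.00])
preds = np.array([0.10, 0.30, 0.20, 0.55, 0.60, 0.90])

# Rank correlation only cares about ordering, not scale or offset:
# all three of these print the same value.
print(spearmanr(targets, preds).correlation)
print(spearmanr(targets, preds * 100 - 3).correlation)  # affine rescaling
print(spearmanr(targets, preds ** 3).correlation)       # any increasing monotonic map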
Also, I’ve empirically found that a regression-based loss helps performance early in training, but once the learner has converged sufficiently, it actively hurts performance. I think this is because memorization (which the regression objective encourages) is good, until it isn’t. This also explains why it’s so easy to overfit the data. We try to alleviate this with regularization, but perhaps we could do better than just that.
Traditionally, in the learning-to-rank literature, the problem is usually expressed in one of three ways:
- Pointwise ranking
- Pairwise ranking
- Listwise ranking
Treating the problem as a pointwise ranking problem is essentially treating it as a classification/regression problem. In LTR benchmarks, pairwise ranking almost always beats pointwise ranking. The intuition behind this is that comparing a pair of datapoints is easier than evaluating a single data point. Also, the learner has access to two sets of features to learn from, rather than just one.
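To make the pairwise idea concrete, here’s a minimal sketch of a RankNet-style pairwise logistic loss over all ordered pairs within a single era. The function and variable names are my own, and this is not how XGBRanker computes its objective internally; it’s just the shape of a pairwise loss.

import numpy as np

def pairwise_logistic_loss(scores, labels):
    # For every pair (i, j) with labels[i] > labels[j], penalise the model
    # unless scores[i] > scores[j] by a comfortable margin.
    diff_labels = labels[:, None] - labels[None, :]   # positive where i should outrank j
    diff_scores = scores[:, None] - scores[None, :]
    mask = diff_labels > 0
    # log(1 + exp(-(s_i - s_j))): near zero when the pair is already well separated
    pair_losses = np.log1p(np.exp(-diff_scores[mask]))
    return pair_losses.mean()

# Toy usage: one "era" of five rows
labels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
good_scores = np.array([0.1, 0.2, 0.5, 0.7, 0.9])   # correct ordering -> low loss
bad_scores = good_scores[::-1]                      # reversed ordering -> high loss
print(pairwise_logistic_loss(good_scores, labels), pairwise_logistic_loss(bad_scores, labels))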
The XGBoost Python API comes with a simple wrapper around its ranking functionality called XGBRanker, which uses a pairwise ranking objective. catboost and lightgbm also come with ranking learners. I’ve added the relevant snippet from a slightly modified example model that replaces XGBRegressor with XGBRanker. This model does almost as well as the example model on the usual battery of metrics; tuning hyperparameters to make it better than the example model is left as an exercise to the reader. Listwise ranking could probably be the topic for a future discussion.
Can anyone think of a way to get this to beat the example model? Perhaps combine this with the other recent ideas of era-boosting and feature neutralization (a rough neutralization sketch follows after the snippet below).
from xgboost import XGBRanker

model = XGBRanker(max_depth=5, learning_rate=0.01, n_estimators=2000, n_jobs=-1, colsample_bytree=0.1)
# XGBRanker expects group sizes: the number of rows in each era, in the order the rows appear.
cdf = training_data.groupby('era').agg(['count'])
group = cdf[cdf.columns[0]].values
del cdf
model.fit(training_data[feature_names], training_data[TARGET_NAME], group=group)
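On the feature neutralization suggestion above, here’s a hedged sketch of the usual approach: per era, subtract (some proportion of) the linear projection of the predictions onto the features, then re-rank. The neutralize function and the PREDICTION_NAME column are my own placeholders, not part of the official example scripts.

import numpy as np
import pandas as pd

def neutralize(df, prediction_col, feature_cols, proportion=1.0):
    # Per era, remove `proportion` of the component of the predictions
    # that is linearly explained by the features.
    neutralized = []
    for _, era_df in df.groupby('era'):
        scores = era_df[prediction_col].values.astype(np.float64)
        exposures = era_df[feature_cols].values.astype(np.float64)
        # Linear projection of the scores onto the feature space
        projection = exposures @ (np.linalg.pinv(exposures) @ scores)
        neutralized.append(pd.Series(scores - proportion * projection, index=era_df.index))
    out = pd.concat(neutralized)
    return out.rank(pct=True)  # rescale back to (0, 1] ranks, since only the ordering matters

# Example usage (PREDICTION_NAME is a hypothetical column holding the ranker's raw scores):
# training_data[PREDICTION_NAME] = model.predict(training_data[feature_names])
# training_data[PREDICTION_NAME] = neutralize(training_data, PREDICTION_NAME, feature_names, proportion=0.5)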
Edits: Fixed a typo: s/Treating the problem as a pairwise ranking problem …/Treating the problem as a pointwise ranking problem is essentially treating it as a classification/regression problem/