Era Boosted Models

Thank you for sharing the era boost idea and implementation.

The first argument of spearmanr as defined in this script is target, but the function is called with pred as the first argument.

Pull Request

I fixed the incorrect order of arguments in spearmanr.
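For context, a minimal sketch of why the argument order matters, assuming the script defines spearmanr the way Numerai's example scripts do (only the predictions are rank-transformed before taking a Pearson correlation, so the function is not symmetric in its arguments):

```python
import numpy as np
import pandas as pd

# Assumed definition from the era-boosting script: only `pred` is ranked,
# so swapping the arguments changes the result.
def spearmanr(target, pred):
    # rank-transform the predictions, then take the Pearson correlation
    return np.corrcoef(target, pred.rank(pct=True, method="first"))[0, 1]

target = pd.Series([0.0, 0.25, 0.5, 0.75, 1.0])
pred = pd.Series([0.1, 0.4, 0.2, 0.9, 0.8])

# Correct call order: target first, prediction second.
score = spearmanr(target, pred)

# Swapped order ranks the targets instead, giving a different value.
swapped = spearmanr(pred, target)
```

Here `score` and `swapped` differ, which is exactly why calling the function with pred first silently computes the wrong metric.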


At least for LGBM, I do not think that this is true. I believe (but am not sure anymore) that I also checked that behaviour with xgboost.
I tried to use model.n_estimators += trees_per_step at first and noticed that each iteration takes longer and longer to train, which led me to investigate the issue. I was using something like 100 trees per step and 10 iterations. I then ran some tests with model.n_estimators += trees_per_step and without it.

Example with trees_per_step=5 and num_iters=10:

Without model.n_estimators += trees_per_step:
model.n_estimators prints 5, which is logical since the model was initialized with 5 trees.
model.booster_.num_trees() prints 50, which is trees_per_step * num_iters. So 50 trees have been built, 5 per iteration, as expected. The elapsed time for each iteration is about equal since every iteration trains the same number of trees (5).

With model.n_estimators += trees_per_step:
model.n_estimators prints 50. The parameter n_estimators is a cumulative sum of trees_per_step over the iterations, so the final value is trees_per_step * num_iters. Seems correct.

But: model.booster_.num_trees() prints 275, which means that 275 trees have been built. This seems weird at first. But since each iteration has an increasing n_estimators, more and more trees are built per iteration: the first iteration builds 5 trees, the next one 10, the one after that 15, then 20, 25, and so on up to the last iteration, which builds trees_per_step * num_iters trees. The final number of trees is the sum over all iterations: sum((i + 1) * trees_per_step for i in range(num_iters)) = 275.
Since every iteration builds trees_per_step more trees than the previous one, the training time keeps increasing from iteration to iteration.
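The arithmetic behind the two observed tree counts can be checked directly:

```python
trees_per_step = 5
num_iters = 10

# Without incrementing n_estimators: each iteration adds trees_per_step trees.
constant_total = trees_per_step * num_iters  # 50

# With model.n_estimators += trees_per_step: iteration i (0-based) trains
# (i + 1) * trees_per_step trees, so the totals grow quadratically.
per_iteration = [(i + 1) * trees_per_step for i in range(num_iters)]
cumulative_total = sum(per_iteration)  # 275
```

This reproduces both numbers from the experiment: 50 trees without the increment, 275 with it, and the per-iteration counts 5, 10, 15, …, 50 explain the steadily increasing training time.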

Using model.n_estimators += trees_per_step or not does give different results. I added “cumulative_trees: bool = False” as a parameter to decide whether I want n_estimators to keep increasing or not. The intuition behind using it might be something like:
fit 5 trees on the easy eras, calculate the worst performing eras, and use more and more trees to fit them since they are ‘harder’ to predict.