Introduction
We at Numerai have spent the last couple of weeks running a new grid search on the Sunshine V4.1 dataset. We built hundreds of models with different hyperparameters, all focused on target_cyrus_v4_20, which we believe is currently the best single target for our hedge fund strategy.
We are sharing the best of these grid search results, ranked by correlation and by correlation Sharpe. Users can either use these results directly in their models or run more targeted searches of their own around the sweet spots.
Experimental Setup
We ran the grid search with the following settings:
Features = all the features in the V4.1 dataset
Target = target_cyrus_v4_20
Algorithm = LGBMRegressor (LightGBM's scikit-learn API)
The hyperparameter ranges that produced the better results are:
n_estimators = 30k - 60k
learning_rate = 0.001
max_depth = 5, 6, 7
num_leaves = 2**max_depth - 1
colsample_bytree = 0.1
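The ranges above can be expanded into concrete parameter sets. The following is a minimal sketch, not the exact grid we ran: `build_param_grid` is a hypothetical helper, and the 10k step for n_estimators is an assumption, since only the 30k - 60k range is given above. The rule num_leaves = 2**max_depth - 1 is applied per max_depth.

```python
from itertools import product


def build_param_grid(
    n_estimators_options=(30_000, 40_000, 50_000, 60_000),  # assumed 10k step
    max_depth_options=(5, 6, 7),
):
    """Expand the shared ranges into a list of LGBMRegressor keyword dicts."""
    grid = []
    for n_estimators, max_depth in product(n_estimators_options, max_depth_options):
        grid.append(
            {
                "n_estimators": n_estimators,
                "learning_rate": 0.001,
                "max_depth": max_depth,
                # num_leaves is tied to max_depth, as in the list above
                "num_leaves": 2 ** max_depth - 1,
                "colsample_bytree": 0.1,
            }
        )
    return grid


param_grid = build_param_grid()
for params in param_grid[:3]:
    print(params)
```

Each dict can be passed to the scikit-learn API as `lightgbm.LGBMRegressor(**params)` and then fitted on the training eras.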
Results
The first plot below shows the correlation and the second shows the correlation Sharpe, both computed on the out-of-sample predictions from era 578 to era 1059 inclusive. The training period ran from era 1 to era 574 inclusive.
Table 1 below contains the 20 best correlation results.
Table 1
Table 2 below contains the 20 best correlation Sharpe results.
Table 2
The plot below shows the cumulative correlation of:
- The Sunshine recommended-parameter model with learning_rate = 0.001, n_estimators = 20k and max_depth = 6.
- The best correlation model from Table 1 above.
- The 2 best correlation Sharpe models from Table 2 above.
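A cumulative correlation curve like the one described here can be sketched as a running sum of per-era correlations, taken in era order. This assumes the plot is a plain cumulative sum. `cumulative_correlation` is a hypothetical helper and the input values are made up for illustration.

```python
from itertools import accumulate


def cumulative_correlation(era_corrs):
    """era_corrs: {era: correlation}. Returns [(era, running sum)] in era order."""
    eras = sorted(era_corrs)
    running_totals = accumulate(era_corrs[era] for era in eras)
    return list(zip(eras, running_totals))


# Illustrative values only, not results from the grid search.
example = {578: 0.02, 579: -0.01, 580: 0.03}
for era, total in cumulative_correlation(example):
    print(era, round(total, 4))
```

Plotting one such curve per model on shared axes gives the comparison described in the list above.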
Alternate hyperparameters for less compute
With the above parameters, training a model with max_depth = 6, learning_rate = 0.001 and n_estimators = 100k takes 6 hours on a 24-core processor. This may be a heavy compute burden for users.
Below we show parameters that work with less compute: learning_rate = 0.01, n_estimators = 20k and colsample_bytree = 0.1. This reduces the compute time to less than 2 hours.
Conclusion
The lower-compute parameters above are very competitive on correlation Sharpe and slightly worse on pure correlation, while offering a significant compute saving.