Numerai Tournament example code using PyTorch NN and Optuna

Hi,

I dived into the Numerai community nine months ago and have kept developing my tournament code.
There are already many example scripts out there, but I wanted to share my code base to get feedback from the community.
Hope some newcomers find this helpful.

Features:

  • era-boosted training, time-series cross-validation
  • era-batches training
  • model hyperparameter tuning for PyTorch NN and GBDT models
  • several tips from the Numerai Forum are also included

Thanks!

12 Likes

Very nice work!

I’m going to ping the CoE on this to see if it will qualify for a bounty.

I know for me personally, the day I can actually afford a couple of good GPUs, this is going to be really useful.

2 Likes

Very nice. Also love the model name. You might want to fix the model link though, because it's not taking the _ into account and it's pointing to one of my models instead of the intended one :slight_smile:

2 Likes

Thanks for your comment!
Bounty? I didn't know about that system. I'd appreciate it if you ping the CoE!

The example models I added are simple and don't need powerful GPUs (about 1 GB of GPU memory consumption).
I hope you can get some once this severe GPU shortage is over.

1 Like

Sorry for the mistake in the URL parsing :fearful: I've already fixed it.

1 Like

I use vast.ai for cloud GPUs. Even at normal GPU prices there is no economic reason to buy one yourself. I wrote a forum post on my setup. Please give me a ping if you want more details.

2 Likes

Hi mate, many thanks for sharing your hard work with us. Can you please explain what era-boosted training and era-batches training are? It would be a booster shot for newcomers!

1 Like

Era Boosting is (a rough code sketch follows this list):

  • Training on all eras
  • In-sample prediction on all eras
  • Selecting a subset of eras to train on based on in-sample performance (usually the bottom half)
  • Resuming training with only the selected eras
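
In Python, that loop looks roughly like this. This is only a sketch, assuming a DataFrame `df` with an `era` column, a `target` column, and a feature list `features`; the per-era metric (Spearman correlation) and all hyperparameters are illustrative, not anyone's exact setup:

```python
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor

def era_boost_train(df, features, proportion=0.5, trees_per_step=10, num_iters=20):
    # warm_start=True lets each later fit() call add trees to the existing ensemble
    model = GradientBoostingRegressor(
        n_estimators=trees_per_step, max_depth=5, learning_rate=0.01, warm_start=True
    )
    model.fit(df[features], df["target"])        # 1) train on all eras
    for _ in range(num_iters - 1):
        preds = model.predict(df[features])      # 2) in-sample prediction on all eras
        era_scores = (
            pd.Series(preds, index=df.index)
            .groupby(df["era"])
            .apply(lambda p: spearmanr(p, df.loc[p.index, "target"])[0])
        )
        # 3) keep the worst-scoring eras (bottom half by default)
        worst_eras = era_scores[era_scores <= era_scores.quantile(proportion)].index
        worst_df = df[df["era"].isin(worst_eras)]
        # 4) resume boosting, adding trees trained only on the selected eras
        model.n_estimators += trees_per_step
        model.fit(worst_df[features], worst_df["target"])
    return model
```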

Era Batches apply to neural networks, where the common wisdom is that it is good to train in batches equivalent to eras, i.e. a forward pass for one era, then a backward pass, then the next era, and so on.
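
In PyTorch terms, the idea is roughly the following (a sketch with placeholder names; `era_tensors` is assumed to map each era to its feature and target tensors):

```python
import torch

def train_one_epoch(model, optimizer, era_tensors, loss_fn=torch.nn.MSELoss()):
    # era_tensors: dict mapping era label -> (features, targets) torch tensors
    model.train()
    for era, (X, y) in era_tensors.items():
        optimizer.zero_grad()
        preds = model(X).squeeze(-1)   # forward pass on one whole era
        loss = loss_fn(preds, y)
        loss.backward()                # backward pass for that era
        optimizer.step()               # then move on to the next era
```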

3 Likes

So instead of treating batch size as a hyperparameter, the era sizes (the number of data points in each era) are used as the batch size, is that correct? Thanks for the detailed response @bigbertha!

1 Like

That is correct. I personally use PyTorch, where you define a Dataset class and it is dead easy to return an era as the next batch. With Keras I never got it to work.
And I am sure it is super easy in JAX, but despite @jrb's pep talk I have never gotten anything to run…
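
For reference, a minimal sketch of that kind of per-era Dataset (the column names and DataFrame layout are assumptions, not the exact code from any repository):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class EraDataset(Dataset):
    """Each item is one full era, so the DataLoader yields era-sized batches."""

    def __init__(self, df, feature_cols, target_col="target", era_col="era"):
        self.groups = [
            (torch.tensor(g[feature_cols].values, dtype=torch.float32),
             torch.tensor(g[target_col].values, dtype=torch.float32))
            for _, g in df.groupby(era_col)
        ]

    def __len__(self):
        return len(self.groups)

    def __getitem__(self, idx):
        return self.groups[idx]

# batch_size=None disables automatic batching, so each era comes out as-is:
# loader = DataLoader(EraDataset(df, feature_cols), batch_size=None, shuffle=True)
```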

2 Likes

Thanks for the clarification. One less hyperparameter to worry about. Although I am interested to see whether treating batch size as a hyperparameter yields something equivalent to the era size or not.

1 Like

If you're using tf.keras, subclassing tf.keras.utils.Sequence is the way to go. The next method I'll describe is more universal and works with tf.keras, TensorFlow, and JAX.

If you're using TensorFlow without the Keras training loop, or even if you are using Keras: start by converting your data into a RaggedTensor. You can then turn the RaggedTensor into a tf.data.Dataset using the from_tensor_slices method. And then, Bob's your uncle!

If you’re using JAX, use the as_numpy_iterator method on the tf.data.Dataset object.
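
A small sketch of that RaggedTensor path, with toy data standing in for the real eras (rows are assumed to be sorted by era):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in data: era_sizes lists the number of rows in each era
rng = np.random.default_rng(0)
era_sizes = [3, 5, 4]
features = rng.normal(size=(sum(era_sizes), 8)).astype(np.float32)
targets = rng.normal(size=(sum(era_sizes),)).astype(np.float32)

features_rt = tf.RaggedTensor.from_row_lengths(features, row_lengths=era_sizes)
targets_rt = tf.RaggedTensor.from_row_lengths(targets, row_lengths=era_sizes)

# Each dataset element is now one era: a (rows_in_era, n_features) tensor
ds = tf.data.Dataset.from_tensor_slices((features_rt, targets_rt))

# tf.keras / plain TensorFlow loops can consume `ds` directly;
# for JAX, pull each era out as numpy arrays:
for X_era, y_era in ds.as_numpy_iterator():
    pass  # feed X_era, y_era into the JAX training step here
```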

One thing to bear in mind is that the JAX jit caches its traces based on the shapes of the tensors passed in. So, if you're using eras as batches with the jit, your first epoch is going to be super slow because it has to jit your training loop for every era (because they all have a different number of rows).

This is the case with TensorFlow AutoGraph as well, but the experimental_relax_shapes argument to the tf.function decorator helps alleviate this issue.
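
For example, the decorator usage looks like this (`model` and `optimizer` are placeholders for a tf.keras model and optimizer; newer TF releases rename the flag to reduce_retracing):

```python
import tensorflow as tf

@tf.function(experimental_relax_shapes=True)  # relax shape specialization to cut retracing
def train_step(model, optimizer, X, y):
    with tf.GradientTape() as tape:
        preds = tf.squeeze(model(X, training=True), axis=-1)
        loss = tf.reduce_mean(tf.square(preds - y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```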

Using JAX without jit and TensorFlow in eager mode are perfectly feasible options. Empirically, I've observed that both are a bit faster than PyTorch, although I haven't tried PyTorch's jit.

In any case, you can efficiently mix and match TensorFlow, JAX (and even PyTorch, if you really want to) using dlpack tensors. All three frameworks support it. Heck, even XGBoost supports it. :slight_smile:
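
A tiny illustration of the dlpack handoff (note that a dlpack capsule can only be consumed once, so convert it right after creating it):

```python
import tensorflow as tf
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack

t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# PyTorch -> TensorFlow without copying the underlying buffer
tf_t = tf.experimental.dlpack.from_dlpack(to_dlpack(t))

# ... and back to PyTorch
t_back = from_dlpack(tf.experimental.dlpack.to_dlpack(tf_t))

# jax.dlpack.to_dlpack / jax.dlpack.from_dlpack follow the same pattern
```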

1 Like

Alternatively, use minibatches, but ensure that each minibatch does not contain data from more than one era.
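
One way to do that in PyTorch is a custom batch sampler that chunks row indices within each era. A sketch, assuming the per-row era labels are available as an array:

```python
import random
import numpy as np
from torch.utils.data import Sampler

class PerEraBatchSampler(Sampler):
    """Yields minibatches of row indices that never span more than one era."""

    def __init__(self, eras, batch_size, shuffle=True):
        self.eras = np.asarray(eras)   # era label for every row of the dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        batches = []
        for era in np.unique(self.eras):
            idx = np.flatnonzero(self.eras == era)
            if self.shuffle:
                np.random.shuffle(idx)
            # chop one era's rows into roughly batch_size-sized chunks
            n_chunks = max(1, len(idx) // self.batch_size)
            batches.extend(chunk.tolist() for chunk in np.array_split(idx, n_chunks))
        if self.shuffle:
            random.shuffle(batches)    # shuffle batch order across eras
        return iter(batches)

    def __len__(self):
        return sum(max(1, int(np.sum(self.eras == e)) // self.batch_size)
                   for e in np.unique(self.eras))

# Usage (hypothetical): DataLoader(dataset, batch_sampler=PerEraBatchSampler(df["era"].values, 4096))
```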

Hi,

I've just updated the above GitHub repository to incorporate the changes below:

  • ~10x faster MLP models compared to the initial commit (nn.ModuleList → nn.Sequential; see the sketch below)
  • use the Numerai v4 dataset
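
For anyone curious, a generic before/after sketch of that kind of refactor (arbitrary layer widths, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Before: layers held in an nn.ModuleList and applied in a Python loop
class ModuleListMLP(nn.Module):
    def __init__(self, sizes=(64, 32, 16, 1)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])
        )

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)

# After: the same stack expressed as a single nn.Sequential
def sequential_mlp(sizes=(64, 32, 16, 1)):
    blocks = []
    for a, b in zip(sizes[:-2], sizes[1:-1]):
        blocks += [nn.Linear(a, b), nn.ReLU()]
    blocks.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*blocks)
```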

I know there will be benchmark models (talked about at NumerCon in April 2022) and my example should become obsolete soon, but I decided to continue updating it for a while, for the sake of my own learning and of those who forked my repository :slight_smile:

Future plan:

Any advice or proposals for future updates are welcome. Thanks!