Hi everyone! I’m fairly new to Numerai and am working on getting my first model up and running. For my own learning I’d like to try a neural-network (NN) approach, though I know XGBoost (XGB) and other methods are more popular. I’m confused about a few things in the dataset:
The “ID”: is it unique to each asset and consistent across eras? Or does the obfuscation give each asset a new ID every era? I’m curious whether IDs could be used to construct the history of a particular asset and train a time-based model such as an RNN.
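One way to answer the ID question empirically is to count how many distinct IDs occur in more than one era. A minimal stdlib sketch, assuming you can read (era, id) pairs from the training data; the function names are hypothetical and the toy rows are made up:

```python
from collections import defaultdict


def eras_per_id(rows):
    """Map each ID to the set of eras in which it appears.

    `rows` is any iterable of (era, id) pairs, e.g. taken from the era
    column and the ID column/index of the training data (names assumed).
    """
    seen = defaultdict(set)
    for era, row_id in rows:
        seen[row_id].add(era)
    return seen


def share_of_recurring_ids(rows):
    """Fraction of distinct IDs that show up in more than one era."""
    seen = eras_per_id(rows)
    if not seen:
        return 0.0
    recurring = sum(1 for eras in seen.values() if len(eras) > 1)
    return recurring / len(seen)


# Toy data (made up): ID "a" appears in two eras, "b" and "c" in one each.
toy_rows = [("era1", "a"), ("era1", "b"), ("era2", "a"), ("era2", "c")]
print(share_of_recurring_ids(toy_rows))  # 1 of 3 IDs recurs
```

If the real data gives 0.0, IDs are never reused across eras, so they cannot be chained into per-asset histories for an RNN.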
The “target” values we train on are binned into one of 5 values (or 7 for some targets), e.g. 0, 0.25, 0.5, 0.75, 1 for the 5-bin targets. How should I construct the output layer of my neural network when generating predictions for submission? Should the predictions be:
a. Binned to match the training-set bins? That is, a multi-class classifier that only outputs 0, 0.25, etc.
b. Or continuous between 0 and 1? Most of the examples I’ve seen use a sigmoid activation on the final layer to achieve this.
I’ve noticed my starter models overfit VERY quickly: validation performance falls far behind training performance early in training. I’m not subsampling eras yet; is that likely the main cause? Any good strategies for tackling this? Is shuffling the data in my DataLoader a good first step, or do I need to subsample? If you keep every 4th era, do you train on all four interleaved subsets every epoch, cycle through one subset per epoch, or simply throw out the other 3/4 of the data?
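The three era-sampling options in the question can be sketched as follows. This is a minimal stdlib sketch, not an official recipe: it assumes eras are sorted chronologically, and that the motivation for every-4th-era subsampling is that targets of neighbouring eras overlap in time (a common rationale, assumed here). All function names are hypothetical, and `shuffled_batches` only approximates what a shuffling DataLoader does.

```python
import random


def era_offsets(eras, step=4):
    """Split chronologically sorted eras into `step` interleaved subsets.

    Subset k holds eras k, k+step, k+2*step, ..., so eras inside one
    subset are `step` apart.
    """
    return [eras[k::step] for k in range(step)]


def eras_for_epoch(epoch, subsets, strategy="cycle"):
    """Choose which eras to train on in a given epoch."""
    if strategy == "all":    # every era, every epoch
        return sorted(e for s in subsets for e in s)
    if strategy == "cycle":  # one interleaved subset per epoch, rotating
        return subsets[epoch % len(subsets)]
    if strategy == "first":  # keep one subset, discard the other 3/4
        return subsets[0]
    raise ValueError(f"unknown strategy: {strategy}")


def rows_in_eras(row_eras, selected):
    """Indices of rows whose era is in `selected`."""
    selected = set(selected)
    return [i for i, era in enumerate(row_eras) if era in selected]


def shuffled_batches(row_indices, batch_size, rng):
    """Yield shuffled mini-batches of row indices, reshuffled on each call."""
    idx = list(row_indices)
    rng.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield idx[start:start + batch_size]


# Usage with 12 made-up eras numbered 1..12.
eras = list(range(1, 13))
subsets = era_offsets(eras, step=4)
print(subsets)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
print(eras_for_epoch(5, subsets, "cycle"))  # [2, 6, 10]
```

Shuffling only changes the order of rows within the selected eras; it does not change which eras the model sees, so it addresses a different issue than era subsampling.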
Thanks for any tips you’re willing to offer!