correlator’s question on Neural nets in RocketChat:
On the Numerai data, tree-based models are easier to develop than Neural nets, as the latter require more fine-tuning. I tried a live NN model for some weeks and then gave up, as it was consistently below 0 corr. Assuming that most of the tournament models are tree-based (I am pretty sure they are), it would help Numerai if people built other kinds of models, such as Neural nets.
It would help if someone could give pointers to building a first-cut NN model that works on par with the example model. @mdo @jrb @surajp
I thought, why not answer it on the forum, where we can discuss this more broadly.
I think there are a lot of Neural Nets in the tournament.
The first thing to consider when trying out Neural nets for modelling is that they are called “universal function approximators”. You can perform all sorts of fancy experiments with them. A sufficiently parameterized model will eventually over-fit the training data.
- I’d recommend going through the paper “Understanding deep learning requires rethinking generalization” every few months (it might give you some ideas as well).
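To make this concrete, here is what a first-cut baseline can look like in PyTorch. Everything in it is an assumption to tune rather than a recommendation: the layer sizes, the dropout, and the 310-feature count of the current tournament data.

```python
import torch.nn as nn

class NumeraiMLP(nn.Module):
    """A deliberately small feed-forward baseline for the tabular tournament data."""

    def __init__(self, n_features: int = 310, hidden: int = 256, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # targets live in [0, 1]
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)
```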
You can approximate (or even distill) your best tree-based models with a sufficiently parameterized neural net, which implies that NNs are capable of learning patterns similar to Trees. You just need to find an appropriate architecture for the data.
- @mdo recently answered this in OHwA S03E02 (Slido question 9)
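A minimal sketch of that distillation idea (the student architecture and full-batch loop are brevity placeholders, not my actual setup):

```python
import numpy as np
import torch
import torch.nn as nn

def distill(tree_model, X: np.ndarray, epochs: int = 100, lr: float = 1e-3) -> nn.Module:
    """Fit a small NN 'student' to a fitted tree model's predictions.

    `tree_model` is any fitted regressor with a .predict() method
    (e.g. XGBoost/LightGBM); X is an (n_samples, n_features) float array.
    Full-batch training for brevity; use mini-batches on the real data.
    """
    inputs = torch.as_tensor(X, dtype=torch.float32)
    soft_targets = torch.as_tensor(tree_model.predict(X), dtype=torch.float32)

    student = nn.Sequential(nn.Linear(X.shape[1], 256), nn.ReLU(), nn.Linear(256, 1))
    opt = torch.optim.Adam(student.parameters(), lr=lr)

    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(student(inputs).squeeze(-1), soft_targets)
        loss.backward()
        opt.step()
    return student
```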
The next step is to incorporate correlated variety (the closest phrase I could think of) if you are considering ensembling. With that, you now have a whole new palette of choices: you can ensemble over different architectures, different initializations, training on different subsets of eras, and what not!
You need a combination of models that generalizes well when combined. Or, instead of ensembling, you can learn another model that is uncorrelated to all of these (this also applies to learning a model orthogonal to your best tree-based model): “Beating the wisdom of the crowds is harder than recognizing faces or driving cars”. You need to give a Boost to your models.
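Here is a skeleton of that kind of variety: same task, different seeds and widths (all numbers arbitrary). Checking pairwise correlations tells you whether the members actually disagree enough to be worth averaging:

```python
import numpy as np
import torch
import torch.nn as nn

def make_member(n_features: int, hidden: int, seed: int) -> nn.Module:
    # Different seeds give different initializations; varying `hidden`
    # (or depth, activations, era subsets) adds further variety.
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

# Hypothetical members; train each of them before ensembling.
members = [make_member(310, h, seed=s) for s, h in [(0, 128), (1, 256), (2, 512)]]

def ensemble_predict(members, X: torch.Tensor) -> np.ndarray:
    with torch.no_grad():
        preds = np.stack([m(X).squeeze(-1).numpy() for m in members])
    # Inspect pairwise correlations: members that agree too much add little.
    print("pairwise corr:\n", np.corrcoef(preds).round(2))
    return preds.mean(axis=0)
```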
I wasn’t a big fan of ensembles of big models in production because of resource constraints, but it turns out we can reduce the size and inference time by pruning and distillation without sacrificing much of the original model’s performance, which can also help reduce overfitting (a pruning sketch follows below).
- “Deep Ensembles: A Loss Landscape Perspective”: this paper changed my perspective on ensembling NNs! (You might get some ideas from here too.)
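For the pruning part mentioned above, PyTorch already ships the building blocks; a minimal sketch (the 50% amount is arbitrary; prune after training and re-validate):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def shrink(model: nn.Module, amount: float = 0.5) -> nn.Module:
    """Zero out the smallest-magnitude weights across all Linear layers.

    Note: unstructured pruning only saves compute with sparse kernels;
    for real size/latency wins, combine it with structured pruning or
    distillation into a smaller student.
    """
    linears = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(linears, pruning_method=prune.L1Unstructured, amount=amount)
    for module, name in linears:
        prune.remove(module, name)  # fold the pruning masks into the weights
    return model
```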
The most important thing is choosing a loss function! There is a lot of discussion about loss functions on the forum, and this is where NNs shine (pretraining => finetuning). Remember, predictions are scored on correlation; you should develop your own loss function to get better at MMC (that’s your secret sauce)! A basic correlation loss is sketched after the next point.
- You can also choose from the wide range of optimizers that come with DL libraries: “Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers”
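Before you cook your own secret sauce, a plain correlation loss is only a few lines. This is a sketch that uses Pearson correlation as a differentiable proxy for the rank correlation you are scored on, not the official scoring code:

```python
import torch

def corr_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Negative Pearson correlation, so minimizing it maximizes CORR."""
    pred = pred - pred.mean()
    target = target - target.mean()
    return -(pred * target).sum() / (pred.norm() * target.norm() + 1e-12)
```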
NOTE:
- NNs come with a factor of luck in initialization, so you should develop some kind of quick evaluation framework/functions. That way you can experiment faster (one such helper is sketched after this list).
- I haven’t considered any kind of pre-processing of the data here.
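Such a quick evaluation helper can be as small as the sketch below; the column names (“era”, “prediction”, “target”) are assumptions to adapt to your own pipeline:

```python
import numpy as np
import pandas as pd

def era_scores(df: pd.DataFrame) -> pd.Series:
    """Per-era correlation between ranked predictions and the target."""
    return df.groupby("era").apply(
        lambda d: np.corrcoef(d["prediction"].rank(pct=True), d["target"])[0, 1]
    )

# scores = era_scores(validation_df)
# scores.mean(), scores.std(), scores.mean() / scores.std()  # mean corr, vol, sharpe
```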
Instead of neural-only model(s), you can combine a good tree-based model (for CORR) with a flexible NN trained for originality of predictions (CORR + MMC).
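One way to push the NN towards originality is to neutralize its predictions against the tree model’s before blending. A minimal least-squares sketch (apply it era by era on real data; the blend weights in the comment are arbitrary):

```python
import numpy as np

def neutralize(nn_preds: np.ndarray, tree_preds: np.ndarray,
               proportion: float = 1.0) -> np.ndarray:
    """Remove the component of the NN predictions explained by the tree model.

    What's left is the 'original' part that can help MMC.
    """
    exposures = np.stack([tree_preds, np.ones_like(tree_preds)], axis=1)
    beta, *_ = np.linalg.lstsq(exposures, nn_preds, rcond=None)
    return nn_preds - proportion * exposures @ beta

# e.g. blended = 0.7 * tree_preds + 0.3 * neutralize(nn_preds, tree_preds)
```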
With this, I have (almost) opened up all of my core ideas around NNs for the tournament. I haven’t done anything new in particular; it’s just an accumulation of interesting RocketChat and forum posts. I guess I have previously discussed some specific things in both places, and there are some direct references in this post that simply indicate what my models are!
To answer,
Yes, it’s possible to beat example predictions with NNs.
The above points are good enough to get you started with a really good basic model that you can later improve. Also, there is a lot of room for pre- and post-processing!
numer.ai/parmars is a Neural ensemble from 232.
numer.ai/dhi (meaning intelligence/understanding) is a single NN model from 232.
So (here), Neural Nets are (almost) all you need! Hope this helps.
All the best