V4 vs V4.1, Nomi vs Ralph vs Cyrus

I was asked a question in our Discord about why the benchmark model trained on Nomi has a much higher TC than the model trained on Cyrus. And if that's true, why do we think Cyrus is the better target?

I think this is an important question and I don’t want the answer to be lost in a Discord thread.

To explain what could be going on here, I gathered a few summary metrics on 6 different models.

The models cover every combination of the datasets (v4 and v4.1) and the targets (Nomi, Ralph, and Cyrus).

I then show the CORR20V2 (the Numerai Corr with the Cyrus target) for each of these models over both the last year and the last 9 years.
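For readers who want to compute similar summary statistics themselves, here is a minimal sketch of how mean corr, Sharpe, and maximum drawdown can be derived from a series of per-era correlation scores. The function name and the example scores are hypothetical; Sharpe here is the usual Numerai-style mean/std of per-era corr (not annualized), and drawdown is measured on the cumulative-sum curve of the corr series.

```python
import numpy as np

def summary_metrics(era_corrs: np.ndarray) -> dict:
    """Summarize a series of per-era correlation scores."""
    mean_corr = era_corrs.mean()
    # Sharpe as mean over standard deviation of the per-era scores
    sharpe = mean_corr / era_corrs.std(ddof=0)
    # Max drawdown on the cumulative-sum "equity curve" of the scores
    equity = era_corrs.cumsum()
    running_max = np.maximum.accumulate(equity)
    max_drawdown = (running_max - equity).max()
    return {"mean_corr": mean_corr, "sharpe": sharpe, "max_drawdown": max_drawdown}

# Hypothetical example: 12 eras of per-era corr scores
scores = np.array([0.03, 0.01, -0.02, 0.04, 0.02, 0.00,
                   0.05, -0.01, 0.03, 0.02, 0.01, 0.04])
print(summary_metrics(scores))
```

Running this over the last ~52 weekly eras versus the full 9-year history is what produces the short-term versus long-term comparison discussed below.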

What’s interesting is that over the last year, models trained on Cyrus indeed don’t look better than models trained on Nomi or Ralph, even though Cyrus is the target all of the models are being scored against. This could explain why the current TC reputation of the v4.1 Cyrus model is lower than that of the v4 Nomi model.

However, if you look over the last 9 years, we see that the v4.1 model trained on Cyrus is the best in Sharpe and Corr, and second-best in maximum drawdown. This much longer period of strong performance is why we believe Cyrus is a better target in general, and why we think models trained on it will have higher TC in the long run.

We really don’t know, though, and that’s why we provide 36 different targets. It’s almost certainly better to build multiple models on multiple targets and ensemble them in some way, producing a model that is robust and should do well in either case.
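One simple way to ensemble models trained on different targets is to rank-normalize each model's predictions (so differing output scales don't matter) and then average the ranks. This is only a sketch of that idea; the function names and the example predictions are hypothetical, not Numerai's own ensembling code.

```python
import numpy as np

def rank_normalize(preds: np.ndarray) -> np.ndarray:
    """Map predictions to uniform ranks in (0, 1) so models with
    different output scales can be combined fairly."""
    ranks = preds.argsort().argsort()
    return (ranks + 0.5) / len(preds)

def ensemble(pred_lists, weights=None):
    """Weighted (default equal-weight) average of rank-normalized predictions."""
    ranked = np.stack([rank_normalize(p) for p in pred_lists])
    if weights is None:
        weights = np.full(len(pred_lists), 1.0 / len(pred_lists))
    return np.average(ranked, axis=0, weights=weights)

# Hypothetical predictions from three models, each trained on a different target
nomi_preds  = np.array([0.2, 0.8, 0.5, 0.1])
ralph_preds = np.array([0.3, 0.7, 0.6, 0.2])
cyrus_preds = np.array([0.1, 0.9, 0.4, 0.3])
print(ensemble([nomi_preds, ralph_preds, cyrus_preds]))
```

The equal weights are a starting point; weights could instead be tuned on validation eras, at the usual risk of overfitting to the recent regime.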


How do Nomi, Ralph, and Cyrus differ in terms of construction? Is there a risk that Cyrus has been ‘overly tweaked’ to work well on historical data, but may not generalize as well on future data compared to Nomi and Ralph?


Which feature set was used to generate these results with >1.5 Sharpe?

My best ensemble on the medium feature set is only 1.25.

Apologies if this was answered on Discord; I'm having trouble accessing it.