Benchmark Models

Numerai develops new datasets and new targets to help our data science community build better models. Numerai builds models on each new target and data release. Today, we will begin giving out the predictions for all of these models, and details about how they are created.


New User Acceleration

Numerai has a steep learning curve. After you make it through the tutorial notebooks, you are left with several datasets, many targets, and many modeling options. There are an unlimited number of experiments you’ll want to run as you begin your journey to the top of the leaderboard. With benchmark models, you can immediately see how well different combinations of data and targets do. I think you’ll find that exploring these models and their predictions and subsequent performance will inspire even more ideas for new models you can build yourself.

Better Stake Allocation

If you’re a returning user and you’re a few updates behind, you can see at a glance if your model is still competitive, or if you’d be better off staking on one of the newer benchmark models until you have time to catch back up.

A Meta Model of Meta Models

Some users may not have the resources to train large, competitive, cutting-edge models themselves. However, just by downloading the targets, the Meta Model predictions, and the Benchmark Model predictions, you may still be able to recognize that the Meta Model is underweight in some types of models, find that certain targets ensemble especially well together, or form a strong belief that one target will outperform in the future. You can explore all of these possibilities yourself, and even submit and stake on these ensembles, with minimal resource requirements.


Go to to see a list of models and their recent performance.
Go to the docs to see more details about how they are made.

The validation and live predictions are available through the API.

pip install numerapi

from numerapi import NumerAPI
napi = NumerAPI()
napi.download_dataset("v4.2/validation_benchmark_models.parquet", "validation_benchmark_models.parquet")
napi.download_dataset("v4.2/live_benchmark_models.parquet", "live_benchmark_models.parquet")
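
Once the files are downloaded, one way to start exploring is to check how similar the benchmark predictions are to each other within each era, since weakly correlated models are natural candidates for ensembling. The sketch below is only illustrative. It assumes the parquet file has an `era` column plus one column per benchmark model, which should be checked against the real file. `mean_per_era_corr` and the `model_a`/`model_b`/`model_c` columns are hypothetical names, not part of numerapi, and the data here is synthetic so the snippet runs without a download.

```python
import numpy as np
import pandas as pd


def mean_per_era_corr(df: pd.DataFrame, pred_cols: list, era_col: str = "era") -> pd.DataFrame:
    """Average the within-era rank correlation matrix of pred_cols over all eras."""
    per_era = [
        # Pearson correlation of ranks is the Spearman correlation.
        era_df[pred_cols].rank().corr().to_numpy()
        for _, era_df in df.groupby(era_col)
    ]
    # Average the per-era matrices element-wise.
    mean_corr = np.mean(per_era, axis=0)
    return pd.DataFrame(mean_corr, index=pred_cols, columns=pred_cols)


# Synthetic stand-in for the downloaded file. With real data you would use
# df = pd.read_parquet("validation_benchmark_models.parquet")
# and pass the actual prediction column names (assumed layout, see above).
rng = np.random.default_rng(0)
n_eras, n_rows = 5, 200
base = rng.normal(size=n_eras * n_rows)
df = pd.DataFrame({
    "era": np.repeat([f"{i:04d}" for i in range(1, n_eras + 1)], n_rows),
    "model_a": base,
    "model_b": base + 0.5 * rng.normal(size=n_eras * n_rows),  # similar to model_a
    "model_c": rng.normal(size=n_eras * n_rows),               # unrelated to model_a
})

corr = mean_per_era_corr(df, ["model_a", "model_b", "model_c"])
print(corr.round(2))
```

Averaging per-era matrices rather than computing one correlation over all rows keeps eras from being mixed, which matches how predictions are scored era by era.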

There is now a dotted line on your account page’s score charts to directly compare yourself with the benchmark models account.

Happy Modeling


Thanks for the hard work, @master_key and Numerai team.

Would it be possible to add a toggle switch (next to “Cumulative” in “…”) to compare user/model performance to MetaModel instead of example models? As a not-very-new user, I’d be interested to see how I perform versus the competition and whether I underperform/contribute to the fund.

Thank you @master_key, very helpful. Is v42_example_preds a rename of the former 20k-tree lg_lgbm_v42_cyrus20? The docs say v42_example_preds is a standard model (I assume that means 2k trees); I’m trying to understand this for benchmark continuity.

Yeah, that’s correct, it’s a rename. And all of the benchmark models (still) have 20,000 trees.

This is a bold move, I like it.

@master_key Can you specify what the rank_keep_ties_keep_na function does in the rank_gauss_pow1 function? It would be better to put it in the documentation.

I found a similar function in the numerai-tools repo. Is this what you are using?

Would be nice to have a functioning example :v:


I’m still confused. What exactly do you mean by:


All of the ensembles use the following steps:

  1. gaussianize each of the predictions on a per-era basis
  2. standardize to standard deviation 1
  3. dot-product the predictions with a weights vector representing the desired weight on each model
  4. gaussianize the resulting predictions vector, and neutralize if there are any features to neutralize to

It would be super helpful if you could provide an example :nerd_face: and share underlying code pls.

But they did provide the code.

    def gauss_pred(self, X: pd.DataFrame, ensemble_cols, weight_vector):
        for col in X[ensemble_cols]:
            if "era" in X.columns:
                # gaussianize each model's predictions era by era
                X[col] = X.groupby("era", group_keys=False)[col].transform(
                    lambda s1: rank_gauss_pow1(s1)
                )
            else:
                # check X contains only a single era
                assert 1800 < X.shape[0] < 6000
                X[col] = rank_gauss_pow1(X[col])
        # weighted sum of the gaussianized predictions
        return X[ensemble_cols].dot(weight_vector)

As for the rank_keep_ties_keep_na method, I imagine it is something like this.

To keep ties, use method="average". To keep NaNs, divide by s.count(), which counts only the non-NaN values (the same as len(s.dropna())); rank() leaves NaNs as NaN by default. So in essence it looks something like this:

import pandas as pd
import scipy.stats


def rank_gauss_pow1(s: pd.Series) -> pd.Series:
    # do rank-normalize

    # s_rank = rank_keep_ties_keep_na(s)
    # s_rank = (s.rank(method="average") - 0.5) / len(s.dropna())
    s_rank = (s.rank(method="average") - 0.5) / s.count()
    # gaussianize
    s_rank_norm = pd.Series(scipy.stats.norm.ppf(s_rank), index=s_rank.index)

    # Standardize to 1 std
    result_series = s_rank_norm / s_rank_norm.std()

    return result_series
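
Since a functioning example was asked for, here is a minimal end-to-end sketch of the four ensemble steps quoted earlier in the thread, run on synthetic data. It is not Numerai's actual code. `gaussianize` repeats the guess at rank_gauss_pow1 above, while `neutralize` and `ensemble` are hypothetical helpers. In particular, the neutralization here is a plain least-squares residual at full strength, which is an assumption; the real benchmark pipeline may neutralize differently.

```python
import numpy as np
import pandas as pd
import scipy.stats


def gaussianize(s: pd.Series) -> pd.Series:
    """Rank-normalize, map through the normal inverse CDF, scale to std 1."""
    ranked = (s.rank(method="average") - 0.5) / s.count()
    gauss = pd.Series(scipy.stats.norm.ppf(ranked), index=s.index)
    return gauss / gauss.std()


def neutralize(pred: pd.Series, features: pd.DataFrame) -> pd.Series:
    """Subtract the least-squares projection of pred onto the features (assumed method)."""
    exposures = features.to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(exposures, pred.to_numpy(dtype=float), rcond=None)
    return pred - exposures @ coef


def ensemble(df, pred_cols, weights, neutralize_cols=None, era_col="era"):
    """Apply the four steps independently within each era."""
    out = []
    for _, era_df in df.groupby(era_col):
        # Steps 1 and 2: gaussianize each model and scale it to std 1.
        g = era_df[pred_cols].apply(gaussianize)
        # Step 3: dot product with the weights (aligned on column names).
        combined = g.dot(weights)
        # Step 4: gaussianize the blend, then neutralize if features are given.
        combined = gaussianize(combined)
        if neutralize_cols:
            combined = neutralize(combined, era_df[neutralize_cols])
        out.append(combined)
    return pd.concat(out).reindex(df.index)


# Synthetic data; the column names are made up for illustration.
rng = np.random.default_rng(42)
n_eras, n_rows = 3, 200
df = pd.DataFrame({
    "era": np.repeat(["0001", "0002", "0003"], n_rows),
    "model_a": rng.normal(size=n_eras * n_rows),
    "model_b": rng.uniform(size=n_eras * n_rows),
    "feature_x": rng.integers(0, 5, size=n_eras * n_rows) / 4.0,
})
weights = pd.Series({"model_a": 0.7, "model_b": 0.3})

ens = ensemble(df, ["model_a", "model_b"], weights)
ens_neutral = ensemble(df, ["model_a", "model_b"], weights, neutralize_cols=["feature_x"])
print(ens.groupby(df["era"]).std())
```

Passing the weights as a Series indexed by model name lets pandas align them with the prediction columns, so the column order does not matter.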