New Target for Payouts and Data V5.2 - Faith II

Today we are announcing a new dataset, available immediately, and a scoring change for 2026.

New Dataset

The second Faith dataset contains 186 new features. Some are completely new types of features. Many are similar to the original Faith features. Those were the most powerful features we have ever released, and we have found it helpful to include more variations of them.

Faith II also contains two new targets, Ender and Jasper, each in 20D and 60D variants. These are similar to the Teager targets, which you will already be familiar with. You will find that models trained on these targets exhibit more consistency than models trained on other targets.

Of course there are new benchmark models for these as well, for example v52_lgbm_ender20.

New Target for Payouts and Leaderboard

Beginning with the round that starts on January 1, 2026, predictions will be scored based on their correlation with the new Ender20 target. This applies to both CORR and MMC.

This is our first payout target change since we introduced Cyrus in April 2023, and we don’t make the change lightly. We spent many months developing a target which encourages users to make models that are maximally valuable to our business, and Ender is the result of that work.

These are the last changes we have in the queue at the moment.

So spin up your Blackwells and build your best models for the 2026 season.

Happy Modeling


My Blackwell started to spin


Dear Kagglers, the V5.2 data is now available on the Kaggle platform with automatic weekly updates:

  • numerai data is a public notebook that is triggered automatically when the round opens on Saturday. It downloads the data from v5.2 Data - Numerai and also produces 4 smaller subsampled datasets with non-overlapping data.
  • numerai latest tournament data is a public dataset containing the output of the numerai data notebook. It is updated automatically whenever that notebook runs successfully.
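For readers wondering what "non-overlapping" could mean here, one plausible reading (an assumption, not a description of what the notebook actually does) is that each subsample keeps every fourth era, so that the 20D target windows of neighboring rows in a subsample do not overlap. A minimal sketch of that idea, with a hypothetical function name:

```python
def split_eras_non_overlapping(eras, n_splits=4):
    """Partition the sorted unique eras into n_splits interleaved groups.

    Group k contains eras k, k + n_splits, k + 2 * n_splits, ... so that
    consecutive eras never land in the same group.
    Assumption: eras are zero-padded strings that sort chronologically.
    """
    unique_eras = sorted(set(eras))
    return [unique_eras[offset::n_splits] for offset in range(n_splits)]
```

Each returned group can then be used to filter the full dataset by its era column, which gives four smaller datasets that together cover every era exactly once.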

You can use either data source as input to your notebooks to produce Tournament submissions. Using the new dataset and the new target target_ender_20, I have retrained and uploaded all of the public Kaggle example models:

The diagnostics do not suggest any improvement over v5.1, but hey, it’s just backtesting. Let’s see how the models fare next year.

These are the diagnostics of a model trained on train.parquet with the medium feature set and the new target from the V5.2 data:

and here is the same model on the V5.1 data:


Looks like target_ender and target_jasper each have 2 NaNs in train. Valid looks OK to me, but maybe you want to double-check.
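For anyone who wants to reproduce the check, a minimal pandas sketch follows. The helper name is hypothetical, and it assumes the new target columns all start with target_ender or target_jasper (only target_ender_20 is named explicitly in this thread).

```python
import pandas as pd


def count_target_nans(df, prefixes=("target_ender", "target_jasper")):
    """Count missing values in every column whose name starts with one of
    the given prefixes. The prefixes are an assumption about column naming."""
    target_cols = [col for col in df.columns if col.startswith(prefixes)]
    nan_counts = df[target_cols].isna().sum()
    return {col: int(count) for col, count in nan_counts.items()}
```

Calling it on the frame returned by `pd.read_parquet("train.parquet")` gives a column-to-count mapping. If the NaNs matter for training, dropping or filling those rows before fitting is one way to handle them.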