Dear Kagglers, V5.2 data are now available on the Kaggle platform, with automatic weekly updates:
- numerai data is a public notebook, triggered automatically when the Saturday round opens. It downloads the data from v5.2 Data - Numerai and also produces 4 smaller subsampled datasets with non-overlapping data.
- numerai latest tournament data is a public dataset containing the output of the numerai data notebook. The dataset is updated automatically whenever the notebook executes successfully.
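For anyone wondering what "subsampled datasets with non-overlapping data" could look like, here is a minimal sketch. It is not the notebook's actual code: I am assuming the split is done by era, taking every 4th era with a different offset for each subset, and the helper name `split_by_era` and the toy frame are hypothetical.

```python
import pandas as pd


def split_by_era(df, n_splits=4, era_col="era"):
    """Split df into n_splits subsets that share no eras (assumed scheme)."""
    eras = sorted(df[era_col].unique())
    subsets = []
    for offset in range(n_splits):
        # Take every n_splits-th era, starting at a different offset each time.
        keep = set(eras[offset::n_splits])
        subsets.append(df[df[era_col].isin(keep)])
    return subsets


# Toy stand-in for train.parquet: 8 eras with 3 rows each.
toy = pd.DataFrame({
    "era": [f"{e:04d}" for e in range(1, 9) for _ in range(3)],
    "feature_a": range(24),
})
parts = split_by_era(toy)
```

Each subset is roughly a quarter of the original size, and together the subsets cover every row exactly once.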
You can use either data source as the input of your notebooks to produce Tournament submissions. Using the new dataset and the new target target_ender_20, I have retrained and uploaded all public Kaggle example models:
- Hello Numerai automated - basic tutorial model, with an improved version trained on the medium feature set
- numerai Feature Neutralization - Kaggle tutorial explaining FN
- numerai Target Ensemble - Kaggle tutorial explaining ensembling
- Numerai Example Model Sunshine - example model using both techniques above
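As a rough illustration of the two techniques named in the list, here is a minimal sketch for a single era. It is not the code from the notebooks: the function names, the synthetic data, and the column names `target_a` and `target_b` are all hypothetical, and I am assuming linear neutralization via least squares and a simple average of percentile ranks for the ensemble.

```python
import numpy as np
import pandas as pd


def neutralize(predictions, features, proportion=1.0):
    """Remove the part of predictions that is linearly explained by features."""
    # Add a constant column so the neutralized output is also mean-centered.
    exposures = np.hstack([features, np.ones((features.shape[0], 1))])
    coefs, *_ = np.linalg.lstsq(exposures, predictions, rcond=None)
    return predictions - proportion * (exposures @ coefs)


def rank_ensemble(pred_frame):
    """Average the percentile ranks of each column (one column per model)."""
    return pred_frame.rank(pct=True).mean(axis=1)


# Synthetic stand-in for one era: 200 rows, 5 features, 2 models.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
preds = pd.DataFrame({
    "target_a": features @ rng.normal(size=5) + rng.normal(size=200),
    "target_b": features @ rng.normal(size=5) + rng.normal(size=200),
})

ensemble = rank_ensemble(preds)
neutral = neutralize(ensemble.to_numpy(), features)
```

With `proportion=1.0` the neutralized predictions have no linear exposure to any of the features; smaller values remove only part of it. In practice both steps are usually applied era by era.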
The diagnostics do not suggest any improvement over v5.1, but hey, it's just backtesting. Let's see how the models fare over the next year.
These are the diagnostics of a model trained on train.parquet with the medium feature set and the new target of the V5.2 data:
and of the same model on V5.1 data:

