The new data files are more than double the size:
v5.1 validation.parquet is 7.3 GB today versus 3.3 GB for v5.0
This will impact models that are memory-constrained (in RAM or GPU memory) during training. A model that trains on all v5.0 features and is already close to a memory limit will likely run out of memory on v5.1 unless it selects a subset of features or reduces the number of eras in the training data.
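As a rough sketch of the feature-subset workaround (the file path and the choice of 500 features are just placeholders), parquet lets you read only the columns you need instead of the whole table:

```python
import pyarrow.parquet as pq

# Inspect the schema without loading any data, then read only a
# subset of feature columns (plus era and target) to keep memory down.
schema = pq.read_schema("v5.1/validation.parquet")
feature_cols = [c for c in schema.names if c.startswith("feature")]
subset = feature_cols[:500]  # e.g. keep only the first 500 features

df = pq.read_table(
    "v5.1/validation.parquet",
    columns=["era", "target"] + subset,
).to_pandas()
```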
When new data is added to validation.parquet each week, the only way for participants to fetch the new data is to re-download the entire 7.3 GB file.
Has anyone considered that if the data format were CSV instead of parquet, an HTTP range request (partial GET) could let the client download just the newly appended rows, saving a lot of time and network bandwidth on the Numerai server?
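A minimal sketch of what that could look like on the client side, assuming a hypothetical CSV endpoint, an append-only file, and that the client knows how many bytes it already has locally:

```python
import requests

# Hypothetical URL; the real download location would come from the Numerai API.
URL = "https://example.com/v5.1/validation.csv"
local_size = 3_500_000_000  # bytes already downloaded in a previous week

# Ask the server for only the bytes past what we already have.
resp = requests.get(URL, headers={"Range": f"bytes={local_size}-"}, stream=True)

if resp.status_code == 206:  # 206 Partial Content: server honoured the range
    with open("validation.csv", "ab") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
else:
    # Server ignored the Range header; fall back to a full download.
    ...
```

This only works because CSV appends rows at the end of a plain text file; a parquet file's footer metadata changes on every write, which is why a simple byte-range fetch doesn't apply to the current format.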