New data and the example predictions

pumplerod · January 5, 2022, 7:59pm

I may well be missing something elementary, but I recall using the “example_predictions” provided by Numerai as a gauge of my model’s relative performance.

Recently I tried to merge example_predictions.csv and example_validation_predictions.csv into my train/valid df files as I used to do. However, none of the ids in the example predictions exist within the new data set. Is this working as intended? Or have I made some mistake coming back to this after so many months?

My hope is to have a set of predictions, supplied by the Numerai team, which is a reasonable approximation of the meta-model. This would allow me to compare my models to a known entity and hopefully leverage that information while I tune my performance.

Is there a way to go about creating this information? I can train my own model as a baseline, but I would like to use something more official.
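A minimal sketch of the kind of comparison described here, assuming both prediction sets are available as plain id -> prediction mappings. The function name `correlation_with_example` and its helpers are hypothetical, and rank (Spearman) correlation is only one possible way to measure how closely a model tracks the example predictions; it is not an official Numerai metric.

```python
from statistics import mean


def _ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def correlation_with_example(my_preds, example_preds):
    """Spearman correlation over the ids present in both mappings."""
    shared = sorted(set(my_preds) & set(example_preds))
    if len(shared) < 2:
        raise ValueError("need at least two shared ids to compare")
    mine = [my_preds[i] for i in shared]
    theirs = [example_preds[i] for i in shared]
    return _pearson(_ranks(mine), _ranks(theirs))
```

Ids that appear in only one of the two mappings are ignored, so a value is only meaningful when most ids are shared. With pandas Series indexed by id, `Series.corr(other, method="spearman")` should give a comparable number.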

mic · January 5, 2022, 8:51pm

Did you see this one?

The new data sets contain different eras than before. Numerai publishes example predictions as parquet files (not csv) for the updated example model on the new data sets.

pumplerod · January 5, 2022, 11:30pm

I see, and I am using the new .parquet data files. However, I’m looking for the example_predictions and example_validation_predictions files, for which I still see only .csv versions. And, unless I’m missing something, these files do not correspond with the indices in the new .parquet files.

mic · January 6, 2022, 12:48am

You can get parquet for those files too.

For example, when using numerapi, something like:

napi.download_dataset("example_predictions.parquet", "example_predictions.parquet")

I think they match the new data.
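To check rather than guess, here is a small sketch that counts how many ids the two files share. The helper name `id_overlap` is hypothetical. The commented usage assumes the parquet files are indexed by id, and the validation file names in it are assumptions that may differ from what Numerai currently serves.

```python
def id_overlap(data_ids, prediction_ids):
    """Count ids shared between a data set and a predictions file."""
    data_ids = set(data_ids)
    prediction_ids = set(prediction_ids)
    return {
        "shared": len(data_ids & prediction_ids),
        "only_in_data": len(data_ids - prediction_ids),
        "only_in_predictions": len(prediction_ids - data_ids),
    }


# Possible usage (needs network access, numerapi, and pandas; the file
# names other than example_predictions.parquet are assumptions):
#
#   import pandas as pd
#   from numerapi import NumerAPI
#
#   napi = NumerAPI()
#   napi.download_dataset("example_validation_predictions.parquet",
#                         "example_validation_predictions.parquet")
#   preds = pd.read_parquet("example_validation_predictions.parquet")
#   valid = pd.read_parquet("numerai_validation_data.parquet")
#   print(id_overlap(valid.index, preds.index))
```

If "shared" comes back as zero, the predictions file probably belongs to the old data set and a merge on id will produce no rows.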

mic · January 6, 2022, 1:40am