Which data types do I need to submit predictions for ('validation', 'test', 'live')?

The tournament data has 3 different types: ‘validation’, ‘test’, ‘live’.

When submitting my predictions to a tournament, do I need to predict all 3, or can I submit just one of them (to reduce file size)?


Your predictions should be based on the live data, not the training or validation data.

I think that is what you mean. Looking back on it, I’m not sure I fully understand the question.


Thank you. What I mean is that the data_type field in the tournament.csv file has 3 types: ‘validation’, ‘test’, ‘live’.

When I want to submit my predictions, I run my model on the tournament data, get the predictions, write them to a CSV file and send it to Numerai. Do I need to do this for all 3 types?

All of them. You are sending them a submission file with the same number of rows (and the same row ids) as the “tournament” data file. Look at the “example_predictions.csv” file. Just like that, except with your predictions instead.
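A minimal sketch of that idea, in case it helps: one output row per tournament row, whatever its data_type. The column names (`id`, `data_type`, `feature_x`, `prediction`), the toy data and the `predict` placeholder are assumptions for illustration, so check them against the real tournament file and example_predictions.csv.

```python
import csv
import io

# Toy stand-in for the tournament file (assumed columns; the real file is much larger).
TOURNAMENT_CSV = """id,data_type,feature_x
a1,validation,0.25
b2,test,0.50
c3,test,0.75
d4,live,1.00
"""

def predict(row):
    # Hypothetical placeholder model: swap in your own model's prediction here.
    return float(row["feature_x"])

rows = list(csv.DictReader(io.StringIO(TOURNAMENT_CSV)))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "prediction"])  # assumed header, copy the one in example_predictions.csv
for row in rows:
    # Every row is written, regardless of its data_type.
    writer.writerow([row["id"], predict(row)])

submission_csv = out.getvalue()
submitted_ids = [line.split(",")[0] for line in submission_csv.splitlines()[1:]]
```

The point is only that the ids in the submission match the ids in the tournament file one for one.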



Sorry, I completely misread. Going back to reading rather than posting!! It’s been a long Monday, it’s 10:27pm in the UK, and that’s clearly past my bedtime…

I believe you can submit a file with only the “live” data_type rows and still earn payouts on what you stake (not 100% certain on this). But the trade-off is that you can’t see predicted performance before Thursday, when results for the round are posted. I think Numerai uses the test/validation data to gauge how well your model might do.

For Numerai Signals you can submit only the live data_type. For the classic tournament you need to upload your predictions for the entire tournament file, which contains live, validation and test.
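For the live-only case, a rough sketch of picking out just the live rows before predicting. The column names and toy data here are assumptions, not the actual file layout.

```python
import csv
import io

# Toy stand-in for a file with a data_type column (assumed layout).
TOURNAMENT_CSV = """id,data_type,feature_x
a1,validation,0.25
b2,test,0.50
d4,live,1.00
e5,live,0.10
"""

all_rows = list(csv.DictReader(io.StringIO(TOURNAMENT_CSV)))

# Keep only the rows tagged as live; these are the ones a live-only submission would cover.
live_rows = [r for r in all_rows if r["data_type"] == "live"]
live_ids = [r["id"] for r in live_rows]
```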

I think Numerai uses the test/validation data to gauge how well your model might do.

Does this mean that if I set all of the ‘test’ and ‘validation’ predictions to ‘0’ and only predict the live data, I get a lower score? I am also curious whether I should predict the validation data or just copy the given target values.

You can do whatever you want with the validation data (you won’t get validation diagnostics that mean anything though, of course). If you are just going to play games with it, you might as well use it as additional training data. But if you don’t predict the test data, you are just being a jerk (although it is true you get no feedback from it). Well, at least if you are staking, as they rely on that info at least some of the time. (And if a lot of people made a habit of that, they’d be forced to make the checks on submissions more draconian. And we don’t want that. Been there, done that.)


No, I believe Numerai payouts depend only on the live data predictions. If you want to use Numerai’s built-in predicted-performance measures for your model, then you should predict the validation and test data.


Somebody liked one of my previous posts on this thread today, so I’d just like to point out that this information is outdated, and you no longer need to submit anything but the live era each week. (No more “test”)


Somebody …

Thanks. That’s me.