Some Simple Tournament Questions

I’m newish to this, and I would just like to make sure I’m doing it right.

On the data:
There are training, validation, test, and live data types. The test data and the live data do not have valid target values. What is the test data for? Just to make sure my programs run?

On the tournament: if I am not staking my prediction, can I submit a prediction on the current live data after the cutoff day, just to see how it performs over the remainder of the cycle?

If it makes any difference, I am running my programs at home from downloaded data, and not on the services that most seem to be using.


P.S. Pointers to appropriate FAQs would be fine.


Gonna take these questions in reverse order.

It doesn’t matter how you compute or how you submit (automated, API, direct upload). Either way, you are delivering a prediction file to Numerai.

Even though there are 4 overlapping “open” rounds at any given time, only the most recent one accepts submissions. New data comes out each Saturday for a new round. For a submission to “count” (be stake-eligible and count towards your “rep” score), predictions must be submitted for that round before the Monday morning deadline. If you miss that deadline, you can still submit for that round until the next round’s data comes out, but the submission will be “late” (so you can submit from Monday through early Saturday morning). You can’t, however, submit for a round two weeks into it.

Important but confusing note: although you can overwrite your submission with a new upload during the submission window, the “before the deadline” and “after the deadline” windows behave differently. If you submit before the deadline (i.e. over the weekend when the data comes out), your submission is “on time”. You can still replace it while you are inside the on-time window, but not after Monday morning (because it “counts”). However, if you wait until the deadline has passed so that your submission is “late”, you can replace it as many times as you want before the next round starts; some people do this just to see the diagnostics when they upload. So “late” submissions may actually be more useful for newbies who expect to replace their model while they are tinkering.
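To make those timing rules concrete, here is a minimal Python sketch that classifies an upload time and decides whether a re-upload would still be accepted. The exact open and deadline times (Saturday 18:00 UTC, Monday 14:30 UTC) are assumptions for illustration only; check Numerai’s own docs for the real schedule.

```python
from datetime import datetime, timedelta

# Hypothetical schedule, for illustration only (naive datetimes treated as UTC):
# the round opens Saturday 18:00, the on-time deadline is Monday 14:30,
# and the round stops taking submissions when the next round opens a week later.
DEADLINE_AFTER_OPEN = timedelta(days=1, hours=20, minutes=30)
NEXT_ROUND_AFTER_OPEN = timedelta(days=7)


def submission_status(round_open: datetime, now: datetime) -> str:
    """Classify a moment in time relative to one round's submission windows."""
    if now < round_open:
        return "not open"
    if now < round_open + DEADLINE_AFTER_OPEN:
        return "on time"
    if now < round_open + NEXT_ROUND_AFTER_OPEN:
        return "late"
    return "closed"


def can_replace(first_upload: datetime, round_open: datetime, now: datetime) -> bool:
    """Apply the rule described above: on-time uploads lock at the deadline, late ones don't."""
    first = submission_status(round_open, first_upload)
    current = submission_status(round_open, now)
    if current == "on time":
        return True  # still inside the on-time window, overwrite freely
    if current == "late":
        return first == "late"  # an on-time upload can no longer be replaced
    return False
```

For example, with a round opening on Saturday 2021-01-02 at 18:00, an upload on Sunday is “on time” and is locked from Monday afternoon on, while a first upload on Tuesday is “late” and can be replaced until the following Saturday.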

What is the test data for? Numerai internal backtesting and validation of your model. They need predictions for which they have the targets but you don’t (so you can’t overfit to it, it is clean).


Great @wigglemuse, very enlightening!

I had doubts about sending the predictions before, and ended up adapting ways to analyze the predictions in Colab with help from the community here.

I am changing some things and soon I will make the link available in case anyone is interested.


Thanks, @wigglemuse. I think my biggest problem is that I spent most of my career in a very rigorously documented environment, so this is a bit of a change. I guess this old dog is going to learn some new tricks :dog:

Re. this bit:

So I need to submit results to the tournament for the test data as well as the live data? I’m cool with that.

You need to submit predictions for everything in the “tournament” file; this includes test, validation and live.
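A minimal sketch of that, assuming the tournament file has been loaded into a pandas DataFrame with an `id` column and feature columns whose names start with `feature`. The column names, the `prediction` output column and the `predict` callable are all assumptions for illustration; check the current file headers and submission format. The point is simply that every row gets a prediction, whether or not it has a target.

```python
import numpy as np
import pandas as pd


def build_submission(tournament: pd.DataFrame, predict) -> pd.DataFrame:
    """Return one prediction per row of the tournament file (validation, test and live)."""
    # Assumed convention: feature columns are the ones whose names start with "feature".
    feature_cols = [c for c in tournament.columns if c.startswith("feature")]
    preds = predict(tournament[feature_cols].to_numpy())
    submission = pd.DataFrame({"id": tournament["id"].to_numpy(), "prediction": preds})
    # Sanity checks: every row is covered and nothing is missing.
    if len(submission) != len(tournament) or submission["prediction"].isna().any():
        raise ValueError("submission must cover every tournament row")
    return submission


# Toy stand-in for the tournament file: the test and live rows have no target.
toy = pd.DataFrame({
    "id": ["n0", "n1", "n2", "n3"],
    "data_type": ["validation", "validation", "test", "live"],
    "feature_a": [0.0, 0.25, 0.5, 1.0],
    "feature_b": [1.0, 0.75, 0.5, 0.0],
    "target": [0.5, 0.25, np.nan, np.nan],
})

sub = build_submission(toy, lambda X: X[:, 0])  # dummy "model": returns the first feature
print(sub["prediction"].tolist())  # [0.0, 0.25, 0.5, 1.0]
```

In practice `predict` would be your trained model’s predict method, and the result would be written out with `sub.to_csv(path, index=False)`.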


Thank you @themicon, that cleared up a whole lot of other questions.

@wigglemuse @themicon

Hi, I have two questions related to this topic:

  1. When I make a submission, is model diagnostic feedback calculated on validation or test data_type (or maybe both)?

  2. There was talk in the Validation 2 Announcement about validation1 and validation2 subsets. My question is: are they the 121-132 and 197-212 era clusters in the validation set?



Validation only, yes. So you could even compute all those yourself because you have the targets.
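For example, a rough self-check on the validation rows could look like the Python sketch below. It assumes columns named `era`, `prediction` and `target`, and scores each era as the Pearson correlation between rank-transformed predictions and the target, in the style of Numerai’s example scripts. Treat the exact metric definitions (including the Sharpe-like ratio) as assumptions rather than a copy of what the diagnostics page computes.

```python
import numpy as np
import pandas as pd


def era_corr(era_df: pd.DataFrame) -> float:
    """Correlation between rank-transformed predictions and the target within one era."""
    ranked = era_df["prediction"].rank(pct=True, method="first")
    return float(np.corrcoef(ranked, era_df["target"])[0, 1])


def validation_summary(validation: pd.DataFrame) -> dict:
    """Per-era scores plus their mean, standard deviation and a Sharpe-like ratio."""
    per_era = validation.groupby("era")[["prediction", "target"]].apply(era_corr)
    mean, std = per_era.mean(), per_era.std(ddof=0)
    sharpe = mean / std if std > 0 else float("nan")
    return {"per_era": per_era, "mean": mean, "std": std, "sharpe": sharpe}
```

You would call it on the validation rows only (e.g. `tournament[tournament["data_type"] == "validation"]`) after adding your predictions as a `prediction` column.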

And yes, all the validation eras are included in the diagnostics.
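If you want to look at the two clusters from the question separately, a small helper can tag each era. This assumes era labels of the form `era121` and uses the 121-132 and 197-212 ranges quoted in the question; both are assumptions to verify against the actual file.

```python
import re


def era_number(era: str) -> int:
    """Extract the integer from an era label such as 'era121' (label format assumed)."""
    match = re.fullmatch(r"era(\d+)", era)
    if match is None:
        raise ValueError(f"unexpected era label: {era!r}")
    return int(match.group(1))


def validation_part(era: str) -> str:
    """Assign a validation era to one of the two clusters named in the question."""
    n = era_number(era)
    if 121 <= n <= 132:
        return "validation1"
    if 197 <= n <= 212:
        return "validation2"
    return "other"
```

With a DataFrame of validation rows, `validation["era"].map(validation_part)` gives a column you can group by to compare the two subsets.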


Another simple question…
Are the predictions for the Validation era Ids that are included in the example_predictions.csv file samples of the measured values corresponding to those Ids, or the output of Numerai’s prediction model?

I don’t understand the first option, but the answer is the second: it is just the model’s output for the whole tournament file, including validation eras (i.e. just as it would be if it were your model, except that they truncate the decimal places, which you shouldn’t do).

I just got curious about what mapping Numerai uses to convert whatever actual measured values they have into the target values included in the training and tournament sets. I suspect that they take the measured values, pass them through a normalization filter that maps the data onto a distribution such as a normal distribution, and then rescale the result to lie between 0 and 1.
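To make that guess concrete, here is a hypothetical Python sketch of such a pipeline: rank the raw values, map the ranks to standard normal quantiles, then min-max scale into [0, 1]. It only illustrates the hypothesis above; nothing in this thread says this is how Numerai actually builds its targets.

```python
import numpy as np
from statistics import NormalDist


def gauss_rank_unit(values) -> np.ndarray:
    """Map raw values to [0, 1] via ranks -> normal quantiles -> min-max scaling.

    Illustrates the guessed pipeline only; needs at least two values.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = values.argsort(kind="stable").argsort(kind="stable")  # 0..n-1, ties by position
    u = (ranks + 0.5) / n                                          # strictly inside (0, 1)
    z = np.array([NormalDist().inv_cdf(p) for p in u])             # standard normal quantiles
    return (z - z.min()) / (z.max() - z.min())
```

Because everything goes through ranks, the output depends only on the ordering of the raw values, not on their scale.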

I got curious because, if you take the example predictions file, all of the results fall within the range [0.4217, 0.5652]. That means, of course, that if they simply used rounding to get bins like [0, 0.25, 0.5, 0.75, 1.0], all their results would fall into one bin (around 0.5).
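The rounding point is easy to check. A minimal sketch, assuming simple round-to-nearest with the 0.25 bin width from the post:

```python
import numpy as np

BIN_WIDTH = 0.25  # bins at 0, 0.25, 0.5, 0.75, 1.0, as in the post


def round_to_bins(preds, width=BIN_WIDTH) -> np.ndarray:
    """Snap each value to the nearest multiple of `width`, clipped to [0, 1]."""
    return np.clip(np.round(np.asarray(preds, dtype=float) / width) * width, 0.0, 1.0)


print(round_to_bins([0.4217, 0.5, 0.5652]))  # [0.5 0.5 0.5]
print(round_to_bins([0.0278, 0.9731]))       # [0. 1.]
```

Anything between 0.375 and 0.625 lands in the 0.5 bin, which covers the whole example-predictions range.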

FWIW, here’s a copy of the top of the example_predictions.csv file:

and here’s what a histogram of the whole file looks like:

Now, in my own predictions, I’m using a mixed regression/classification approach; below is a typical histogram of results based on the tournament data. (I’m taking that approach because it has worked well for me in the past; it’s usually good at picking up outliers. OTOH, it’s also a bit of a “this dog is resistant to learning new tricks” issue :slightly_smiling_face: )

This one has an output range of [0.0278, 0.9731], and it looks somewhat more skewed and significantly more leptokurtic than the Numerai predictions (just eyeballing here; I haven’t calculated the numbers yet).
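If you want the numbers instead of eyeballing, pandas gives the range, sample skewness and excess kurtosis directly. A minimal sketch, assuming you pass in the prediction column from either file (e.g. the `prediction` column of example_predictions.csv, if that is what it is called in your download):

```python
import numpy as np
import pandas as pd


def describe_predictions(preds) -> dict:
    """Range, sample skewness and excess kurtosis of a set of predictions."""
    s = pd.Series(preds, dtype=float)
    return {
        "min": s.min(),
        "max": s.max(),
        "skew": s.skew(),             # about 0 for a symmetric distribution
        "excess_kurtosis": s.kurt(),  # Fisher definition: about 0 for a normal distribution
    }
```

A clearly positive `excess_kurtosis` is what “more leptokurtic” means numerically.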

The reason I’m interested is that, as I go along, it might be worthwhile to determine a separate mapping from my training outputs to the training targets: a final-stage calibration, so to speak.
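One simple form such a calibration could take is quantile mapping: replace each output with the training-target quantile at the same rank percentile, so the outputs take on the targets’ distribution while keeping their order. This is a hypothetical sketch of one option, not anything Numerai prescribes; isotonic regression would be another common choice.

```python
import numpy as np


def quantile_map(outputs, reference_targets) -> np.ndarray:
    """Give `outputs` the marginal distribution of `reference_targets`, preserving order.

    Each output is replaced by the reference quantile at its own rank percentile
    (numpy's default linear interpolation between reference values).
    """
    outputs = np.asarray(outputs, dtype=float)
    n = len(outputs)
    pct = (outputs.argsort(kind="stable").argsort(kind="stable") + 0.5) / n
    return np.quantile(np.asarray(reference_targets, dtype=float), pct)
```

Since the mapping is order-preserving, it changes the shape of the output distribution but not the ranking of the predictions.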

Sorry if I go on too much.

Just remember you are scored on rank, so only the ordering of your predictions matters to how you score, not what the raw values are or the shape of their distribution.
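A quick way to convince yourself: rank-transform before correlating, and any strictly increasing transform of the predictions gives exactly the same score. The data below is synthetic, and the score function is modeled on the rank-correlation style of Numerai’s example scripts (an assumption, not the official scorer):

```python
import numpy as np
import pandas as pd


def rank_corr(preds, target) -> float:
    """Pearson correlation of rank-transformed predictions with the target."""
    ranked = pd.Series(preds).rank(pct=True, method="first")
    return float(np.corrcoef(ranked, target)[0, 1])


rng = np.random.default_rng(0)
target = rng.integers(0, 5, size=1000) / 4        # synthetic targets in {0, 0.25, ..., 1}
raw = target + rng.normal(scale=0.5, size=1000)    # synthetic noisy predictions
squashed = 1 / (1 + np.exp(-raw))                  # strictly increasing, very different shape

print(rank_corr(raw, target), rank_corr(squashed, target))  # the same value twice
```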


Thanks for the reminder, @wigglemuse. But right now I’m just more interested in understanding the problem than in the actual scoring.