Relationship of daily round correlations to final round correlations

Could maybe be just because they are predictions, and we are looking at models that did fairly well (although NO models ever really get high correlations). Be interesting to check on random models.

But please let’s quit with the idea that the team is hiding something or there is a big conspiracy. I ASKED THEM THESE EXACT QUESTIONS and Richard directly answered them (and you can go watch it). There has been no evasion on this topic whatsoever. It has always been stated that we are predicting the market 4 weeks hence but since this question gets talked about sometimes I just wanted to clarify with the team that nothing that happens in the intermediate days could affect the score. And he did confirm that. So the question is pretty much settled – we are predicting 4 weeks hence exactly as they always said. The only confusion brought into this is really by users that wondered if that was strictly true. Turns out it is. They’ve been 100% open about it – nobody ever directly asked them before and when I did, Richard answered me.

I mean, I guess he could be straight up lying, or he doesn’t actually know how it works, or there is a bug and our scores have been wrong since forever – all technically in the realm of possibility I suppose. But come on…


Since we are predicting what the market will look like only on day 20, it could stand to reason that, as the round progresses, each day becomes more similar to day 20. I’d expect the difference between day 1 to day 2 to have a lot of noise because they are both the least like day 20 and maybe in different ways. The difference between day 18 and day 19 should have less noise because they are both most like day 20. “There are many ways to be different, but only one way to be the same” might apply? All of this only figures “on average and over time” as of course there could be large shocks later on in rounds on some occasions.

If we didn’t have daily scores, how would we become addicted to refreshing the leaderboard and profile pages? It’s just gamification. Cooler heads should pay little attention to it in the long run, except if we discover some information can be gleaned from a model’s intraround volatility: Sharpe and Sortino ratios on live performance of your models (@degerhan’s post still hasn’t gotten enough attention).


Consider also that day 1 is the closest to the data as we get it at the start of the round, and that might give you a hint of a way you could actually use the daily scores to improve your models – although we aren’t given the targets at any point, knowing how much movement there actually is in an average round is potentially actionable information.

@profricecake If this is a concern, you should not be participating in the tournament. In contrast to the very long conversation in Rocket Chat yesterday (and many times before) about trusting third party model staking where there are big trust issues between users, this is about trusting the system.


That makes a lot of sense. But for the sake of transparency, revealing the score calculation method should not be a red line vis-à-vis the hedge fund operation. Transparency is the heart of DeFi, right :grinning:?

We don’t even know what the target is. So that could have a lot to do with it.

I did look at the data for ALL models and then filtered in a few ways but the basic pattern always holds. First day average change is much bigger than others, and then generally decreases. I think a lot of us have noticed that the first day is way off, and will often reverse the second or third day before it gets even a semblance of a trajectory towards where it is going to end up. Probably down again to how the target is created. Once in a while we’ve seen some radical changes in scores on the very last day or two though…
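For anyone who wants to repeat that check on their own models, here is a minimal sketch of the calculation: the mean absolute day-over-day change in daily score, averaged across model-rounds. The function name and the toy numbers are made up for illustration; real inputs would be whatever daily scores you have collected, one row per model-round.

```python
import numpy as np

def mean_abs_change_by_day(daily_scores):
    """Mean absolute day-over-day change in daily score.

    daily_scores: array-like of shape (n_model_rounds, n_days),
    one row per model-round, one column per scoring day.
    Returns an array of length n_days - 1 (change into day 2, day 3, ...).
    """
    scores = np.asarray(daily_scores, dtype=float)
    return np.abs(np.diff(scores, axis=1)).mean(axis=0)

# Hypothetical toy data: 3 model-rounds, 5 scoring days each.
toy = [
    [0.030, 0.010, 0.018, 0.020, 0.021],
    [-0.020, 0.005, 0.000, 0.002, 0.001],
    [0.010, -0.015, -0.010, -0.012, -0.011],
]
print(mean_abs_change_by_day(toy))
```

With real data you would expect the first entries to be the largest if the pattern described above holds, but that is something to verify rather than assume.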

I’m not reading into this discussion that anyone is questioning the integrity of the team. And I very much appreciate your insight and discovery @wigglemuse however, while you mention that the issue is pretty much settled, I cannot find any information in the official documentation regarding details on the subject. I trust that you did ask and that Richard did respond, but it seems reasonable to want these details in a readily available point of reference as part of the official documentation, rather than to have to scour the forums and chat rooms. Especially as a newcomer.

What @profricecake has proposed seems like an utterly reasonable possibility for a method of calculation. It may not, in fact, be what is happening but I, for one, would love to see clear, easily accessible documentation from the official Numerai team, which doesn’t seem like a big ask. If presenting that level of detail would pose some level of data leakage or security risk, then stating so would be sufficient. It’s in all of our interests for each of us to have great performance, so I certainly don’t think anything sneaky or untoward is going on. Probably just a squeeze on resources. I love the tournament, and the community. I believe a push toward more clear and available official documentation gives all of us some bedrock to stand on and would allow for our guessing to focus more on how we can improve predictions and less on the mechanics of the tournament.

My point is that they’ve only ever said it is one way, so there isn’t really any more detail needed in order for it to be the plain truth. The need to clarify it only comes from users essentially asking “Is the way you say it is actually the way it is?” They obviously aren’t going to document every esoteric detail that it is NOT. Things get documented above and beyond the basics when they are continuing questions about them, and that applies here. But the questions come before the documentation because otherwise it is not known which things are points of confusion that need additional clarification. In any case, I am working on that exact documentation, and should be doing that right now instead of posting on the forum. So once that project is more or less complete (complete enough to post anyway), if and when any more questions about esoteric details come up that I haven’t answered, I will be then happy to add to it (with a quick response time) pretty much any reasonable question people have and then it will be there to point to. And in fact, that’s exactly why I made sure to clarify this once and for all directly with the team, because anything I put in those docs I will make sure is correct.

So the entire point of my documentation project is specifically so users (new users especially) do not have to scour all of these sources for all these questions they have, and it will be in one place. We started the project about a month too late, because the flood of Lex Fridman podcast newbies arrived with their questions on about the same day I started working on it. I wish it was ready already, but it will be quite soon.

Even Richard thinks it should say it better on the website. Here’s the bit of the video I was referring to from last fireside chat. (51:06)

Guess it won’t jump to the right place when embedded. If you open it up on YouTube and expand the “show more” you can see all the questions, including that one at 51:06.

Hi all.

I came to this thread in search of explanations. I’m not making accusations, I’m offering theories and challenging those that are either unsupported by evidence or that don’t seem to jibe with my readings of the data (or both).

Please try to remember that you Numerai veterans were here once too. Not everyone participating in the tournament has been exposed to the same information about the tournament. Apparently some of you have been in direct contact with members of the staff, others like me have just started their involvement and know only what’s on the website.

First of all: I’d love to watch it! It sounds like it has all the answers. But you still haven’t shared it yet.

Second: Hiding things is a core part of the Numerai concept. They strip all identifying materials from the data set to keep the competition more data science and less finance. I understand and accept this. They’ve documented it clearly on the website. But the daily scores are the opposite of hidden: they’re front and center for all the world to see, but they are only thinly documented (hence this thread).

Third: Conspiracy? Why did you bring that word up? Just because I’m not ready to accept your claims without evidence? Again, please try and remember that I haven’t had the same access to the primary sources that you’ve had and - like you - just want to hear answers from a source I can trust. Just because you feel that you understand things with great certainty doesn’t automatically make you a trustworthy source, since you are not a Numerai employee nor have (yet) offered any evidence beyond hearsay in support of your claims.

If you post that video that you keep referencing, I would greatly appreciate it. Then it will be available for anyone who has the same questions in the future and who wants to hear it straight from Richard’s mouth. This is less ideal than an update to the official Numerai docs but still a big step in the right direction.

Looking ahead, I hope you plan to cite (and make available) your sources in your upcoming FAQ. This is historically how knowledge is built, and for good reason.

Thanks for sharing this. I didn’t know that there was a complete lack of feedback in an earlier incarnation of the tournament. Daily feedback would certainly be an improvement on that, for sure, even if the early days are not indicative of the final score.

This is pretty funny. The thing I’ve asked for most in my posts is a trusted source of information (like Richard) to chime in on this topic. Please don’t misinterpret my skepticism of unfounded claims made by Numerai outsiders as a lack of trust in Numerai’s leadership. There is no connection between the two, which is of course the source of my skepticism in the first place.

Aha! I see the video was posted while I was drafting my last message. I’m looking forward to watching it, @wigglemuse. Thank you for sharing it.

My FAQ may have a few links in it to existing materials, but it certainly isn’t going to source and footnote every little thing as that would be almost impossible. But no implementation details I describe will be in there that I’m not sure about or haven’t verified. And it is all public info – no insider access required other than occasionally asking an insider about something (which they are only going to answer if it is ok to be public). But feel free to ask about any additional details, or if something remains confusing, that’s the whole point.

The video answered some questions unambiguously for me so I thought I’d share what I’ve learned here so others might benefit.

  • Although daily scores are provided, Richard confirmed that “ultimately you get scored on the last day [of each round] only”, and “there’s nothing in the middle that could affect your performance.” Although he did go on to acknowledge that companies going bankrupt or some other large-scale market disruption could certainly change performance during a tournament round.

  • The daily scores, as Richard describes them in the video, are “just an estimate” of what your final daily returns will be if Numerai was forced to give you an estimate even many days out from the actual tournament end.

These statements resonate with the observations from this thread that it’s easier to make a prediction when you’re closer to the actual scoring day, and hence why the daily scores seem to converge towards the final score (because, as many have noted, the market state on day 20 is likely to be more similar to the market state on day 19 than on day 1).

Based on this information, I’m confident that the idea I posted about daily scores being daily correlations is off-base. Here’s to learning!

In the wake of this informative thread, I would like to make three simple suggestions to Numerai to disambiguate this daily score stuff for others in the future.

  1. Call them “daily estimates of your final score” instead of “daily scores” (or something similar that would highlight the fact that they’re estimates)
  2. Add error bars to the website graphs based on what I’m sure is plenty of available data on the variance of the estimated scores relative to the final score. Bars would vanish for all completed rounds, of course, but the IP rounds would each have them, and they would grow progressively larger as we approach the latest round.
  3. Revise some of the documentation under Scoring on this page.

In an attempt to be helpful regarding #3, below are some suggested revisions to the existing docs.

Here’s a current (confusing) paragraph:

Each submission will be scored over the ~4 week duration of the round. Submissions will receive its first score starting on the Thursday after the Monday deadline and final score on Wednesday 4 weeks later for a total of 20 scores.

Here is a clearer and more informative version based on what I gleaned from the video:

Each submission will be scored on the final day of the ~4 week round. Submissions will receive estimates of their final score starting on the Thursday after the Monday deadline that will continue until Wednesday 4 weeks later when the final score will be released. These estimates will be provided on what we call “scoring days” (weekdays M-F minus market holidays). The estimates tend to grow more accurate as predictors of the final score as the tournament round draws to a close, but they are merely estimates. Only the score on the final day counts for the competition.

While I’m at it, I’ll offer a revision to the next paragraph too. Original:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping scores on each scoring day from the 4 ongoing rounds.

Proposed revision:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping score estimates on each scoring day from the 4 ongoing rounds.
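As a sanity check on the “Thursday to Wednesday 4 weeks later, 20 scores” arithmetic in the paragraphs quoted above, here is a small sketch that lists the scoring days for one round. The function name and the example deadline are hypothetical, and market holidays are ignored, so a real round could contain fewer scoring days than this shows.

```python
from datetime import date, timedelta

def scoring_days(deadline_monday, n_scores=20):
    """List n_scores weekdays starting the Thursday after a Monday deadline.

    Assumption: market holidays are ignored, so every Mon-Fri counts.
    """
    if deadline_monday.weekday() != 0:
        raise ValueError("deadline must be a Monday")
    day = deadline_monday + timedelta(days=3)  # the Thursday after the deadline
    days = []
    while len(days) < n_scores:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(day)
        day += timedelta(days=1)
    return days

# Hypothetical deadline, chosen only because it falls on a Monday.
days = scoring_days(date(2021, 1, 4))
print(days[0].strftime("%A %Y-%m-%d"), "->", days[-1].strftime("%A %Y-%m-%d"))
```

Counting weekdays from that Thursday, the 20th one lands on a Wednesday, which matches the wording in the docs.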

Thanks to all who offered their input on this one!


Maybe I’m just beating a dead horse by bumping this… But here’s my take, code below.

Daily score is nothing more than the correlation between prediction vs realized target on that day. The realized target is some form of cumulative return until that point (might be market neutralized, scaled by risk etc).

So it can be fairly well modelled by 5000 (or however many stocks are traded) Brownian motions. Each day these are ordered and that order is compared to your ordering.

A change in daily score is not triggered directly by returns of stocks, but by the change in order caused by the change in cumulative returns. As @wigglemuse points out, the score will by definition converge to the final score.
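To make the model in the last three paragraphs concrete, here is a rough sketch of a single round under those assumptions. Everything specific in it is my own choice for illustration and not Numerai's actual scoring: i.i.d. standard normal daily returns, a hypothetical prediction built as a weak `signal` on the day-20 cumulative return plus noise, and Spearman rank correlation as the daily score.

```python
import numpy as np

def rank(a):
    """Ordinal ranks (0..n-1); ties are not expected for continuous draws."""
    return np.argsort(np.argsort(a))

def simulate_daily_scores(n_stocks=5000, n_days=20, signal=0.02, seed=0):
    """Daily scores for one simulated round (toy model, not the real scoring)."""
    rng = np.random.default_rng(seed)
    # Assumption: each stock's daily return is independent standard normal noise.
    daily_returns = rng.standard_normal((n_days, n_stocks))
    # Realized target on day d: cumulative return up to and including day d.
    cumulative = np.cumsum(daily_returns, axis=0)
    # Hypothetical prediction: weakly related to the final-day cumulative return.
    prediction = signal * cumulative[-1] + rng.standard_normal(n_stocks)
    pred_rank = rank(prediction)
    # Daily score: Pearson correlation of ranks, i.e. Spearman correlation.
    return np.array([np.corrcoef(pred_rank, rank(c))[0, 1] for c in cumulative])

scores = simulate_daily_scores()
print(np.round(scores, 4))
```

The last element of `scores` is the final score by construction, and the earlier elements are the same correlation measured against a partial cumulative return, which is the sense in which the daily score converges to the final one.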

Now to the fun part… The absolute changes in increments (or std, for those of us who prefer L2 norms) decrease as the round progresses because the average distance between the 1-D diffusion processes increases. Higher distance = lower chance of a change in the rank correlation between prediction and target.

The standard deviation of increments in daily scores goes down in magnitude by approximately 1/sqrt(day of round + 1).

Code for a simple Monte Carlo simulation of this: