Relationship of daily round correlations to final round correlations

Could maybe be just because they are predictions, and we are looking at models that did fairly well (although NO models ever really get high correlations). Be interesting to check on random models.

But please let’s quit with the idea that the team is hiding something or there is big conspiracy. I ASKED THEM THESE EXACT QUESTIONS and Richard directly answered them (and you can go watch it). There has been no evasion on this topic whatsoever. It has always been stated that we are predicting the market 4 weeks hence but since this question gets talked about sometimes I just wanted to clarify with the team that nothing that happens in the intermediate days could affect the score. And he did confirm that. So the question is pretty much settled – we are predicting 4 weeks hence exactly as they always said. The only confusion brought into this is really by users that wondered if that was strictly true. Turns out it is. They’ve been 100% open about it – nobody ever directly asked them before and when I did, Richard answered me.

I mean, I guess he could be straight up lying, or he doesn’t actually know how it works, or there is a bug and our scores have been wrong since forever – all technically in the realm of possibility I suppose. But come on…

Since we are predicting what the market will look like only on day 20, it could stand to reason that, as the round progresses, each day becomes more similar to day 20. I’d expect the difference between day 1 and day 2 to have a lot of noise because they are both the least like day 20 and maybe in different ways. The difference between day 18 and day 19 should have less noise because they are both most like day 20. “There are many ways to be different, but only one way to be the same” might apply? All of this only holds “on average and over time”, as of course there could be large shocks later on in rounds on some occasions.

If we didn’t have daily scores, how would we become addicted to refreshing the leaderboard and profile pages? It’s just gamification. Cooler heads should pay little attention to it in the long run, except if we discover some information can be gleaned from a model’s intraround volatility: Sharpe and Sortino ratios on live performance of your models (@degerhan’s post still hasn’t gotten enough attention).
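For anyone who wants to try this on their own numbers, here is a minimal sketch of the two ratios. It uses one common textbook definition (mean over sample standard deviation for Sharpe, mean excess over downside deviation for Sortino); the referenced post may define them differently, and the function names and the sample scores are made up for illustration.

```python
import math

def sharpe(scores):
    """Mean score divided by the sample standard deviation of the scores."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean / math.sqrt(variance)

def sortino(scores, target=0.0):
    """Mean excess over `target` divided by the downside deviation below it."""
    n = len(scores)
    mean_excess = sum(scores) / n - target
    # Only scores below the target contribute to the downside deviation.
    downside = math.sqrt(sum(min(s - target, 0.0) ** 2 for s in scores) / n)
    return mean_excess / downside

# Made-up resolved round scores for one model (illustrative only).
round_scores = [0.03, -0.01, 0.02, 0.05, -0.02, 0.01]
print(sharpe(round_scores), sortino(round_scores))
```

Note that `sortino` divides by zero if no score falls below the target, so a real version would need to handle that case.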

Consider also that day 1 is the closest to the data as we get it at the start of the round, and that might give you a hint of a way you could actually use the daily scores to improve your models: although we aren’t given the targets at any point, knowing how much movement there actually is in an average round is potentially actionable information.

@profricecake If this is a concern, you should not be participating in the tournament. In contrast to the very long conversation in Rocket Chat yesterday (and many times before) about trusting third party model staking where there are big trust issues between users, this is about trusting the system.

That makes a lot of sense. But for the sake of transparency, revealing the score calculation method should not be a red line vis-à-vis the hedge fund operation. Transparency is the heart of DeFi, right :grinning:?

We don’t even know what the target is. So that could have a lot to do with it.

I did look at the data for ALL models and then filtered in a few ways but the basic pattern always holds. The first day’s average change is much bigger than the others, and then it generally decreases. I think a lot of us have noticed that the first day is way off, and will often reverse on the second or third day before it gets even a semblance of a trajectory towards where it is going to end up. That probably comes down, again, to how the target is created. Once in a while we’ve seen some radical changes in scores on the very last day or two, though…
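A minimal sketch of that kind of check, assuming you have already collected daily scores into a rounds-by-days table (the function name, the table layout, and the numbers are all made up for illustration, not the data described above):

```python
import numpy as np

def mean_abs_change_by_day(daily_scores):
    """Average absolute day-over-day change in score, per day of the round.

    daily_scores: array-like of shape (n_rounds, n_days), one row per round.
    Returns an array of length n_days - 1; entry 0 is the change from the
    first score to the second, and so on.
    """
    scores = np.asarray(daily_scores, dtype=float)
    return np.abs(np.diff(scores, axis=1)).mean(axis=0)

# Two made-up rounds with four daily scores each.
example = [
    [0.00, 0.04, 0.03, 0.03],
    [0.02, -0.02, -0.01, -0.01],
]
print(mean_abs_change_by_day(example))
```

If the pattern described above holds for your models, the first entries of the result should be the largest.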

I’m not reading into this discussion that anyone is questioning the integrity of the team, and I very much appreciate your insight and discovery, @wigglemuse. However, while you mention that the issue is pretty much settled, I cannot find any information in the official documentation regarding details on the subject. I trust that you did ask and that Richard did respond, but it seems reasonable to want these details in a readily available point of reference as part of the official documentation, rather than to have to scour the forums and chat rooms. Especially as a newcomer.

What @profricecake has proposed seems like an utterly reasonable possibility for a method of calculation. It may not, in fact, be what is happening, but I, for one, would love to see clear, easily accessible documentation from the official Numerai team, which doesn’t seem like a big ask. If presenting that level of detail would pose some level of data leakage or security risk, then stating so would be sufficient. It’s in all of our interests for each of us to have great performance, so I certainly don’t think anything sneaky or untoward is going on. Probably just a squeeze on resources. I love the tournament, and the community. I believe a push toward clearer and more available official documentation gives all of us some bedrock to stand on and would allow our guessing to focus more on how we can improve predictions and less on the mechanics of the tournament.

My point is that they’ve only ever said it is one way, so there isn’t really any more detail needed in order for it to be the plain truth. The need to clarify it only comes from users essentially asking “Is the way you say it is actually the way it is?” They obviously aren’t going to document every esoteric detail of what it is NOT. Things get documented above and beyond the basics when there are continuing questions about them, and that applies here. But the questions come before the documentation because otherwise it is not known which things are points of confusion that need additional clarification. In any case, I am working on that exact documentation, and should be doing that right now instead of posting on the forum. So once that project is more or less complete (complete enough to post anyway), if and when any more questions about esoteric details come up that I haven’t answered, I will be happy to add answers (with a quick response time) to pretty much any reasonable question people have, and then it will be there to point to. And in fact, that’s exactly why I made sure to clarify this once and for all directly with the team, because anything I put in those docs I will make sure is correct.

So the entire point of my documentation project is specifically so that users (new users especially) do not have to scour all of these sources for answers to their questions; it will all be in one place. We started the project about a month too late: the flood of Lex Fridman podcast newbies arrived with their questions on about the same day I started working on it. I wish it were ready already, but it will be quite soon.

Even Richard thinks it should say it better on the website. Here’s the bit of the video I was referring to from last fireside chat. (51:06)

I guess it won’t jump to the right place when embedded. If you open it on YouTube and expand “show more”, you can see all the questions, including that one at 51:06.

Hi all.

I came to this thread in search of explanations. I’m not making accusations, I’m offering theories and challenging those that are either unsupported by evidence or that don’t seem to jibe with my readings of the data (or both).

Please try to remember that you Numerai veterans were here once too. Not everyone participating in the tournament has been exposed to the same information about the tournament. Apparently some of you have been in direct contact with members of the staff; others, like me, have just started their involvement and know only what’s on the website.

First of all: I’d love to watch it! It sounds like it has all the answers. But you still haven’t shared it yet.

Second: Hiding things is a core part of the Numerai concept. They strip all identifying materials from the data set to keep the competition more data science and less finance. I understand and accept this. They’ve documented it clearly on the website. But the daily scores are the opposite of hidden: they’re front and center for all the world to see, but they are only thinly documented (hence this thread).

Third: Conspiracy? Why did you bring that word up? Just because I’m not ready to accept your claims without evidence? Again, please try and remember that I haven’t had the same access to the primary sources that you’ve had and - like you - just want to hear answers from a source I can trust. Just because you feel that you understand things with great certainty doesn’t automatically make you a trustworthy source, since you are not a Numerai employee nor have (yet) offered any evidence beyond hearsay in support of your claims.

If you post that video that you keep referencing, I would greatly appreciate it. Then it will be available for anyone who has the same questions in the future and who wants to hear it straight from Richard’s mouth. This is less ideal than an update to the official Numerai docs but still a big step in the right direction.

Looking ahead, I hope you plan to cite (and make available) your sources in your upcoming FAQ. This is historically how knowledge is built, and for good reason.

Thanks for sharing this. I didn’t know that there was a complete lack of feedback in an earlier incarnation of the tournament. Daily feedback would certainly be an improvement on that, for sure, even if the early days are not indicative of the final score.

This is pretty funny. The thing I’ve asked for most in my posts is a trusted source of information (like Richard) to chime in on this topic. Please don’t misinterpret my skepticism of unfounded claims made by Numerai outsiders as a lack of trust in Numerai’s leadership. There is no connection between the two, which is of course the source of my skepticism in the first place.

Aha! I see the video was posted while I was drafting my last message. I’m looking forward to watching it, @wigglemuse. Thank you for sharing it.

My FAQ may have a few links in it to existing materials, but it certainly isn’t going to source and footnote every little thing as that would be almost impossible. But no implementation details I describe will be in there that I’m not sure about or haven’t verified. And it is all public info – no insider access required other than occasionally asking an insider about something (which they are only going to answer if it is ok to be public). But feel free to ask about any additional details, or if something remains confusing, that’s the whole point.

The video answered some questions unambiguously for me so I thought I’d share what I’ve learned here so others might benefit.

  • Although daily scores are provided, Richard confirmed that “ultimately you get scored on the last day [of each round] only”, and “there’s nothing in the middle that could affect your performance.” Although he did go on to acknowledge that companies going bankrupt or some other large-scale market disruption could certainly change performance during a tournament round.

  • The daily scores, as Richard describes them in the video, are “just an estimate” of what your final daily returns will be, as if Numerai were forced to give you an estimate even many days out from the actual tournament end.

These statements resonate with the observations from this thread that it’s easier to make a prediction when you’re closer to the actual scoring day, and hence why the daily scores seem to converge towards the final score (because, as many have noted, the market state on day 20 is likely to be more similar to the market state on day 19 than on day 1).

Based on this information, I’m confident that the idea I posted about daily scores being daily correlations is off-base. Here’s to learning!

In the wake of this informative thread, I would like to make three simple suggestions to Numerai to disambiguate this daily score stuff for others in the future.

  1. Call them “daily estimates of your final score” instead of “daily scores” (or something similar that would highlight the fact that they’re estimates)
  2. Add error bars to the website graphs based on what I’m sure is plenty of available data on the variance of the estimated scores relative to the final score. Bars would vanish for all completed rounds, of course, but the IP rounds would each have them, and they would grow progressively larger as we approach the latest round.
  3. Revise some of the documentation under Scoring on this page.

In an attempt to be helpful regarding #3, below are some suggested revisions to the existing docs.

Here’s a current (confusing) paragraph:

Each submission will be scored over the ~4 week duration of the round. Submissions will receive its first score starting on the Thursday after the Monday deadline and final score on Wednesday 4 weeks later for a total of 20 scores.

Here is a clearer and more informative version based on what I gleaned from the video:

Each submission will be scored on the final day of the ~4 week round. Submissions will receive estimates of their final score starting on the Thursday after the Monday deadline that will continue until Wednesday 4 weeks later when the final score will be released. These estimates will be provided on what we call “scoring days” (weekdays M-F minus market holidays). The estimates tend to grow more accurate as predictors of the final score as the tournament round draws to a close, but they are merely estimates. Only the score on the final day counts for the competition.
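To make the calendar arithmetic in that paragraph concrete, here is a small sketch that lists the scoring days for a round. It only encodes what the paragraph says (Thursday after the Monday deadline through the Wednesday 4 weeks later, weekdays only); the function name `scoring_days` and the `holidays` parameter are hypothetical, and how Numerai actually treats market holidays is an assumption here.

```python
from datetime import date, timedelta

def scoring_days(monday_deadline, holidays=frozenset()):
    """List the weekdays from the Thursday after the deadline through the
    Wednesday 4 weeks later, skipping any dates in `holidays`.

    Simply dropping holidays is an assumption, not confirmed behaviour.
    """
    if monday_deadline.weekday() != 0:
        raise ValueError("deadline must be a Monday")
    first = monday_deadline + timedelta(days=3)   # Thursday of deadline week
    last = monday_deadline + timedelta(days=30)   # Wednesday 4 weeks later
    days = []
    current = first
    while current <= last:
        if current.weekday() < 5 and current not in holidays:
            days.append(current)
        current += timedelta(days=1)
    return days

days = scoring_days(date(2021, 3, 1))  # an arbitrary Monday
print(len(days), days[0], days[-1])
```

With no holidays in the window this yields the 20 scoring days mentioned in the original docs, and only the last of them carries the score that counts.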

While I’m at it, I’ll offer a revision to the next paragraph too. Original:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping scores on each scoring day from the 4 ongoing rounds.

Proposed revision:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping score estimates on each scoring day from the 4 ongoing rounds.

Thanks to all who offered their input on this one!

Maybe I’m just beating a dead horse by bumping this… But here’s my take, code below.

Daily score is nothing more than the correlation between your prediction and the realized target on that day. The realized target is some form of cumulative return up to that point (it might be market neutralized, scaled by risk, etc.).

So it can be modelled fairly well by 5000 (or however many stocks are traded) Brownian motions. Each day these are ordered and that order is compared to your ordering.

A change in daily score is not triggered directly by returns of stocks, but by the change in order caused by the change in cumulative returns. As @wigglemuse points out, the score will by definition converge to the final score.

Now to the fun part… The reason why the absolute changes in increments (or std for those of us preferring L2 norms) decrease as the round progresses is that the average distance between the 1-D diffusion processes increases. Higher distance = lower chance of a change in the rank correlation between prediction and target.

The standard deviation of increments in daily scores goes down in magnitude by approximately 1/sqrt(day of round + 1).
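A rough back-of-the-envelope check of that scaling, under simplifying assumptions that are not part of the argument above: N stocks, a standardized prediction vector p with no skill, targets W_t that are independent random walks with per-day variance σ², and Pearson correlation standing in for rank correlation.

```latex
% s_t = daily score on day t (t = number of days of cumulative return)
\begin{align*}
s_t &\approx \frac{p \cdot W_t}{N \sigma \sqrt{t}} = \frac{B_t}{\sqrt{N t}},
  \qquad B_t := \frac{p \cdot W_t}{\sigma \sqrt{N}}, \quad
  \operatorname{Var}(B_t) = t, \quad \operatorname{Cov}(B_t, B_{t+1}) = t, \\
\operatorname{Var}(s_{t+1} - s_t)
  &\approx \frac{1}{N}\left( 1 + 1 - \frac{2t}{\sqrt{t(t+1)}} \right)
   = \frac{2}{N}\left( 1 - \sqrt{\frac{t}{t+1}} \right)
   \approx \frac{1}{N\,(t + 3/4)}, \\
\operatorname{sd}(s_{t+1} - s_t) &\approx \frac{1}{\sqrt{N\,(t + 3/4)}} .
\end{align*}
```

So in this toy setting the day-over-day changes shrink roughly like one over the square root of the day number, and for N = 5000 the first change has a standard deviation of roughly 0.011.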

Code for a simple Monte Carlo simulation of this:
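What follows is a minimal sketch of such a simulation rather than a definitive implementation. The assumptions are mine: predictions with no skill, i.i.d. Gaussian daily returns, 1000 stocks instead of 5000 to keep it fast, and Spearman correlation computed as the Pearson correlation of ranks. All names and parameter values are illustrative.

```python
import numpy as np

def simulate_daily_scores(n_stocks=1000, n_days=20, n_rounds=200, seed=0):
    """Simulate daily rank correlations between a fixed random prediction
    and the cumulative returns of independent random walks."""
    rng = np.random.default_rng(seed)
    scores = np.empty((n_rounds, n_days))
    for r in range(n_rounds):
        # A prediction with no skill: random values, converted to ranks.
        pred_rank = rng.standard_normal(n_stocks).argsort().argsort()
        # Cumulative returns: one discrete Brownian motion per stock.
        cum_returns = rng.standard_normal((n_days, n_stocks)).cumsum(axis=0)
        for d in range(n_days):
            target_rank = cum_returns[d].argsort().argsort()
            # Spearman correlation = Pearson correlation of the ranks.
            scores[r, d] = np.corrcoef(pred_rank, target_rank)[0, 1]
    return scores

scores = simulate_daily_scores()
# Standard deviation, across rounds, of each day-over-day score change.
increment_std = np.diff(scores, axis=1).std(axis=0)
days = np.arange(1, increment_std.size + 1)

# Compare the simulated decay with 1/sqrt(day + 1), both scaled to start at 1.
for day, sim, ref in zip(days, increment_std / increment_std[0],
                         np.sqrt(2.0 / (days + 1))):
    print(day, round(float(sim), 3), round(float(ref), 3))
```

The two printed columns should track each other loosely; adding real predictive skill or correlated returns would change the details.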

Thanks @jrai this is really useful. :ok_hand:

Despite how little we know about early daily scores, our obsession with them will forever endure. Here’s a quick update on this post including Signals data and some better code in a colab notebook so you can test your models, do comparisons, and more/better analysis: Google Colab

The high level conclusions are still the same:

  • you can expect roughly a .02 - .03 correlation difference between a round’s first-day score and its final resolved score (aka your actual score, which is the only one that matters) on average, but it could range much higher than that.
  • scores are only somewhat informative around the 15th day into a round
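The distance measure behind those conclusions can be sketched roughly as follows. This is not the notebook code: the function name, the rounds-by-days layout, and the numbers are assumptions for illustration.

```python
import numpy as np

def abs_distance_summary(daily_scores):
    """Mean and standard deviation, per day, of the absolute distance between
    each daily score and that round's final (resolved) score.

    daily_scores: array-like of shape (n_rounds, n_days); the last column
    is taken to be the resolved score.
    """
    scores = np.asarray(daily_scores, dtype=float)
    distance = np.abs(scores - scores[:, -1:])
    return distance.mean(axis=0), distance.std(axis=0)

# Two made-up rounds with three daily scores each.
rounds = [
    [0.01, 0.03, 0.02],
    [0.00, -0.01, -0.02],
]
mean_dist, std_dist = abs_distance_summary(rounds)
print(mean_dist, std_dist)
```

Plotting `mean_dist` with a band of plus or minus `std_dist` gives the kind of average line and band described for the figures.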

The figures show a single model, with a different-colored transparent line for each round tracing the distance of the daily scores from that round’s resolved score as the round progresses. Then we chart an average line (in red) and a band of +/- 1 standard deviation across all rounds. First Signals:


And then Classic Tournament:

For these two models across the tournaments and the same rounds 279-292, Signals and Classic daily distances are similar on average, but different rounds exhibited very different daily distances between the two tournaments (which is probably a good thing if you want to diversify risk by competing in both).

Comparing across some top Classic leaderboard positions, and pulling in more rounds so we can get a clearer picture of the mean (20d corr is only available for Signals starting at round 279), we can also see some more volatility (i.e. higher early daily score distances from final score), which may be a common factor at the top of the leaderboard?

We can see the same looking at some top Signals leaderboard positions (onlyatest for example is just submitting ranked momentum predictions and is understandably very volatile):

Shoutout to @robo_boi for having some incredibly volatile Signals models, anything you are willing to share about why? My guess would be testing out single features? Perhaps not, because @arbitrage is also submitting a single feature to his model “leverage” and it has some of the lowest distances on average I’ve seen:

Questions to still answer:

  • Is there a relationship between rank and initial distance from final day score (i.e. volatility)? What about between FNC rank and initial distance?
  • Same question that @bor1 asked: how reliable is the qualitative difference in daily score between two models submitted in the same week? (i.e. is model X, which looked better than model Y in week 1, indeed better than model Y at round resolution?)
    Note: I did find that MMC tends to follow a slightly tighter path, so generally I think percentile ranks can hold a bit more steadily through time, but it would be interesting to chart that out too.
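One possible way to start on the second question, sketched under the assumption that you have daily scores for two models over the same rounds (the function name, the layout, and the numbers are made up):

```python
import numpy as np

def ordering_agreement(scores_x, scores_y):
    """For each day, the fraction of rounds in which the better-looking model
    on that day is also the better model at round resolution.

    scores_x, scores_y: array-likes of shape (n_rounds, n_days) covering the
    same rounds; the last column is the resolved score.
    """
    diff = np.asarray(scores_x, dtype=float) - np.asarray(scores_y, dtype=float)
    # Compare the sign of each day's difference with the final day's sign.
    return (np.sign(diff) == np.sign(diff[:, -1:])).mean(axis=0)

# Two made-up rounds, three daily scores per model.
model_x = [[0.02, 0.01, 0.03], [0.00, 0.02, 0.01]]
model_y = [[0.01, 0.02, 0.01], [0.01, 0.01, 0.00]]
agreement = ordering_agreement(model_x, model_y)
print(agreement)
```

Values near 0.5 on early days would suggest the early ordering is close to a coin flip; the last entry is 1.0 by construction.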

My guess is that my factor is very similar to one of theirs, but sufficiently different that I can still register a non-zero corr. I’d further guess that if I were able to calculate my measure for the entire universe that my score would increase in volatility. Good analysis @jrai these posts are always very interesting!

That makes sense, good point. Submitting a smaller universe almost certainly brings down daily score distance volatility (just because there are a lot more 0.5s diluting everything in the first place), and submitting a heavily neutralized factor would also probably decrease intraround volatility. Conveniently, the next post is about neutralized corr vs unneutralized corr (UNC) in Signals. With UNC eventually in the API, we can go more in-depth and look at everyone’s “neutralization effects.” I think the range we see is going to be pretty interesting.

Cool analysis - thank you!

Would love to dig into this a little more deeply. Can you put any numbers to “somewhat informative”? How’d you pick day 15?

You’re using absolute distance; are the scores N days prior to the final day equally likely to be below as they are to be above that final score? (aka is there a trend up or down as we converge to the final score?)

Thx

Good question, it’s completely made up. Day 15 is generally where I saw the average absolute distance at around 0.01, and where the std also accelerated its shrinking to +/- .01. At that point, I felt like it deserved to be called “somewhat informative,” but it was a pretty subjective landmark.

When I first did the analysis, the daily distances looked to be about 0, but now with many more rounds included and a check across more models, it actually looks like very early daily score distances have a significant negative skew, especially in the extreme cases (i.e. see the standard deviation bands). In other words, I guess scores may trend up, on average, as we converge to the final score. It may be possible this negative skew is just from a few extreme rounds in this sample, but it also makes sense that predictions would get “better” as they get closer to what they’re trying to predict in the first place.
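If you want to check the direction and skew on your own models, a minimal sketch could look like this. It uses signed rather than absolute distances and a plain moment-based skewness; the function name, the layout, and the numbers are assumptions, not the notebook code.

```python
import numpy as np

def signed_distance_stats(daily_scores):
    """Per-day mean and skewness of (daily score - resolved score).

    daily_scores: array-like of shape (n_rounds, n_days); the last column
    is the resolved score. A negative mean on early days means scores
    tended to rise toward the final score.
    """
    scores = np.asarray(daily_scores, dtype=float)
    distance = scores - scores[:, -1:]
    mean = distance.mean(axis=0)
    std = distance.std(axis=0)
    third_moment = ((distance - mean) ** 3).mean(axis=0)
    # Skewness is undefined where std is 0 (e.g. the final day); report 0.
    skew = np.divide(third_moment, std ** 3,
                     out=np.zeros_like(third_moment), where=std > 0)
    return mean, skew

# Three made-up rounds with three daily scores each.
rounds = [
    [0.00, 0.02, 0.03],
    [0.01, 0.02, 0.02],
    [0.03, 0.02, 0.02],
]
mean, skew = signed_distance_stats(rounds)
print(mean, skew)
```

With only a handful of rounds the skew estimate is very noisy, which fits the caveat above about a few extreme rounds driving the result.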
