Relationship of daily round correlations to final round correlations

Since I originally asked this question, it has come up a number of times in Rocket Chat without correction by the team, and neither @wigglemuse nor anyone else has yet satisfied me that I was simply misinterpreting what was going on.

Specifically, in regard to what you said here:

I am confident that this is incorrect. It might be correct if the hedge fund were only looking at stock prices. However, it does not work if any derivatives with expiration dates are involved. For options specifically, the premium baked into the price varies with the time remaining before expiration, and it tends to shrink as the options approach their expiration date. I think that effect by itself would produce smaller price movements and lower daily variation, leading to the decreasing trend you see.

Thanks for raising this possibility. Varying expiration dates of certain financial instruments could certainly explain a wider variance in the daily market deltas than I included in my simple simulation.
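For intuition on that decay, here’s a toy Black-Scholes sketch (the strike, rate, and volatility are made up, and this says nothing about the fund’s actual holdings; it only shows that an option’s time value shrinks as expiry nears):

```python
# A toy illustration (made-up strike, rate, and volatility; not anyone's
# actual holdings): Black-Scholes price of an at-the-money call as the
# expiration date approaches.
from math import exp, log, sqrt

from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call with T years to expiration."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

S = K = 100.0            # at-the-money
r, sigma = 0.01, 0.20    # illustrative risk-free rate and volatility
for days in (20, 10, 5, 1):
    T = days / 252       # trading days -> years
    print(f"{days:2d} days to expiry: call price = {bs_call(S, K, T, r, sigma):.3f}")
# The premium (pure time value here) shrinks toward zero as expiry nears,
# so the day-to-day price moves it contributes shrink as well.
```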

I’m not ready to accept “it does not work if any derivatives with expiration dates are involved” because, after all, if a small percentage of the fund’s holdings are in expiring derivatives, they wouldn’t move the needle very much.

But let’s assume for a moment that the fund is built exclusively from these kinds of expiring instruments. Since we observe steady convergence in every round even though rounds X and X+1 overlap for 15 days, I believe this would also imply that each tournament round must have its own independent holdings that expire during said round.

Yours is certainly a viable theory when combined with assumptions like these about the nature of the fund’s holdings. Perhaps those assumptions are why the Numerai team isn’t chiming in to answer this question about the daily scores, but I’ll still continue the plea. Courtesy of the “The humans of Numerai” thread, I’m calling out some Numerai staff explicitly here, wondering if @slyfox, @master_key, @mdo, or @son_sioux could offer more insight and/or help us rectify some of the contradictory theories that have appeared in this thread.

What remains clear is that the early daily scores are not very predictive of the final score. Just look at @jrai’s original plot. So why are they presented to us at all? What kind of value are we supposed to get from them if they have no bearing on the final score, as some have claimed, and (as yet) no unambiguous meaning? I would hope the staff could offer a “safe” answer to that question that does not expose information about the fund’s internal operation. I mean, if you’re going to put a speedometer on the car, please let us drivers know how to read it. Or maybe just add some error bars based on the tournament day so that we know how much faith to put into those scores.

Thanks, all, for your continued input!


Because with the previous setup we had zero feedback for 4 weeks. This is better than that.


Given that we are aiming to predict the state of play 4 weeks out, and not for every day leading up to that point, is there any reason why we would expect early scores to be indicative of the outcome? The day most likely to look like the final one in the markets, relative to the starting point, is the one just before it, and the further away from the goal you are, the more discrepancy there will inevitably be. Or am I missing something with this reasoning (which I accept is entirely possible!)?

It could maybe be just because they are predictions, and we are looking at models that did fairly well (although NO models ever really get high correlations). It would be interesting to check on random models.

But please let’s quit with the idea that the team is hiding something or that there is a big conspiracy. I ASKED THEM THESE EXACT QUESTIONS and Richard directly answered them (and you can go watch it). There has been no evasion on this topic whatsoever. It has always been stated that we are predicting the market 4 weeks hence, but since this question gets talked about sometimes, I just wanted to confirm with the team that nothing that happens in the intermediate days could affect the score. And he did confirm that. So the question is pretty much settled: we are predicting 4 weeks hence, exactly as they always said. The only confusion brought into this is really by users who wondered if that was strictly true. Turns out it is. They’ve been 100% open about it; nobody ever directly asked them before, and when I did, Richard answered me.

I mean, I guess he could be straight up lying, or he doesn’t actually know how it works, or there is a bug and our scores have been wrong since forever – all technically in the realm of possibility I suppose. But come on…


Since we are predicting what the market will look like only on day 20, it stands to reason that, as the round progresses, each day becomes more similar to day 20. I’d expect the difference between day 1 and day 2 to carry a lot of noise, because both days are the least like day 20, and perhaps in different ways. The difference between day 18 and day 19 should carry less noise, because both are the most like day 20. “There are many ways to be different, but only one way to be the same” might apply? All of this only holds “on average and over time”, as of course there could be large shocks later in a round on some occasions.
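One toy way to quantify the “more similar to day 20” intuition (assuming, purely for illustration, that the market’s cumulative move behaves like a random walk): the correlation between the cumulative move at day k and at day 20 works out to sqrt(k/20), so later days really are more like day 20.

```python
# Toy check (assumes cumulative market moves behave like a random walk,
# which is an idealization): how correlated is the market state at day k
# with the state at day 20?
import numpy as np

rng = np.random.default_rng(1)
steps = rng.standard_normal((100_000, 20))  # 100k simulated paths, 20 days
cum = steps.cumsum(axis=1)                  # cumulative move per path

for k in (1, 5, 10, 19):
    empirical = np.corrcoef(cum[:, k - 1], cum[:, 19])[0, 1]
    print(f"day {k:2d} vs day 20: corr = {empirical:.3f} "
          f"(theory: sqrt({k}/20) = {np.sqrt(k / 20):.3f})")
# Day 19 is far more similar to day 20 than day 1 is, which is exactly the
# "each day becomes more similar to day 20" intuition.
```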

If we didn’t have daily scores, how would we become addicted to refreshing the leaderboard and profile pages? It’s just gamification. Cooler heads should pay little attention to it in the long run, unless we discover that some information can be gleaned from a model’s intra-round volatility: Sharpe and Sortino ratios on the live performance of your models (@degerhan’s post still hasn’t gotten enough attention).
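For what it’s worth, those ratios over a series of daily scores might look something like this sketch (the score values are invented, and this is only one plausible way to compute them, not @degerhan’s exact method):

```python
# A sketch of intra-round risk-adjusted metrics over a model's daily scores
# (the score values below are made up; whether daily scores or resolved
# round scores are the right input is an open question).
import numpy as np

def sharpe(scores):
    """Sharpe-style ratio: mean score over standard deviation of scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean() / scores.std(ddof=1)

def sortino(scores, target=0.0):
    """Sortino-style ratio: mean excess score over downside deviation only."""
    scores = np.asarray(scores, dtype=float)
    downside = np.minimum(scores - target, 0.0)  # only below-target days count
    return (scores.mean() - target) / np.sqrt((downside ** 2).mean())

daily_scores = [0.031, -0.004, 0.022, 0.015, -0.010, 0.027]  # made-up numbers
print(f"Sharpe:  {sharpe(daily_scores):.2f}")
print(f"Sortino: {sortino(daily_scores):.2f}")
```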


Consider also that day 1 is the closest to the data as we get it at the start of the round, and that might give you a hint of a way you could actually use the daily scores to improve your models. Although we aren’t given the targets at any point, knowing how much movement there actually is in an average round is potentially actionable information.

@profricecake If this is a concern, you should not be participating in the tournament. In contrast to the very long conversation in Rocket Chat yesterday (and many times before) about trusting third-party model staking, where there are big trust issues between users, this is about trusting the system.


That makes a lot of sense. But for the sake of transparency, revealing the score calculation method should not be a red line vis-à-vis the hedge fund’s operation. Transparency is the heart of DeFi, right :grinning:?

We don’t even know what the target is. So that could have a lot to do with it.

I did look at the data for ALL models and then filtered it in a few ways, but the basic pattern always holds. The first day’s average change is much bigger than the others’, and then it generally decreases. I think a lot of us have noticed that the first day is way off, and will often reverse on the second or third day before the score gets even a semblance of a trajectory towards where it is going to end up. Probably down again to how the target is created. Once in a while, though, we’ve seen some radical changes in scores on the very last day or two…
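As a rough way to check this yourself, something like the sketch below should work (it assumes the numerapi client of this era; the exact field names returned by daily_submissions_performances are assumptions on my part, and “integration_test” is just an example model):

```python
# Hedged sketch: average absolute day-over-day score change by day-of-round
# for one model, pulled via numerapi. The field names ("roundNumber",
# "date", "correlation") are assumptions about the API response shape.
import pandas as pd
from numerapi import NumerAPI

napi = NumerAPI()
perf = pd.DataFrame(napi.daily_submissions_performances("integration_test"))
perf = perf.dropna(subset=["correlation"]).sort_values(["roundNumber", "date"])

# Day-of-round index and day-over-day score change within each round.
perf["day"] = perf.groupby("roundNumber").cumcount() + 1
perf["change"] = perf.groupby("roundNumber")["correlation"].diff()

# Mean absolute change per day-of-round; expect day 2 (the first diff) to
# dwarf the later days if the pattern described above holds.
print(perf.groupby("day")["change"].apply(lambda s: s.abs().mean()))
```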

I’m not reading into this discussion that anyone is questioning the integrity of the team. And I very much appreciate your insight and discovery, @wigglemuse. However, while you mention that the issue is pretty much settled, I cannot find any information in the official documentation regarding details on the subject. I trust that you did ask and that Richard did respond, but it seems reasonable to want these details in a readily available point of reference as part of the official documentation, rather than having to scour the forums and chat rooms. Especially as a newcomer.

What @profricecake has proposed seems like an utterly reasonable possibility for a method of calculation. It may not, in fact, be what is happening, but I, for one, would love to see clear, easily accessible documentation from the official Numerai team, which doesn’t seem like a big ask. If presenting that level of detail would pose some level of data leakage or security risk, then stating so would be sufficient. It’s in all of our interests for each of us to have great performance, so I certainly don’t think anything sneaky or untoward is going on. Probably just a squeeze on resources. I love the tournament and the community. I believe a push toward clearer and more available official documentation gives all of us some bedrock to stand on and would allow our guessing to focus more on how we can improve predictions and less on the mechanics of the tournament.

My point is that they’ve only ever said it is one way, so there isn’t really any more detail needed in order for it to be the plain truth. The need to clarify it only comes from users essentially asking, “Is the way you say it is actually the way it is?” They obviously aren’t going to document every esoteric detail that it is NOT. Things get documented above and beyond the basics when there are continuing questions about them, and that applies here. But the questions come before the documentation, because otherwise it is not known which things are points of confusion that need additional clarification. In any case, I am working on that exact documentation, and should be doing that right now instead of posting on the forum. So once that project is more or less complete (complete enough to post, anyway), if and when any more questions about esoteric details come up that I haven’t answered, I will be happy to add answers (with a quick response time) to pretty much any reasonable question people have, and then it will be there to point to. In fact, that’s exactly why I made sure to clarify this once and for all directly with the team: anything I put in those docs, I will make sure is correct.

The entire point of my documentation project is specifically so users (new users especially) do not have to scour all of these sources for all the questions they have; it will be in one place. We started the project about a month too late: the flood of Lex Fridman podcast newbies filling the place with their questions began about the same day I started working on it. I wish it were ready already, but it will be quite soon.

Even Richard thinks the website should say it better. Here’s the bit of the video I was referring to, from the last fireside chat (51:06).

I guess it won’t jump to the right place when embedded; if you open it up on YouTube and expand the “show more” section, you can see all the questions, including that one at 51:06.

Hi all.

I came to this thread in search of explanations. I’m not making accusations; I’m offering theories and challenging those that either are unsupported by evidence or don’t seem to jibe with my readings of the data (or both).

Please try to remember that you Numerai veterans were here once too. Not everyone participating in the tournament has been exposed to the same information about it. Apparently some of you have been in direct contact with members of the staff; others like me have just started their involvement and know only what’s on the website.

First of all: I’d love to watch it! It sounds like it has all the answers. But you still haven’t shared it.

Second: Hiding things is a core part of the Numerai concept. They strip all identifying materials from the data set to keep the competition more data science and less finance. I understand and accept this, and they’ve documented it clearly on the website. But the daily scores are the opposite of hidden: they’re front and center for all the world to see, yet they are only thinly documented (hence this thread).

Third: Conspiracy? Why bring that word up? Just because I’m not ready to accept your claims without evidence? Again, please try to remember that I haven’t had the same access to the primary sources that you’ve had and, like you, just want to hear answers from a source I can trust. Just because you feel that you understand things with great certainty doesn’t automatically make you a trustworthy source, since you are not a Numerai employee, nor have you (yet) offered any evidence beyond hearsay in support of your claims.

If you post that video that you keep referencing, I would greatly appreciate it. Then it will be available for anyone who has the same questions in the future and wants to hear it straight from Richard’s mouth. This is less ideal than an update to the official Numerai docs, but still a big step in the right direction.

Looking ahead, I hope you plan to cite (and make available) your sources in your upcoming FAQ. This is historically how knowledge is built, and for good reason.

Thanks for sharing this. I didn’t know that there was a complete lack of feedback in an earlier incarnation of the tournament. Daily feedback is certainly an improvement on that, even if the early days are not indicative of the final score.

This is pretty funny. The thing I’ve asked for most in my posts is for a trusted source of information (like Richard) to chime in on this topic. Please don’t misinterpret my skepticism of unfounded claims made by Numerai outsiders as a lack of trust in Numerai’s leadership. There is no connection between the two; indeed, that lack of connection is the source of my skepticism in the first place.

Aha! I see the video was posted while I was drafting my last message. I’m looking forward to watching it, @wigglemuse. Thank you for sharing it.

My FAQ may have a few links in it to existing materials, but it certainly isn’t going to source and footnote every little thing, as that would be almost impossible. But I won’t describe any implementation details that I’m not sure about or haven’t verified. And it is all public info; no insider access is required, other than occasionally asking an insider about something (which they are only going to answer if it is OK to be public). But feel free to ask about any additional details, or if something remains confusing; that’s the whole point.

The video answered some questions unambiguously for me so I thought I’d share what I’ve learned here so others might benefit.

  • Although daily scores are provided, Richard confirmed that “ultimately you get scored on the last day [of each round] only” and that “there’s nothing in the middle that could affect your performance.” He did go on to acknowledge, though, that companies going bankrupt or some other large-scale market disruption could certainly change performance during a tournament round.

  • The daily scores, as Richard describes them in the video, are “just an estimate” of what your final score will be: the estimate Numerai would give if forced to produce one, even many days out from the actual end of the round.

These statements resonate with the observations from this thread that it’s easier to make a prediction when you’re closer to the actual scoring day, which is why the daily scores seem to converge towards the final score (because, as many have noted, the market state on day 20 is likely to be more similar to the market state on day 19 than to that on day 1).

Based on this information, I’m confident that the idea I posted about daily scores being daily correlations is off-base. Here’s to learning!

In the wake of this informative thread, I would like to make three simple suggestions to Numerai to disambiguate this daily score stuff for others in the future.

  1. Call them “daily estimates of your final score” instead of “daily scores” (or something similar that would highlight the fact that they’re estimates).
  2. Add error bars to the website graphs based on what I’m sure is plenty of available data on the variance of the estimated scores relative to the final score (a sketch of one way to compute such bands follows this list). Bars would vanish for all completed rounds, of course, but the in-progress rounds would each have them, and they would grow progressively larger as we approach the latest round.
  3. Revise some of the documentation under Scoring on this page.
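Regarding #2, here’s a rough sketch of how such bands might be computed (the column names are hypothetical; it assumes a table of historical daily scores in which each round’s last recorded day carries the resolved score):

```python
# A sketch of suggestion #2 (hypothetical column names; assumes a table of
# historical daily scores in which each round's last recorded day carries
# the resolved score).
import numpy as np
import pandas as pd

def error_band_by_day(df: pd.DataFrame) -> pd.Series:
    """Std of (daily score - final score) per day-of-round, across rounds.

    Expects columns: 'round', 'day' (1..20), 'score'.
    """
    # Final score per round = score on that round's last recorded day.
    final = df.loc[df.groupby("round")["day"].idxmax()].set_index("round")["score"]
    gap = df["score"] - df["round"].map(final)
    return gap.groupby(df["day"]).std()

# Tiny synthetic demo: 3 fake rounds of 5 daily scores each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "round": np.repeat([1, 2, 3], 5),
    "day": np.tile(np.arange(1, 6), 3),
    "score": rng.normal(0.02, 0.01, size=15),
})
print(error_band_by_day(demo))  # zero on each round's final day by construction
# For an in-progress round on day d, band[d] gives a +/- width to draw
# around the day-d estimate.
```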

In an attempt to be helpful regarding #3, below are some suggested revisions to the existing docs.

Here’s a current (confusing) paragraph:

Each submission will be scored over the ~4 week duration of the round. Submissions will receive its first score starting on the Thursday after the Monday deadline and final score on Wednesday 4 weeks later for a total of 20 scores.

Here is a clearer and more informative version based on what I gleaned from the video:

Each submission will be scored on the final day of the ~4 week round. Submissions will receive estimates of their final score starting on the Thursday after the Monday deadline that will continue until Wednesday 4 weeks later when the final score will be released. These estimates will be provided on what we call “scoring days” (weekdays M-F minus market holidays). The estimates tend to grow more accurate as predictors of the final score as the tournament round draws to a close, but they are merely estimates. Only the score on the final day counts for the competition.

While I’m at it, I’ll offer a revision to the next paragraph too. Original:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping scores on each scoring day from the 4 ongoing rounds.

Proposed revision:

Since a round takes ~4 weeks to resolve, if you submit new predictions every week, you will receive multiple (up to 4) overlapping score estimates on each scoring day from the 4 ongoing rounds.

Thanks to all who offered their input on this one!


Maybe I’m just beating a dead horse by bumping this… But here’s my take, code below.

The daily score is nothing more than the correlation between your prediction and the realized target on that day. The realized target is some form of cumulative return up to that point (it might be market-neutralized, scaled by risk, etc.).

So it can be fairly well modelled by 5000 (or however many stocks are traded) Brownian motions. Each day these are ordered, and that order is compared to your ordering.

A change in daily score is not triggered directly by the returns of stocks, but by the change in order caused by the change in cumulative returns. As @wigglemuse points out, the score will by definition converge to the final score.

Now to the fun part… The reason the absolute changes in increments (or the std, for those of us preferring L2 norms) decrease as the round progresses is that the average distances between the 1-D diffusion processes increase. Greater distance = lower chance of a change in the rank correlation between prediction and target.

The standard deviation of increments in daily scores goes down in magnitude by approximately 1/sqrt(day of round + 1).

Code for a simple Monte Carlo simulation of this:
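(The original snippet wasn’t preserved in this thread, so below is a minimal reconstruction from the description above; the stock count, 20-day horizon, and the random fixed prediction are assumptions.)

```python
# Minimal reconstruction of the simulation described above (assumptions:
# 5000 stocks, 20-day rounds, a fixed random prediction).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stocks, n_days, n_sims = 5000, 20, 50

sq_change = np.zeros(n_days - 1)
for _ in range(n_sims):
    # Each stock's cumulative return is a Brownian motion (cumsum of iid noise).
    cum_returns = rng.standard_normal((n_days, n_stocks)).cumsum(axis=0)
    prediction = rng.permutation(n_stocks)  # your fixed, arbitrary ordering
    # Daily score: rank correlation between the fixed prediction and the
    # ordering implied by cumulative returns so far.
    scores = np.array([spearmanr(prediction, cum_returns[d])[0]
                       for d in range(n_days)])
    sq_change += np.diff(scores) ** 2
rms_change = np.sqrt(sq_change / n_sims)

for d, s in enumerate(rms_change, start=1):
    # Expect roughly s proportional to 1/sqrt(d + 1): rank swaps get rarer
    # as the diffusions spread apart.
    print(f"day {d:2d} -> {d + 1:2d}: RMS daily score change = {s:.4f}")
```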


Thanks, @jrai, this is really useful. :ok_hand:

Despite how little we know about early daily scores, our obsession with them will forever endure. Here’s a quick update on this post, including Signals data and some better code in a Colab notebook so you can test your models, do comparisons, and run more/better analysis: Google Colab

The high level conclusions are still the same:

  • you can expect, on average, roughly a 0.02 to 0.03 correlation difference between the score on the first day of a round and the score on the final resolved day (aka your actual score, which is the only one that matters), but it could range much higher than that.
  • scores only become somewhat informative around the 15th day of a round.

The figures show a single model, with a transparent line in a different color for each round tracing that round’s daily score distance from its resolved score as the round progresses. We then chart an average line (in red) and a band of +/- 1 standard deviation across all rounds. First, Signals:


And then Classic Tournament:

For these two models, across the two tournaments and the same rounds (279-292), Signals and Classic daily distances are similar on average, but different rounds exhibited very different daily distances between the two tournaments (which is probably a good thing if you want to diversify risk by competing in both).

Comparing across some top Classic leaderboard positions, and pulling in more rounds so we can get a clearer picture of the mean (20d corr is only available for Signals starting at round 279), we can also see some more volatility (i.e. higher early daily score distances from final score), which may be a common factor at the top of the leaderboard?

We can see the same looking at some top Signals leaderboard positions (onlyatest for example is just submitting ranked momentum predictions and is understandably very volatile):

Shoutout to @robo_boi for having some incredibly volatile Signals models; is there anything you’re willing to share about why? My guess would be testing out single features? Perhaps not, because @arbitrage is also submitting a single feature to his model “leverage” and it has some of the lowest average distances I’ve seen:

Questions still to answer:

  • Is there a relationship between rank and initial distance from the final-day score (i.e. volatility)? What about between FNC rank and initial distance?
  • The same question that @bor1 asked: how reliable is the qualitative difference in daily score between two models submitted in the same week? (i.e. is model X, which looked better than model Y in week 1, indeed better than model Y at round resolution?) I did find that MMC tends to follow a slightly tighter path, so generally I think percentile ranks can hold a bit more steadily through time, but it would be interesting to chart that out too.