Relationship of daily round correlations to final round correlations

Is the relationship between the daily round correlations and the final submission correlations documented anywhere? I’ve noticed that the last daily correlation of a submission seems to always match the submission’s correlation. This makes me curious what we are actually watching in the charts for individual rounds.

Related question and part of why I was looking at the daily round correlations - why don’t recent rounds have correlations in the API any more?

The first part of your question is unclear to me, but it sounds like you may have a misunderstanding of what the scores mean. In any case, can you re-phrase or give an example so we are sure what you are talking about?

And as far as the API, again not sure what you are referring to? (I pull the scores from the API every day no problem. We are talking about the Numerai tournament, not signals, right?)

Re: the first question, look at these screenshots from https://numer.ai/integration_test . The first shows the current correlations of the last 8 rounds. The next 8 show the round view for each of those rounds, and the most recent date always matches the correlation from the first screenshot. I’ve noticed this on my models and a few others that I spot checked, so it doesn’t seem to be an accident that the most recent date and the submission correlation match.

Rounds 237 through 230: (screenshots of each round’s daily chart)

For the second question, this code in Colab outputs a list of round/correlation pairs, but the correlations are all None from round 185 onward.

import numerapi

# public_id / secret_key are the account's API credentials
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
sorted(
    [(r["roundNumber"], r["submission"]["liveCorrelation"])
     for r in napi.get_user_activities("integration_test")]
)

returns

[(168, -0.033684549296499416),
 (169, -0.05988718140443864),
 (170, -0.0553602620696002),
 (171, -0.049266679898113),
 (172, -0.03630719909101644),
 (173, 0.025985500504023682),
 (174, 0.019309163665860288),
 (175, 0.017878789613636287),
 (176, 0.024122615017301122),
 (177, -0.010733890025773116),
 (178, -0.04190927753899791),
 (179, 0.0073781890375614724),
 (180, 0.0015480821870788558),
 (181, 0.008458966549655554),
 (182, 0.0553799401902486),
 (183, 0.04973214110776211),
 (184, 0.053386781798264636),
 (185, None),
 (186, None),
 (187, None),
 (188, None),
 (189, None),
 (190, None),
 (191, None),
 (192, None),
 (193, None),
 (194, None),
 (195, None),
 (196, None),
 (197, None),
 (198, None),
 (199, None),
 (200, None),
 (201, None),
 (202, None),
 (203, None),
 (204, None),
 (205, None),
 (206, None),
 (207, None),
 (208, None),
 (209, None),
 (210, None),
 (211, None),
 (212, None),
 (213, None),
 (214, None),
 (215, None),
 (216, None),
 (217, None),
 (218, None),
 (219, None),
 (220, None),
 (221, None),
 (222, None),
 (223, None),
 (224, None),
 (225, None),
 (226, None),
 (227, None),
 (228, None),
 (229, None),
 (230, None),
 (231, None),
 (232, None),
 (233, None),
 (234, None),
 (235, None),
 (236, None),
 (237, None)]
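(Filtering out the Nones is easy enough – illustrated below with a truncated stand-in for the output above – but I’d like to know why they stopped populating.)

```python
# truncated stand-in for the round/correlation pairs above
pairs = [(183, 0.0497), (184, 0.0534), (185, None), (186, None)]

# keep only rounds that actually have a liveCorrelation value
scored = [(rnd, corr) for rnd, corr in pairs if corr is not None]
last_scored_round = max(rnd for rnd, _ in scored)  # the last round with a score
```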

I can’t answer your API question exactly, but that must be a deprecated field you’re pulling. (I get correlations from v2RoundDetails but I don’t use numerAPI – might want to ask that in the rocketchat “api” channel.)

As to the first question, it does seem like you’ve got a fundamental misunderstanding there, because of course they match – the most recent daily score and what you are calling the “submission’s correlation” are the same thing.

The daily scores you see in the “round” dropdown for each round are snapshots in time of that day. They are not cumulative or anything like that. Each day your predictions are compared to the live results as they stand on that day (which is actually a lag of 2 days from the live stock market, but that’s another complication). So only the last day of the round, when it “resolves” (i.e. the 20th score, after 4 weeks), actually means anything – that’s the day you are scored on for payment. All the other days are just something to look at in the meantime.

So for each round you have 19 scores and “payouts” that tell you what you would have gotten for that round if the round had ended on that day. Of course it didn’t end on any of those days, so again they are just something to look at and follow along with as the round progresses. Only the final 20th day means anything.

And that explains why the most recent score is always listed on the “submission” page – the submission page is simply the summary of your final scores for all rounds except the most recent 4, plus the in-progress scores of those 4 most recent rounds (which are open, not resolved – except on Wednesdays, the last day of a round, when the 4th round back gets its final score).

SO… exactly one score per week actually is final (i.e. is the only one that counts for real payment or burn) – this week it was yesterday’s (Nov 11) score for round 233 as that is the round that finished/resolved this week. And then today (Nov 12) we got a score for round 237 for the first time and so rounds 234-237 are currently open and still in-progress.

Starting to make sense?


This is exactly why I asked if there was documentation!

This was actually my understanding before posting, but I wanted to get official (or veteran) confirmation of that understanding. Just looking at the round charts, it is very easy to assume that those are the correlations of that round on that day’s data, because the label is “correlation”, not “correlation update”, “correlation so far”, “cumulative correlation”, or “correlation snapshot”. I looked at those charts for months before realizing I was reading them wrong.

The closest thing I found to a definition in the official docs is just “Each submission will receive daily updated scores starting from the first Thursday after the submission deadline to the Wednesday 4 weeks after.” and “But only your final score and final payout will count.” The latter nixes the naive interpretation suggested by the chart labels alone, but does not say how the intermediate updates are calculated. Two obvious candidates to me are calculating over only the resolved days, or assuming zeros for the unresolved days. The former seems a little more intuitive to me.

So just clarifying – that’s exactly what it is. The correlation of your predictions to the state of the market on that day (with a 2-day lag from the real-life market – Wednesday scores are from Monday’s market, etc.). It is just that if it isn’t the final day of the round, that day doesn’t mean anything. When figuring final scores, your intermediate daily scores don’t enter into that calculation at all – again, they are just there to look at, nothing else. It used to be we got a single score for each round after waiting a month, having zero idea how the round was shaping up. That made people crazy with anticipation, so now we’ve got something to pass the days. But still, the only score that matters each week is that final Wednesday score for the 4th most recent round.
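A toy sketch of that snapshot behavior (purely synthetic data and a hand-rolled rank correlation – not Numerai’s actual scoring code): the same predictions get rank-correlated against a fresh market state each day, and the final score is simply the last snapshot.

```python
import numpy as np

rng = np.random.default_rng(0)
preds = rng.random(50)  # one submission, fixed for the whole round

# hypothetical "live market state as of day d" for the 20 scoring days
daily_targets = [rng.random(50) for _ in range(20)]

def rank_corr(a, b):
    # Spearman-style correlation: Pearson correlation of the ranks
    ranks_a = a.argsort().argsort()
    ranks_b = b.argsort().argsort()
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# each daily score is an independent snapshot, not a running total
daily_scores = [rank_corr(preds, t) for t in daily_targets]

# only the 20th (last) snapshot counts for payout
final_score = daily_scores[-1]
```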

Ah, that’s where I got confused. The daily labels tricked me into thinking some of the round was resolving early (i.e. not everything was predicting four weeks out). The updates are more “pretend we resolved the same bets early”. I suppose some predictions could be shorter than four weeks, but that’s not a helpful way to think about it now. Thanks!

Yes, correct. I’m not sure we’ve ever gotten a definitive answer on whether anything is actually resolved before the final date, but it doesn’t seem like it. “Pretend it is all resolved” each day before the final day is exactly right – it is just pretend until the final day. Usually your scores track, and what you are getting in the final week is pretty close to what you are going to end up with, but not always. Just this week there was a big change for many on the final day, as it corresponded with a huge market shift on Monday (the vaccine announcement, I think). So there is a component of luck there too…

I don’t believe daily scores are indicative of your final resolved score until at least the 15th day (3rd week) of each 20 day (4 week) round. In fact, I think we put way too much weight into daily scores. Here’s my unscientific analysis to answer the question:

This chart shows a different line for every round. The y-axis shows how far each day’s score is from the final score your model gets on that round. The x-axis shows which day of the round you’re on. On the final day of the round, each round’s line converges to 0, because that is your final score! The dashed line is the average distance for each day over all rounds. Although the average distance of daily scores from final scores looks to be 0 over time, that’s only because it’s completely random whether my daily scores are higher or lower than what my final score will be.

What’s more important is the absolute distance from your final-day score, which looks like the chart below. Clearly, it’s downward sloping. What this tells me is that on the first day of every round, my daily score will be ~0.03 correlation points away from my final score. It’s not until roughly the 15th day that my daily scores are within 0.01 correlation points of my final score. In some cases, even on the 15th day, my daily score can be as much as 0.04 points away from my final score.
(charts: daily distance and absolute distance from final day score)

And here’s the code if you’d like to check your own models’ “consistencies.” I’ve found most models exhibit the same behavior, though. There is likely something interesting to be found in different models’ changes over daily scores. The same analysis can be done for mmc by changing all references to “correlation” to “mmc”:

import matplotlib.pyplot as plt
import numerapi
import pandas as pd

napi = numerapi.NumerAPI()

# pull daily scores and keep only resolved rounds
df = pd.DataFrame(napi.daily_submissions_performances("jrai")).set_index("date")
df = df[df["roundNumber"] < 233]

# distance of each daily score from that round's final (last) score
df["distance"] = (
    df["correlation"] - df.groupby("roundNumber")["correlation"].transform("last")
).values

# re-index each round from day 0..19 ("level_1" becomes the day of the round)
df = (
    df.groupby("roundNumber")
    .apply(lambda x: x.reset_index(drop=True))
    .drop("roundNumber", axis=1)
    .reset_index()
)

# plot distances (one line per round)
df.set_index("level_1").pivot(columns="roundNumber", values="distance").plot(
    figsize=(10, 5), title="Daily Scoring Distance from Final Day Score"
)
df.groupby("level_1").mean().distance.plot(style="k--")  # dashed mean line

plt.xlim(0, 20)
plt.legend(bbox_to_anchor=(1.4, 1), loc="upper right", ncol=3)
plt.ylabel("Distance from Final Day Score")
plt.xlabel("Days into Round")
plt.figure()

# plot absolute distances
df.abs().set_index("level_1").pivot(columns="roundNumber", values="distance").plot(
    figsize=(10, 5), title="Daily Scoring Absolute Distance from Final Day Score"
)
df.abs().groupby("level_1").median().distance.plot(style="k--", legend=None)  # dashed median line

plt.xlim(0, 20)
plt.ylabel("Absolute Distance from Final Day Score")
plt.legend(bbox_to_anchor=(2, 1), loc="upper right", ncol=3)
plt.xlabel("Days into Round")
plt.figure()

Edit: I just realized that the absolute distance graph is actually showing the median distance, which might be a better measure than mean distance anyway. I was switching between mean/median and forgot to switch it back. Swap “.median()” and “.mean()” to see the differences.


Thanks @jrai, great stuff! A related question to the usefulness of the daily scores you might know the answer to:

How reliable is the qualitative difference in daily scores between two models submitted in the same week? (i.e. if model X looked better than model Y in week 1, is it indeed better than model Y at round resolution?) Do you happen to have an analysis/intuition on that as well?

The above question makes sense under the assumption that models X and Y are built in compatible ways and with compatible aims (basically hyperparameter tuning), i.e. not a P / 1-P kind of pair :-).
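To make the question concrete, here’s the kind of check I have in mind, with made-up daily scores (in practice one would pull each model’s real scores via napi.daily_submissions_performances): per round, does the day-1 leader still lead on the final day?

```python
import pandas as pd

# made-up daily correlations for two models over two rounds
data = pd.DataFrame({
    "roundNumber": [230, 230, 230, 231, 231, 231],
    "day":         [1, 2, 20, 1, 2, 20],
    "model_x":     [0.01, 0.02, 0.03, -0.01, 0.00, 0.02],
    "model_y":     [0.02, 0.01, 0.01, 0.00, 0.01, 0.03],
})

def leader(round_df, day):
    # which model scores higher on the given day of the round?
    row = round_df[round_df["day"] == day].iloc[0]
    return "x" if row["model_x"] > row["model_y"] else "y"

# fraction of rounds where the early leader is also the final leader
agree = [leader(g, 1) == leader(g, 20) for _, g in data.groupby("roundNumber")]
agreement_rate = sum(agree) / len(agree)
```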
