Goals extension
I tried thinking a bit more about the reasons for increasing the number of last/past rounds used for determining the top quantiles, or for increasing the number of next/future rounds. I think the following makes sense:
For the backtest we should take into account that we get paid for our results in individual rounds, not across multiple rounds, but that these payouts do have an effect on all future rounds. So for the backtest we are interested in adding more future rounds, but not necessarily in increasing the number of past rounds.
As participants we are interested in how we can most accurately estimate our future performance. We would definitely want to add extra past rounds to see if that predicts the future more accurately. Whether to add future rounds depends a bit on your goal. Personally I update my staking ensemble on a weekly basis, so I am interested in how predictive a multitude of past rounds is for the next round. Other participants might prefer to re-decide their staking model less frequently. Those participants are more interested in using the past results of multiple rounds to predict the future results of multiple rounds.
So keeping this in mind, I am going to show graphs of these 3 situations. I have somewhat arbitrarily set the number of multiple rounds at 5. This is due to the computational time required to create these graphs: I did not want to check too many options, did not want to pick a number so big that it would reduce the number of comparisons I can make, and wanted to have at least one completely non-overlapping round.
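To make the windowing concrete, the last-N vs next-M comparison pairs could be built roughly like this (a sketch, not my actual code; the round numbers are made up). Consecutive pairs share rounds, so only pairs far enough apart are fully non-overlapping:

```python
def window_pairs(rounds, n_past, n_future):
    """Yield (past_rounds, future_rounds) pairs, sliding one round at a time.

    `rounds` is assumed to be a chronologically sorted list of round numbers.
    """
    for i in range(n_past, len(rounds) - n_future + 1):
        yield rounds[i - n_past:i], rounds[i:i + n_future]

# Example: 20 hypothetical rounds, last 5 vs next 5.
pairs = list(window_pairs(list(range(280, 300)), 5, 5))
print(pairs[0])  # rounds 280-284 paired with rounds 285-289
```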
Graphs
‘Backtest’ / last vs next 5 TC
Here we can see again that FNC seems most predictive; CORR is doing pretty well now as well, MMC not so well. TC seems to be doing fine, up to the top 20%-5% (I cut off the top 4%-1% for all graphs due to instability of results caused by too few users in this top). The weaker performance of TC in the top quantile is worrying, especially since TC seems to be a metric that pays out a lot more top-heavily than the other metrics.
‘Participant evaluating, frequent rebalancing’ / last 5 vs next TC
Now FNC and MMC seem to be most predictive, followed by TC and then CORR. So for your weekly rebalancing in the coming TC period, it might make sense to look at a combination of your past FNC and MMC results.
‘Participant evaluating, infrequent rebalancing’ / last 5 vs next 5 TC
… This seems not very in line with previous results. CORR and MMC perform exceptionally badly overall, yet MMC goes (slightly) off the chart and also performs well at the top.
Explanation
Unfortunately, I don’t really have one, so I am hoping somebody else is able to make sense of it. In my mind, the prime contender for the most likely cause would be a bug. But I think this might not be so: I have done some extra testing and also created graphs of the metrics vs. themselves, and to me these non-TC graphs seem very plausible:
Details
The quantiles for metrics over multiple rounds are defined by looking at the best average performance by quantile (e.g. a good score for the MMC quantile would be 95%), not by the metric itself (e.g. a good score for MMC would be 0.04). An argument could also be made for defining quantiles by the average metric score, but the way my code was set up it was easiest to extend it in this manner.
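The distinction between the two definitions could be sketched like this (hypothetical `scores` data and column names, not my actual code; the numbers are made up). Note the two options can rank users differently:

```python
import pandas as pd

# Hypothetical per-user, per-round MMC scores.
scores = pd.DataFrame({
    "round": [500] * 4 + [501] * 4,
    "user":  ["a", "b", "c", "d"] * 2,
    "mmc":   [0.04, 0.01, -0.02, 0.03, 0.02, 0.05, 0.00, -0.01],
})

# Per-round quantile rank of each user's metric (0 = worst, 1 = best).
scores["mmc_quantile"] = scores.groupby("round")["mmc"].rank(pct=True)

# Option used in the post: average the *quantiles* across rounds per user...
avg_quantile = scores.groupby("user")["mmc_quantile"].mean()

# ...instead of averaging the raw metric and taking quantiles of that:
avg_metric_quantile = scores.groupby("user")["mmc"].mean().rank(pct=True)
```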
Code update
Because this potential bug is bothering me a bit, I will refactor the code and then edit the previous code (the 2nd and 3rd comment). If I do find a bug, I will notify you guys.