MMC and other metrics for all targets

I trained a LightGBM model for each target, using the parameters shown on the charts. Every model was scored against cyrus 20 on the validation data, with scoring.contributive_correlation() used to estimate MMC. The other metrics are correlation, standard deviation, Sharpe, and consistency across eras. Consistency is the fraction of eras where correlation > 0.01.
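For anyone who wants to reproduce the summary metrics listed above, here is a minimal sketch. It assumes you already have one score per era; the function name, the use of the sample standard deviation, and Sharpe defined as mean divided by standard deviation are my assumptions, not something confirmed in the post.

```python
from statistics import mean, stdev


def summarize_era_scores(era_scores, threshold=0.01):
    """Summarize a list of per-era scores (hypothetical helper).

    Assumes Sharpe = mean / sample standard deviation across eras, and
    consistency = fraction of eras with a score above `threshold`.
    """
    avg = mean(era_scores)
    sd = stdev(era_scores)  # sample standard deviation (n - 1 denominator)
    sharpe = avg / sd if sd > 0 else float("nan")
    consistency = sum(score > threshold for score in era_scores) / len(era_scores)
    return {"mean": avg, "std": sd, "sharpe": sharpe, "consistency": consistency}


# Made-up per-era correlations, for illustration only.
example_scores = [0.03, -0.01, 0.02, 0.005, 0.04]
summary = summarize_era_scores(example_scores)
print(summary)
```

The 0.01 threshold is the one quoted in the post; everything else can be swapped for whatever definition you prefer.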

Here are the targets with the best MMC:


Even though rowan 20 has a slightly higher MMC, teager 20 and claudia 20 have better Sharpe and consistency values. The targets that had the best correlation (and consistency) are…


I assume cyrus 20 has the best correlation because it was also the target used to compute the correlation metric.
This is just a starting point to see which targets may perform better.

Here are some charts and the data for all the tests:


Was this using all the features, or were you using the “medium” feature set as in the example models?

All features were used.

On a similar note, I ran a comparative test between CORR and MMC metrics on my main model today using the main target. My key takeaways were:

  1. MMC and CORR are positively correlated, so training on CORR still makes sense
  2. MMC Sharpe is about half of CORR Sharpe but still attractive
  3. Average MMC is about 10x smaller than average CORR (aligned with your numbers)
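The three takeaways above can be checked with something like the sketch below. It assumes you have per-era CORR and per-era MMC for the same eras in the same order; the function names and the numbers are made up for illustration and are not the poster's data or code.

```python
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (no zero-variance guard)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def compare_corr_and_mmc(corr_by_era, mmc_by_era):
    """Compare per-era CORR and MMC series (hypothetical helper).

    Assumes Sharpe = mean / sample standard deviation across eras.
    """
    corr_sharpe = mean(corr_by_era) / stdev(corr_by_era)
    mmc_sharpe = mean(mmc_by_era) / stdev(mmc_by_era)
    return {
        "corr_vs_mmc_correlation": pearson(corr_by_era, mmc_by_era),
        "mmc_sharpe_over_corr_sharpe": mmc_sharpe / corr_sharpe,
        "mean_mmc_over_mean_corr": mean(mmc_by_era) / mean(corr_by_era),
    }


# Made-up per-era values, for illustration only.
corr_example = [0.030, 0.010, -0.005, 0.025, 0.040]
mmc_example = [0.004, -0.001, 0.000, 0.003, 0.002]
print(compare_corr_and_mmc(corr_example, mmc_example))
```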

However, I believe the expected return-on-risk must be kept high for innovative models as compensation for the volatility in NMR. So hopefully, these two things will happen after the transition:

  • Higher payout factor thanks to a reduction in high-stake models that underperform the benchmarks
  • Large MMC multipliers on offer

All we have to do now is wait and hope for the best!

I just did the same. My CORR and MMC results are pretty close:


Which are the validation eras used?
Have you computed the metrics for several seeds to see their estimation error?
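To make the seed question concrete, here is a rough sketch of how the spread could be measured. It assumes the same metric (say, mean MMC) has been computed once per training seed; the helper name and the values are hypothetical.

```python
from statistics import mean, stdev


def seed_spread(metric_by_seed):
    """Mean, sample std, and standard error of a metric across seeds.

    `metric_by_seed` maps a seed to the metric obtained with that seed
    (hypothetical input format).
    """
    values = list(metric_by_seed.values())
    sd = stdev(values)  # needs at least two seeds
    return {"mean": mean(values), "std": sd, "stderr": sd / len(values) ** 0.5}


# Made-up metric values for three seeds, for illustration only.
print(seed_spread({1: 0.010, 2: 0.014, 3: 0.012}))
```

If the gap between two targets is smaller than the spread across seeds, the ranking between them is probably not reliable.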

For MMC I used filter_sort_index(predictions, meta_model) and for CORR I used filter_sort_index(predictions, validation) so all eras with no NaNs are used.
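As an illustration of the alignment step described above, here is a plain-Python stand-in. It is not the library's implementation of filter_sort_index, which I have not reproduced here; it only shows the idea of keeping rows that appear in both inputs with non-NaN values and putting them in the same order. The helper name and the ids are made up.

```python
import math


def align_on_common_ids(a, b):
    """Keep ids present in both mappings with usable values, sorted by id.

    Hypothetical stand-in for an index filter-and-sort step: `a` and `b`
    map row ids to values, and None or NaN values are dropped.
    """
    def is_valid(value):
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    common = sorted(
        key for key in a.keys() & b.keys() if is_valid(a[key]) and is_valid(b[key])
    )
    return [(key, a[key]) for key in common], [(key, b[key]) for key in common]


# Made-up ids and values, for illustration only.
predictions = {"id3": 0.7, "id1": 0.2, "id2": 0.5}
meta_model = {"id1": 0.3, "id2": float("nan"), "id4": 0.9, "id3": 0.6}
aligned_predictions, aligned_meta_model = align_on_common_ids(predictions, meta_model)
print(aligned_predictions)
print(aligned_meta_model)
```

Because the meta model and the validation targets cover different rows, aligning against each one can leave a different set of eras, which is worth keeping in mind when comparing MMC and CORR numbers side by side.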

I did not use several seeds. My intent was to get a relative comparison between the targets as a sanity check. I also computed the same metrics for all the benchmark models shown here:
