Coloring validation metrics

jrb · December 6, 2020, 1:38pm

I’ve been computing the validation metrics locally and keeping my local validation code in sync with all the recent changes (per-era feature-neutral mean and feature exposure, for instance). One thing that I hadn’t been able to do until last week was color the metrics the way they’re displayed on the website. I asked @master_key (MikeP on Rocket Chat) for the intervals and percentiles they use for coloring the metrics on the website, and he shared the numbers with me.

Here’s some quick and dirty Python code that I wrote based on the numbers that @master_key shared with me. I suspect there are many others who compute validation metrics locally and might benefit from this. BTW, if you aren’t already computing validation metrics locally, I’d recommend doing it. All the code needed to compute the validation metrics can be found in the example model.

import numpy as np
from scipy import stats

from colorama import Fore, Style

VALIDATION_METRIC_INTERVALS = {
    "mean": (0.013, 0.028),
    "sharpe": (0.53, 1.24),
    "std": (0.0303, 0.0168),
    "max_feature_exposure": (0.4, 0.0661),
    "mmc_mean": (-0.008, 0.008),
    "corr_plus_mmc_sharpe": (0.41, 1.34),
    "max_drawdown": (-0.115, -0.025),
    "feature_neutral_mean": (0.006, 0.022)
}


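def color_metric(metric_value, metric_name):
    """Format a metric to 4 decimals, coloured by where it falls in its interval."""
    low, high = VALIDATION_METRIC_INTERVALS[metric_name]
    # Percentile of the value on a 100-point grid spanning the interval.
    pct = stats.percentileofscore(np.linspace(low, high, 100),
                                  metric_value)
    # Reversed interval (e.g. std): smaller is better, so flip the percentile.
    if high <= low:
        pct = 100 - pct
    # Fore.BLACK switches back to black text afterwards, which assumes a
    # terminal whose default text colour is black.
    if pct > 95:  # Excellent
        return f"{Style.BRIGHT}{Fore.GREEN}{metric_value:.4f}" \
               f"{Fore.BLACK}{Style.RESET_ALL}"
    elif pct > 75:  # Good
        return f"{Fore.GREEN}{metric_value:.4f}{Fore.BLACK}"
    elif pct > 35:  # Fair
        return f"{metric_value:.4f}"
    else:  # Bad
        return f"{Fore.RED}{metric_value:.4f}{Fore.BLACK}"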

I use the colorama module for coloring text (and it works with Jupyter notebooks, as well as the terminal). It’s quite straightforward to use something else in its place, if needed.
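
To make the thresholds a bit more concrete, here is a minimal dependency-free sketch that returns the band name instead of a coloured string. `band_for` is a hypothetical helper, not part of the script above, and it uses the linear position of the value inside the interval as a stand-in for `stats.percentileofscore` on a 100-point grid, so results can differ by about a point right at a band boundary.

```python
def band_for(value, low, high):
    """Return the band name for a metric, given its (low, high) interval.

    Approximation: linear position inside the interval instead of
    scipy's percentileofscore on a 100-point grid.
    """
    # Position of value between low and high on a 0-100 scale. Reversed
    # intervals (low > high, e.g. std) need no special case, because the
    # numerator and denominator change sign together.
    position = 100.0 * (value - low) / (high - low)
    position = max(0.0, min(100.0, position))  # clamp values outside the interval
    if position > 95:
        return "excellent"
    elif position > 75:
        return "good"
    elif position > 35:
        return "fair"
    return "bad"


print(band_for(0.020, 0.013, 0.028))    # "mean" interval -> fair
print(band_for(0.020, 0.0303, 0.0168))  # "std" interval (reversed) -> good
```

The same value lands in different bands depending on the interval: 0.020 is only fair as a mean, but good as a standard deviation, because that interval runs from large to small.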

mindyoself · May 10, 2021, 5:07pm

Hello @jrb,
Thanks for that, it’s awesome; I was searching for this. Quick question, though: these intervals surely change over time, don’t they? Or are they the gold standard for now?

jrb · May 11, 2021, 10:55am

I believe they’re still the same. Every once in a while I compare the colouring shown on the website with what gets rendered locally, and I haven’t noticed a difference yet. You’re right, though: they’ll almost certainly change in the future, perhaps when the target changes or when we have more features.

themicon · May 11, 2021, 12:01pm

Just a quick addition: If you use a terminal with a strange colour scheme, you might need to change this “Fore.BLACK” to something else. I needed to check my terminal colour value for text and update the script accordingly.
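
A possible way to avoid looking up the terminal's text colour at all is to end each coloured string with the ANSI "default foreground" code rather than a fixed colour. The sketch below uses raw escape sequences so it runs without extra packages; colorama exposes the same sequences as `Fore.RED`, `Fore.GREEN` and `Fore.RESET`. The `paint` helper is hypothetical and not part of the original script.

```python
# Raw ANSI escape codes (colorama: Fore.RED, Fore.GREEN, Fore.RESET).
RED = "\033[31m"
GREEN = "\033[32m"
DEFAULT_FG = "\033[39m"  # restores whatever text colour the terminal theme uses


def paint(text, colour):
    """Wrap text in a colour code, then restore the default foreground."""
    return f"{colour}{text}{DEFAULT_FG}"


print(paint("0.0123", RED), "and this text is back in the theme's normal colour")
```

In the original script this would amount to replacing each `Fore.BLACK` with `Fore.RESET`, assuming the terminal honours the default-foreground code.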

Awesome script BTW @jrb - Really useful

mindyoself · May 11, 2021, 3:35pm

Might Numerai want to withhold these threshold values, or are we allowed to have them? As far as I can see, you have to hard-code the lower and upper limits of every metric interval in the dictionary up there each time. Ideally those values would be locked in a validation metric class or in separate functions and be easily callable, so that we don’t have to keep hard-coding them every time. Is that feasible?
