Validation metrics: example script vs website diagnostics

I have been playing around with validation metrics recently, using validation_metrics from the example scripts (https://github.com/numerai/example-scripts/blob/master/utils.py), and I’ve noticed that the values I get are very different from the model diagnostics available on the website. For example, validation sharpe is 0.83 on the website, but calculated via validation_metrics it’s 0.58.
Does anyone know how the metrics on the website are calculated? How can I replicate the performance and risk metrics shown on the tournament website locally?
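
For context, my understanding of how the example script arrives at these numbers is roughly the sketch below (the era and target column names, the rank transform, and the standard-deviation convention are my assumptions and may differ slightly from utils.py):

```python
import numpy as np
import pandas as pd

def per_era_corr(df: pd.DataFrame, pred_col: str, target_col: str = "target") -> pd.Series:
    # Correlation between the target and rank-transformed predictions, per era.
    return df.groupby("era").apply(
        lambda era: np.corrcoef(era[target_col], era[pred_col].rank(pct=True))[0, 1]
    )

def validation_sharpe(df: pd.DataFrame, pred_col: str) -> float:
    # Sharpe of the per-era correlations: mean divided by standard deviation.
    corrs = per_era_corr(df, pred_col)
    return corrs.mean() / corrs.std()
```

validation_sharpe(validation_data, pred_col) with my submitted prediction column is the number I’m comparing against the website’s validation sharpe.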

Hey bigcube. Can you send me a specific diagnostics ID you are having issues with? I just tried to reproduce my validation sharpe using the code from example-scripts and it matches exactly what I see on the website.

I have compared the validation metrics from the example script against the numbers shown on the website.
To reproduce, go to https://github.com/numerai/example-scripts and run example_model.py. The results are as follows (the only change I made was to calculate all of the metrics).

The prediction column I submitted for diagnostics is preds_model_target_neutral_riskiest_50, and as you can see, the locally calculated sharpe is 0.976964,
but when I uploaded the validation file to the website I got something like this:

The sharpe there is 0.9278, which is clearly different from the locally calculated one, and the same goes for the other metrics. This is quite confusing, as I’m not sure what the reason for the discrepancy is.

Website diagnostics only calculates metrics on validation eras 857 to 961.

If you’re calculating validation metrics locally on the entire validation dataset, the numbers are going to differ. They should nearly match the website if you only calculate them on eras 857 to 961.
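
Something like this should get you close (a rough sketch; how eras are stored and the target column name depend on the dataset version you downloaded, so treat those as assumptions):

```python
import pandas as pd

def diagnostics_sharpe(validation_data: pd.DataFrame,
                       pred_col: str = "preds_model_target_neutral_riskiest_50",
                       target_col: str = "target") -> float:
    # Assumption: eras may be stored as zero-padded strings ("0857") or as
    # integers, so cast before comparing.
    era_as_int = validation_data["era"].astype(int)
    diagnostics_slice = validation_data[era_as_int.between(857, 961)]

    # Per-era correlation of rank-transformed predictions vs. the target,
    # then sharpe = mean / std over those eras only.
    per_era_corr = diagnostics_slice.groupby("era").apply(
        lambda era: era[target_col].corr(era[pred_col].rank(pct=True))
    )
    return per_era_corr.mean() / per_era_corr.std()
```

With only those eras included, the locally computed sharpe should land much closer to what the diagnostics page shows.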


Indeed, that was the missing link: restricting to the era range from the plot. Thank you!