Validation metrics: example script vs website diagnostics

I have been playing around with validation metrics recently, using validation_metrics from the example scripts (https://github.com/numerai/example-scripts/blob/master/utils.py), and I’ve noticed that the values I get are very different from the model diagnostics available on the website. For example, validation sharpe is 0.83 on the website, but calculated via validation_metrics it’s 0.58.
Does anyone know how the metrics on the website are calculated? How can I replicate the performance and risk metrics shown on the tournament website locally?
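
For context, my understanding of how the example script arrives at these numbers is roughly the sketch below (the era and target column names, the rank transform, and the standard-deviation convention are my assumptions and may differ slightly from utils.py):

```python
import numpy as np
import pandas as pd

def per_era_corr(df: pd.DataFrame, pred_col: str, target_col: str = "target") -> pd.Series:
    # Correlation between the target and rank-transformed predictions, per era.
    return df.groupby("era").apply(
        lambda era: np.corrcoef(era[target_col], era[pred_col].rank(pct=True))[0, 1]
    )

def validation_sharpe(df: pd.DataFrame, pred_col: str) -> float:
    # Sharpe of the per-era correlations: mean divided by standard deviation.
    corrs = per_era_corr(df, pred_col)
    return corrs.mean() / corrs.std()
```

validation_sharpe(validation_data, pred_col) with my submitted prediction column is the number I’m comparing against the website’s validation sharpe.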

Hey bigcube. Can you send me a specific diagnostics ID you are having issues with? I just tried to reproduce my validation sharpe using the code from example-scripts and it matches exactly what I see on the website.

I have compared the validation metrics from the example script against the numbers shown on the website.
To reproduce, go to https://github.com/numerai/example-scripts and run example_model.py. The results are as follows (the only change I made was to calculate all of the metrics).

The prediction column I submitted for diagnostics is preds_model_target_neutral_riskiest_50, and as you can see, the locally calculated sharpe is 0.976964,
but when I uploaded the validation file to the website I got something like this:

The sharpe there is 0.9278, which is clearly different from the locally calculated one, and the same goes for the other metrics. This is quite confusing, as I’m not sure what the reason for the discrepancy is.

Website diagnostics only calculates metrics on validation eras 857 to 961.

If you’re calculating validation metrics locally on the entire validation dataset, the numbers are going to differ. They should nearly match the website if you only calculate them on eras 857 to 961.
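
Something like this should get you close (a rough sketch; how eras are stored and the target column name depend on the dataset version you downloaded, so treat those as assumptions):

```python
import pandas as pd

def diagnostics_sharpe(validation_data: pd.DataFrame,
                       pred_col: str = "preds_model_target_neutral_riskiest_50",
                       target_col: str = "target") -> float:
    # Assumption: eras may be stored as zero-padded strings ("0857") or as
    # integers, so cast before comparing.
    era_as_int = validation_data["era"].astype(int)
    diagnostics_slice = validation_data[era_as_int.between(857, 961)]

    # Per-era correlation of rank-transformed predictions vs. the target,
    # then sharpe = mean / std over those eras only.
    per_era_corr = diagnostics_slice.groupby("era").apply(
        lambda era: era[target_col].corr(era[pred_col].rank(pct=True))
    )
    return per_era_corr.mean() / per_era_corr.std()
```

With only those eras included, the locally computed sharpe should land much closer to what the diagnostics page shows.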


Indeed, that was the missing link: restricting to the era range from the plot. Thank you!