I have been playing around with validation metrics recently, using validation_metrics from the example script https://github.com/numerai/example-scripts/blob/master/utils.py, and I've noticed that the values I get are very different from the model diagnostics available on the website. For example, validation sharpe is 0.83 on the website, but calculated via validation_metrics it's 0.58.
Does anyone know how the metrics on the website are calculated? How can I replicate the performance and risk metrics from the tournament website locally?
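For context, my understanding is that validation sharpe is just the mean of the per-era correlations divided by their standard deviation, so locally I'm effectively computing something like the sketch below. This is not the exact utils.py code, and the column names ("era", "target", "prediction") and the rank-then-correlate convention are assumptions on my part:

```python
# Rough sketch, not the exact utils.py code: per-era correlation between
# predictions and the target, then sharpe = mean / std of those correlations.
# Column names ("era", "target", "prediction") are assumptions.
import numpy as np
import pandas as pd

def per_era_corr(df: pd.DataFrame,
                 pred_col: str = "prediction",
                 target_col: str = "target",
                 era_col: str = "era") -> pd.Series:
    """Correlation of rank-transformed predictions with the target, per era."""
    return df.groupby(era_col).apply(
        lambda era: np.corrcoef(
            era[pred_col].rank(pct=True, method="first"),
            era[target_col]
        )[0, 1]
    )

def validation_sharpe(df: pd.DataFrame) -> float:
    """Mean of the per-era correlations divided by their standard deviation."""
    corrs = per_era_corr(df)
    return corrs.mean() / corrs.std()
```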
Hey bigcube. Can you send me a specific diagnostics ID you are having issues with? I just tried to reproduce my validation sharpe using the code from example-scripts and it matches exactly what I see on the website.
I have tested the validation metrics from the example script against the numbers shown on the website.

So to reproduce, go to https://github.com/numerai/example-scripts and run example_model.py. The results are as follows (the only change I made was to calculate all metrics).

The model for which the validation data has been submitted is preds_model_target_neutral_riskiest_50, and as you can see, the locally calculated sharpe is 0.976964, but when I uploaded the validation file to the website I got something like this:

The sharpe is 0.9278, which is clearly different from the locally calculated one, and the same goes for the other metrics. This is quite confusing, as I'm not sure what the reason is.
Website diagnostics only calculates metrics on validation eras 857 to 961.
If you’re calculating validation metrics locally, it’s going to be different if you do it on the entire validation dataset. It should nearly match if you only calculate them on eras 857 to 961.
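For anyone trying to reproduce this, a minimal sketch of restricting the local data to those eras before computing the metrics would look something like the following. The era-label format and column name are assumptions, so adjust to however your validation data is loaded:

```python
# Sketch: keep only the validation eras that website diagnostics scores
# (857 to 961) before computing metrics locally. The era label format
# ("era857" vs. "0857") and the "era" column name are assumptions.
import pandas as pd

def restrict_to_diagnostics_eras(validation_df: pd.DataFrame,
                                 era_col: str = "era",
                                 start: int = 857,
                                 end: int = 961) -> pd.DataFrame:
    """Filter the validation data down to eras 857 through 961."""
    era_numbers = (
        validation_df[era_col].astype(str)
        .str.extract(r"(\d+)")[0]
        .astype(int)
    )
    return validation_df[era_numbers.between(start, end)]

# Hypothetical usage: compute metrics only on the restricted eras, e.g.
# validation_metrics(restrict_to_diagnostics_eras(validation_df), ["prediction"], ...)
```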