Last round we saw the Totally_random model getting:
all metrics consistently negative, yet TC in the 97th-99th percentile.
This was my model last round:
all metrics consistently in the 95th-99th percentile, yet poor TC.
Are Numerai people comfortable seeing a random submission land in the 97th-99th percentile for TC?
I will put in my two cents…
TC is computed using a gradient through a layer of the optimization framework.
I don't know more details, but I think the problem is that TC is overfitted to the optimization process.
Let me use a more familiar example.
All of you know that the built-in feature importance of lightgbm or xgboost is biased towards variables that overfit during training and/or have high cardinality.
This is why we use permutation-based feature importance, which is computed on a fresh dataset instead of the training dataset.
If we add a random feature to a dataset and fit a model, the built-in feature importance of that feature will be positive, and probably large if the dataset has a low level of signal and a high level of noise (like Numerai's datasets).
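A minimal sketch of this bias, using scikit-learn's random forest as a stand-in for lightgbm/xgboost (the bias is the same family of problem; none of this is Numerai's actual code): a pure-noise feature still earns positive built-in importance, while permutation importance measured on held-out data correctly drives it towards zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)
noise_features = rng.normal(size=(n, 4))    # pure noise, incl. our "random feature"
X = np.column_stack([signal, noise_features])
y = signal + rng.normal(scale=2.0, size=n)  # weak signal, lots of noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

builtin = model.feature_importances_        # computed from training splits
perm = permutation_importance(model, X_te, y_te, n_repeats=10,
                              random_state=0).importances_mean

print("built-in importance of a noise feature:", builtin[1])   # positive
print("permutation importance of same feature:", perm[1])      # near zero
```

The built-in score rewards any split the tree made on the noise column during training; permutation importance asks whether shuffling that column actually hurts performance on fresh rows, which for a random feature it does not.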
Returning to TC: I think that by computing it on the same data used in training, they are overfitting it, so random predictions have a high probability of getting high TC.
Just as we need a fresh dataset for feature importance, TC should be computed the same way: don't use the same data both to adjust the metamodel and to compute TC.
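A toy illustration of why this matters (this reflects my assumption about the mechanism, not Numerai's actual TC code): fit "metamodel" weights over many pure-noise prediction vectors, then score them on the same data used for fitting versus on fresh data. In-sample, the random predictions look useful; on fresh data, their contribution vanishes.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_fresh, n_models = 250, 250, 100

P_train = rng.normal(size=(n_train, n_models))  # random "predictions"
P_fresh = rng.normal(size=(n_fresh, n_models))
y_train = rng.normal(size=n_train)              # the target is noise too
y_fresh = rng.normal(size=n_fresh)

# "Optimize" the metamodel: least-squares weights fit on the training data.
w, *_ = np.linalg.lstsq(P_train, y_train, rcond=None)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

in_sample = corr(P_train @ w, y_train)   # inflated: weights overfit this data
out_sample = corr(P_fresh @ w, y_fresh)  # near zero: no real skill

print(f"in-sample corr:  {in_sample:.3f}")
print(f"fresh-data corr: {out_sample:.3f}")
```

With 100 free weights fitted on 250 rows, the in-sample correlation is substantial even though every input is noise; the same weights scored on fresh data show nothing. Computing TC on data the optimizer never saw would remove this inflation, just as permutation importance does for feature importance.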