Expanding on the insights about the TB200 effects on the neutralization step in this post, I think there may be a p / (1-p) type of vulnerability, which can be seen in this notebook:
What do you think are the potential impacts of this vulnerability?
It could range from not persisting and netting out over time (like a noise submission for MMC) to requiring a new scoring mechanism for Signals.
I didn’t read through all your code closely enough to explain why the 1-p correlation isn’t just a sign flip, but I did notice that you didn’t neutralize the target, so you aren’t following the same process as production scoring.
@_liamhz if such a vulnerability existed in production scoring and the sum of correlations was consistently positive, you could submit both signals and always get positive return. The sum of correlations for a portfolio and its 1-p should be zero. If it isn’t zero, then there’s a scoring bug, and if it doesn’t average zero, you can arbitrage it.
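To make the zero-sum claim concrete, here is a minimal sketch (not the production scoring code). It assumes the score is a plain correlation between ordinal ranks of the signal and the target, with no neutralization and no tied values. `rank_pct`, `rank_corr`, and the random data are hypothetical stand-ins.

```python
import numpy as np

def rank_pct(x):
    # Ordinal ranks scaled to (0, 1]; a stable sort keeps the original order for equal values.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks / len(x)

def rank_corr(pred, target):
    # Pearson correlation between the ranked prediction and the raw target.
    return np.corrcoef(rank_pct(pred), target)[0, 1]

rng = np.random.default_rng(0)
p = rng.uniform(size=5000)       # hypothetical signal with no ties
target = rng.normal(size=5000)   # hypothetical target

corr_p = rank_corr(p, target)
corr_flipped = rank_corr(1 - p, target)
corr_sum = corr_p + corr_flipped
print(corr_p, corr_flipped, corr_sum)
```

Without ties, the ranks of 1-p are an exact reversal of the ranks of p, so the two correlations cancel up to floating-point error.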
Can you share an example of where the additional neutralization (effectively the 3rd neutralization) to the target would take place? Thanks!
Just make a neutralized_target the same way you made neutralized_preds. I don’t recall an official example because they don’t give the features so we couldn’t run the code with the right data anyway. There have been a few unofficial attempts at reproducing, but I don’t have any links handy.
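A rough sketch of what that could look like, assuming neutralization means subtracting the least-squares projection onto the feature matrix. The `neutralize` helper and the random `features`, `preds`, and `target` arrays are made up for illustration, since the real features are not available.

```python
import numpy as np

def neutralize(series, features, proportion=1.0):
    # Remove the part of `series` that is linearly explained by `features`,
    # then rescale the residual to unit standard deviation.
    exposure = features @ (np.linalg.pinv(features) @ series)
    residual = series - proportion * exposure
    return residual / residual.std()

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))                  # hypothetical stand-in for the hidden features
preds = rng.normal(size=1000) + features[:, 0]         # hypothetical signal with feature exposure
target = rng.normal(size=1000) + 0.5 * features[:, 1]  # hypothetical target with feature exposure

neutralized_preds = neutralize(preds, features)
neutralized_target = neutralize(target, features)

# With proportion=1.0 both vectors end up orthogonal to every feature column.
max_pred_exposure = np.abs(features.T @ neutralized_preds).max()
max_target_exposure = np.abs(features.T @ neutralized_target).max()
print(max_pred_exposure, max_target_exposure)
```

The point is only that the same function is applied to both series before the correlation is taken.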
Out of curiosity, aren’t the targets the residuals from the features anyway? So this additional neutralization in effect is just making sure there is no linear presence in the targets?
The 1-p correlation isn’t a sign flip because of neutralization and tie breaks when computing the rank correlation. There isn’t a scoring bug, and we expect this to average out to zero.
Thanks @_liamhz, so just to be clear, we can stake this with impunity because there is no arbitrage opportunity?
Neutralization should make the 1-p a sign flip of the correlations, and only the tie breaking afterwards should keep the sum of the correlations from being zero.
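A tiny worked example of the tie-break effect, assuming ties are broken by order of appearance (ordinal ranking). That tie-break rule and the four-ticker data are assumptions for illustration, not a statement about the production code.

```python
import numpy as np

def ordinal_ranks(x):
    # Equal values are ranked in order of appearance (stable sort).
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

p = np.array([0.5, 0.5, 0.2, 0.8])        # the first two tickers are tied
target = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical target values

ranks_p = ordinal_ranks(p)                # [2, 3, 1, 4]
ranks_flipped = ordinal_ranks(1 - p)      # [2, 3, 4, 1], tied tickers keep the same order

corr_p = np.corrcoef(ranks_p, target)[0, 1]
corr_flipped = np.corrcoef(ranks_flipped, target)[0, 1]
corr_sum = corr_p + corr_flipped
print(corr_p, corr_flipped, corr_sum)
```

Here the two correlations come out at about 0.4 and -0.2, so the sum is about 0.2 rather than zero. The tied tickers get the same relative order in both rankings, which means the second ranking is not an exact reversal of the first.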
If you believe the tie breaking is not correlated with neutralized performance (the contrary would mean the tickers’ ordering is correlated with their returns?), the sum should be a random variable with expectation zero (like Liam said) and very low variance (because there are about 5K predictions).
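A quick simulation of that argument, under the assumptions that ties are broken by order of appearance and that the target is independent of that order. The rounding to two decimals, the 200 rounds, and the random data are arbitrary choices for illustration.

```python
import numpy as np

def ordinal_ranks(x):
    # Equal values are ranked in order of appearance (stable sort).
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def corr_sum(p, target):
    # Sum of the rank correlations of p and 1-p against the same target.
    corr_p = np.corrcoef(ordinal_ranks(p), target)[0, 1]
    corr_flipped = np.corrcoef(ordinal_ranks(1 - p), target)[0, 1]
    return corr_p + corr_flipped

rng = np.random.default_rng(0)
n_rounds, n_tickers = 200, 5000
sums = np.empty(n_rounds)
for i in range(n_rounds):
    p = np.round(rng.uniform(size=n_tickers), 2)  # rounding creates many ties
    target = rng.normal(size=n_tickers)           # independent of the tie order
    sums[i] = corr_sum(p, target)

print(sums.mean(), sums.std())
```

If the independence assumption holds, both the mean and the spread of `sums` should be small compared with typical per-round correlations. If tie order were related to the target, the mean could drift away from zero.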