@unsentient make sure to test your model out of sample when you use feature neutralization, because it doesn’t necessarily improve performance. In my experience, feature neutralization helps the correlation of simple models, such as the Numerai example script, but hurts more advanced models. You may not be aiming to improve the correlation metrics, but still check whether feature neutralization helps your model on the metrics you care about. Do not blindly trust feature neutralization.
Thanks for this post, it was an interesting read!
At the end, you mentioned that you tried neutralizing a linear model, which you fitted with OLS.
I don’t really understand this. I’ll explain:
What a linear model does, when you apply OLS, is project the target of your training samples linearly onto the vector space spanned by your feature vectors, i.e. it finds the linear combination of the feature vectors that best approximates the target.
What the function neutralize does (as far as I can tell) is fit a linear model to the column ‘target’ and then subtract its prediction from this column (and then normalize).
However, if the column ‘target’ was already obtained from a linear model fit on the same feature vectors, then ‘target’ is already a linear combination of the feature vectors. Thus, subtracting the predictions of the linear model should theoretically result in a zero vector.
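To make this concrete, here is a minimal sketch of the argument on synthetic data. It is not the actual neutralize function; the random data, the variable names, and the choice to fit without an intercept are all my own assumptions. It fits OLS, then regresses the OLS predictions on the same features and subtracts the fit.

```python
import numpy as np

# Synthetic stand-ins for the feature matrix and training target (assumed data).
rng = np.random.default_rng(0)
n_rows, n_features = 500, 10
X = rng.normal(size=(n_rows, n_features))
y = rng.normal(size=n_rows)

# Step 1: OLS fit, i.e. project y onto the column space of X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
preds = X @ beta

# Step 2: "neutralize" the predictions against the same features:
# regress preds on X and subtract the fitted part.
gamma, *_ = np.linalg.lstsq(X, preds, rcond=None)
residual = preds - X @ gamma

# preds already lies in the span of X, so only floating-point noise remains.
max_abs_residual = np.abs(residual).max()
print(max_abs_residual)
```

The final normalization step would then divide this numerical noise by its own standard deviation, so the output would carry no signal.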
…or did I miss something? I’d appreciate any feedback.
Neutralization is applied to the features->predictions relationship, not to the original training target we are predicting. (So we can still neutralize live predictions without the target.) We fit a linear model that uses the features to predict the predictions our trained model is spitting out, and then subtract that fit, i.e. we remove the linear relationship between the features and our predictions. That relationship is what we call “feature exposure” around here: the correlations between the features and our trained predictions. So a fully neutralized set of predictions has had removed the portion of the original predictions that a straight linear model on the features can generate (and if you do it 100%, the result is predictions with zero correlation to any of the features).
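A rough sketch of that idea, under my own assumptions: the function below is an illustration and not the example script's code, the `proportion` argument and the added intercept column are my choices, and the nonlinear "model output" is made up. Note that it only takes predictions and features, never the target.

```python
import numpy as np

def neutralize(preds, features, proportion=1.0):
    """Remove `proportion` of the part of preds that is linear in features."""
    # Add an intercept column (an assumption of this sketch) so the residual
    # is also mean-zero, which makes its correlation with each feature zero.
    F = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(F, preds, rcond=None)
    neutral = preds - proportion * (F @ coef)
    return neutral / neutral.std()

rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 5))

# Made-up output of a nonlinear model: part linear, part nonlinear in the features.
preds = (np.tanh(features[:, 0])
         + features[:, 1] * features[:, 2]
         + 0.5 * features[:, 3])

neutral = neutralize(preds, features, proportion=1.0)

# Feature exposure after full neutralization: correlation with each feature.
corrs = np.array([np.corrcoef(features[:, j], neutral)[0, 1]
                  for j in range(features.shape[1])])
max_abs_corr = np.abs(corrs).max()
print(max_abs_corr)
```

With `proportion` below 1.0 only part of the linear component is removed, so some feature exposure would remain.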
So it still doesn’t make sense to neutralize a purely linear model that uses the original features for essentially the same reason – you’d just be zeroing it out. But the training target (which of course is not available for live eras) is not needed for that step.
