What exactly is neutralization?

The final stage of the data pipeline is the data neutralization, however it’s a technique I’ve not come across and googling the term “data neutralization” leads to nothing I can find!?

Is this a term that Numerai has adopted or is it known as something else in general data science?

And in terms of what it’s actually doing, it seems to be softening up the correlations of the predictions against the features, but I don’t have a great understanding of it, can anyone offer a more thorough explenation of data neutralization please :slight_smile: ?


The idea is that you use a linear model (that is defined by the features) and you subtract that from your predictions, therefore removing that signal from your predictions or in other words: neutralizing them against the features.
On youtube there is excellent material from arbitrage and also in the forum there is an in depth post on this.

Thanks! I think I understand it in terms of the subtraction from a naive linear model :slight_smile:

In the analysis_and_tips notebook however (example-scripts/analysis_and_tips.ipynb at master · numerai/example-scripts · GitHub) the neutralization function defined there doesn’t seem to make a linear model of any sort but instead subtracts a proportion of some dot product of the pseudo-inverse of features (see code below), this is quite confusing.

def _neutralize(df, columns, by, proportion=1.0):
    scores = df[columns]
    exposures = df[by].values
    scores = scores - proportion * exposures.dot(numpy.linalg.pinv(exposures).dot(scores))
    return scores / scores.std(ddof=0)

Here is the reason that we want to feature neutralize: Feature Exposure Clipping Tool, and working code to deploy locally | Numerai FN Special Part 3 - YouTube

And the notebook discussed in that video can be found here: twitch/FE_Clipping_Script.ipynb at master · jonrtaylor/twitch · GitHub

1 Like

Someone correct me if I’m wrong but I like to think of it as removing any one feature’s influence on predictions such that the resulting predictions are evenly influenced by all dependent features

I would say that a linear model/the inverse of the matrix is “influenced” by all columns/features. The computation generally involves all columns to get the result for one column. But for the rest I mainly agree. In my understanding you take out the linear effect of the features and want to get something that depends more on the aggregate of all features.

1 Like

I was also confused by the way neutralization was done. I could see that what we are getting at is simply running a linear regression of predictions on features then residualizing that out from predictions. That was the idea in my mind, but as you mentioned, the code itself uses a pseudo-inverse and not the normal inverse of variance of predictors formula.

Well it turns out a pseudo-inverse is the solution to the least squares problem. I have a little explainer below. I assume an L-2 norm, of course if we change the norm then we change how we “neutralize”, which could be a interesting avenue: