Model Diagnostics: Feature Exposure

Might not be a bad idea to have that officially replace the current code in the analysis and tips notebook.


Have both. The former is more general and way faster. And you can do quick total neutralization as a point of comparison.


Here’s a command line feature neutralization script that I posted on RocketChat. Since it’s a standalone script, it should work regardless of whether you’re using Python to build your models or something else. It takes the tournament data and predictions files as the inputs and outputs a neutralized csv file.

Example usage:

# Fully neutralize predictions in example_predictions_target_kazutsugi.csv.xz wrt features in numerai_tournament_data.csv.xz
python numerai_tournament_data.csv.xz example_predictions_target_kazutsugi.csv.xz

Another example:

# Neutralize the top 10 highest exposed features by 50%
python -t 10 -p 0.5 numerai_tournament_data.csv.xz example_predictions_target_kazutsugi.csv.xz

Is there any way to keep my Colab from crashing while doing this?

You can restart your runtime and then load the saved predictions as float32. This will work within 10GB of Colab memory.
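A minimal sketch of that suggestion, assuming the predictions were saved to a CSV with an `id` column and that pandas is installed. The helper name `load_predictions_float32` and the column name are illustrative assumptions, not part of the script above:

```python
import pandas as pd

def load_predictions_float32(path, id_col="id"):
    """Load a predictions CSV with every non-id column stored as float32."""
    # Read only the header row first to discover the column names.
    columns = pd.read_csv(path, nrows=0).columns
    # Ask pandas to parse each non-id column as float32 rather than float64.
    dtypes = {col: "float32" for col in columns if col != id_col}
    # Compression (e.g. .xz) is inferred by pandas from the file extension.
    return pd.read_csv(path, index_col=id_col, dtype=dtypes)
```

float32 uses half the memory of the default float64 for each numeric column, which is where the saving comes from.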


Good information thanks for sharing

It was pointed out to me that the R neutralization code I posted earlier in this thread (Model Diagnostics: Feature Exposure) doesn’t end up with values in [0,1]. That’s true: I left off that step at the end. So that’s normal, and you will need to do a minmax-type rescaling to get the values into the proper range for submission. (The actual values you end up with in that range aren’t important as long as they remain in the same order.)


Here is my minmax scaler function for those interested:

minmax <- function(x){(x-min(x))/(max(x)-min(x))}
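
For anyone not working in R, here is a rough Python equivalent of the same rescaling, using only the standard library. The sample values are made up for illustration; the point is that the output lands in [0,1] and keeps the original ordering:

```python
def minmax(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector has no spread to rescale.
        raise ValueError("all values are identical; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]

# Made-up neutralized scores that fall outside [0, 1].
neutralized = [-1.7, 0.3, 2.4, -0.2]
scaled = minmax(neutralized)
```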

I am trying to fully understand the approach. I understand that we don’t want a bias term, but transforming the data to a standard normal is a bit unfamiliar to me. Why do we need this?

With this line of code you first transform the input data (in our application the features) to a uniform distribution. Then you apply the standard normal quantile function and get realizations from a standard normal. As you say, this is not always needed. What you want to have is just zero expectation (thus no bias), right? With the -0.5 you avoid the borders of the [0,1] interval, correct?

In normalize_vector you basically do the same.

With exposures_m %*% (MASS::ginv(exposures_m) you calculate the \beta of the linear model: scores = \beta * features.

Then finally you calculate score_{neutral} = scores - proportion * \beta* features and rescale score_{neutral}.

Is this understanding correct?


First of all, let me just say for the R code in particular that I was just translating the python code given by the team, so at first I was exactly replicating in R what they did in python so I could compare results of each version side-by-side to make sure I got it right. (This should have been trivial for a function of a few lines, but I don’t actually code in python so I had to do it one detail at a time. When I did it I didn’t quite understand the function myself because of mathematical deficiencies of my own – I didn’t even understand that the pseudo-inverse calculation was making an OLS model.) Anyway, I was just trying to get an exact translation at first, but then in the end I didn’t exactly replicate it as I noted – my version applies normalization on the “exposures” (features) as well as the scores (predictions) whereas theirs doesn’t, and I left out the min-max scaling at the end to get it back into the [0,1] range. (I do that part later in my own workflow.) So the reason I used qnorm and subtracted 0.5 from the ranks (to avoid 0 & 1 as you noted) is simply because that matches what they did in python and I’m not sure that is an important detail for this. (They use that same type of rank normalization in the scoring function so were probably just borrowing their own code.) If we just ranked and rescaled to [0,1] I bet results would be pretty much the same (but not identical). I probably tried that, can’t remember.
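
The rank normalization discussed here (rank, subtract 0.5, divide by the length, then apply the normal quantile function) can be sketched in plain Python as follows. This is an illustration of the idea rather than the team's code; in particular, breaking ties by position is an assumption, and the function name is made up:

```python
from statistics import NormalDist

def rank_gaussianize(values):
    """Map values to standard-normal quantiles via their ranks."""
    n = len(values)
    # Ordinal ranks 1..n; ties are broken by position (an assumption here).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # (rank - 0.5) / n lies strictly inside (0, 1), so the quantile
    # function is never evaluated at 0 or 1, where it would be infinite.
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Because the uniform values are symmetric around 0.5, the resulting scores are centered on zero, which is why no separate bias term is needed afterwards.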

Also with the lack of the bias term – if you add one I don’t think it hurts, but results will be basically the same. (I definitely tested that.) And I don’t see why it is necessary to divide the result by the standard deviation (since that doesn’t change any rankings), but again that’s in the python version so there it is.
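
To make the steps concrete, here is a NumPy sketch of the neutralization as described in this exchange: fit an OLS model without an intercept via the pseudo-inverse, subtract a proportion of the fit, and divide by the standard deviation. It is a sketch under assumptions, not the team's function verbatim: it skips the rank normalization step, and it uses NumPy's default population standard deviation where R's `sd` uses the sample version (this only changes the scale, not the ranking).

```python
import numpy as np

def neutralize(scores, exposures, proportion=1.0):
    """Remove a proportion of the linear feature component from scores."""
    scores = np.asarray(scores, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    # OLS coefficients without an intercept: beta = pinv(X) @ y.
    beta = np.linalg.pinv(exposures) @ scores
    # Subtract the chosen share of the linear fit X @ beta.
    residual = scores - proportion * (exposures @ beta)
    # Rescale to unit standard deviation; rankings are unaffected.
    return residual / residual.std()
```

With `proportion=1.0` the result is orthogonal to every feature column; smaller values leave part of the linear component in place.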


Thank you for your transparent comments and the efforts with the code!!


That’s the matrix form of OLS without an intercept. How do I know that? Let’s just say it is one of the advantages of not napping during econometrics classes :smiley:

I think I have one for ridge:

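ridge_neutralize <- function(scores_v, exposures_m, proportion = 1.0, ridge = 1.0) {
  # Ridge coefficients: (X'X + ridge * n * I)^-1 X'y, with n = length(scores_v)
  penalty <- ridge * length(scores_v) * diag(ncol(exposures_m))
  beta <- MASS::ginv(t(exposures_m) %*% exposures_m + penalty) %*% (t(exposures_m) %*% scores_v)
  # Subtract the chosen proportion of the ridge fit from the scores
  scores_v <- scores_v - proportion * (exposures_m %*% beta)
  # Rescale to unit standard deviation (rankings are unchanged)
  return(scores_v / sd(scores_v))
}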

Guys, I’m having issues with RAM. How can I run the code?

@lollocodes It depends on your system and language of choice. I use R and I’ve found that you need a decent amount of memory to perform this analysis. My machine has 16GB RAM. I would imagine this is similar when using Python.

With using R I’ve also found using works rather well. allows to run in parallel and multiple clusters.


Hello! I am new to the competition…

Can someone explain to me the difference between applying feature neutralization to the features on the target, to get a set of features that contain as much original information as possible but decorrelate with the target VS neutralizing predictions by features?


The difference is whether you are trying to get the linear element out of your training/prediction result (neutralizing your predictions) or trying to get the linear element out of your training data, in the hope that your model then does not focus on that linear element at all (neutralizing the features to the target before training).

Hey @mdo, I’m starting to think about using this, combined with the idea of caching with joblib (as per the tensorflow example by @jrb). Before I put pen to paper, could you advise on which metric I’d need to store alongside the era? I think the tensorflow example stores era and weights.


Can anyone explain to me the reasoning for pred[:,None]-model(feats) in the line below:

If I understand correctly, this is the error between our original model’s predictions and the predictions of the feature neutralization model. Why calculate the feature exposure of the error, rather than the feature exposure of the predictions of the feature neutralization model? Does this just ensure we don’t drift too far from the original predictions?

Thanks in advance.

Neutralization is finding a linear model model(feats) to subtract off from your predictions predictions[:,None]. That line is measuring how much exposure remains after that subtraction. The linear model is initialized at 0 and then learned until the exposures measured by that line fall below threshold. Make sense?
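
A rough NumPy sketch of that loop, to make the idea concrete. It is not the code from the original post: the linear model here is trained with plain gradient descent on squared error as a stand-in loss, and the function names, learning rate, threshold and toy data are all assumptions for illustration.

```python
import numpy as np

def exposures(residual, feats):
    """Pearson correlation of the residual with each feature column."""
    r = residual - residual.mean()
    f = feats - feats.mean(axis=0)
    return (f.T @ r) / (np.linalg.norm(f, axis=0) * np.linalg.norm(r))

def learn_neutralizer(pred, feats, max_exposure=0.05, lr=0.1, max_steps=10_000):
    """Learn a linear model to subtract until exposures fall below a threshold."""
    # Center and standardize so the residual stays centered throughout.
    pred = pred - pred.mean()
    feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    weights = np.zeros(feats.shape[1])      # the linear model starts at 0
    for _ in range(max_steps):
        residual = pred - feats @ weights   # predictions minus the linear model
        if np.abs(exposures(residual, feats)).max() < max_exposure:
            break
        # Gradient step on mean squared error (stand-in loss; lr may need
        # to be smaller when there are many correlated features).
        weights += lr * (feats.T @ residual) / len(pred)
    return pred - feats @ weights

# Toy data: predictions with a deliberate linear dependence on the features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 5))
pred = feats @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + rng.standard_normal(1000)
neutral = learn_neutralizer(pred, feats)
```

The quantity checked inside the loop is the exposure of the residual, that is, of the predictions after the linear model has been subtracted, which is why the exposure is measured on the difference rather than on the linear model's output.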


Ahhhh, that makes sense, thank you. I missed that you mention in the post that the model learns the amount to subtract from the original predictions to neutralize them; I thought it was learning a transformation that was feature neutral. Now it makes perfect sense. Thanks for the help.