Model Diagnostics: Feature Exposure

First of all, let me say for the R code in particular that I was just translating the Python code provided by the team. At first I replicated exactly in R what they did in Python so I could compare the results of each version side-by-side and make sure I had it right. (This should have been trivial for a function of a few lines, but I don't actually code in Python, so I had to work through it one detail at a time. At the time I didn't fully understand the function myself because of mathematical deficiencies of my own; I didn't even realize that the pseudo-inverse calculation was fitting an OLS model.)

So I was trying to get an exact translation at first, but in the end I didn't replicate it exactly, as I noted: my version applies normalization to the "exposures" (features) as well as to the scores (predictions), whereas theirs doesn't, and I left out the min-max scaling at the end that maps the result back into the [0, 1] range. (I do that part later in my own workflow.) The reason I used qnorm and subtracted 0.5 from the ranks (to avoid 0 and 1, as you noted) is simply that it matches what they did in Python, and I'm not sure that detail matters much here. (They use the same type of rank normalization in the scoring function, so they were probably just borrowing their own code.) If we just ranked and rescaled to [0, 1], I bet the results would be pretty much the same, though not identical. I probably tried that; I can't remember.

As for the lack of a bias term: if you add one I don't think it hurts, but the results will be basically the same. (I definitely tested that.) And I don't see why it is necessary to divide the result by its standard deviation, since that doesn't change any rankings, but again, that's in the Python version, so there it is. There's a quick sketch of the rank normalization comparison below.
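Here's a minimal sketch in Python (my own illustration, not the team's code) of the comparison I mean:

import numpy as np
from scipy.stats import rankdata, norm

x = np.array([0.52, 0.66, 0.71, 0.98, 0.33])

# what the team's code does: rank, shift by 0.5 to avoid 0 and 1, then gaussianize (qnorm / norm.ppf)
gauss_ranks = norm.ppf((rankdata(x, method="ordinal") - 0.5) / len(x))

# the simpler alternative: just rank and rescale into [0, 1]
plain_ranks = (rankdata(x, method="ordinal") - 1) / (len(x) - 1)

# both preserve the ordering, so rank-based results downstream end up very similar
print(np.argsort(gauss_ranks), np.argsort(plain_ranks))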

6 Likes

Thank you for your transparent comments and the efforts with the code!!

2 Likes

That's the matrix form of the OLS algorithm without an intercept. How do I know that? Let's just say that's one of the advantages of not taking a nap during econometrics class :smiley:
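For reference, here is the formula that pseudo-inverse line is computing, in my own notation (X is the exposure/feature matrix, s the scores, p the proportion):

$$\hat{\beta} = (X^\top X)^{-1} X^\top s = X^{+} s, \qquad s_{\text{neutral}} = s - p\, X \hat{\beta}$$

X^{+} is the Moore-Penrose pseudo-inverse, which coincides with (X^\top X)^{-1} X^\top when X has full column rank, so the pinv call amounts to fitting OLS without an intercept and subtracting the fitted values.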

I think I have one for ridge:

ridge_neutralize <- function(scores_v, exposures_m, proportion = 1.0, ridge = 1.0) {
  # ridge coefficients: (X'X + lambda * n * I)^-1 X'y, then subtract the fitted
  # linear component (scaled by `proportion`) from the scores
  beta <- MASS::ginv(t(exposures_m) %*% exposures_m +
                       ridge * length(scores_v) * diag(ncol(exposures_m))) %*%
          (t(exposures_m) %*% scores_v)
  scores_v <- scores_v - proportion * (exposures_m %*% beta)
  return(scores_v / sd(scores_v))
}
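For reference, the closed form this implements, in my notation (lambda is the ridge argument and n = length(scores_v), so ridge*length(scores_v) is the lambda-n term on the identity matrix):

$$\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda n I)^{-1} X^\top s, \qquad s_{\text{neutral}} = s - p\, X \hat{\beta}_{\text{ridge}}$$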
1 Like

Guys, I'm having issues with RAM. How can I run the code?

@lollocodes It depends on your system and language of choice. I use R, and I've found that you need a decent amount of memory to perform this analysis; my machine has 16GB of RAM. I would imagine this is similar when using Python.

With R I've also found that h2o.ai works rather well; it allows you to run in parallel and across multiple clusters.

1 Like

Hello! I am new to the competition…

Can someone explain to me the difference between applying feature neutralization to the features with respect to the target, to get a set of features that contain as much of the original information as possible but are decorrelated from the target, versus neutralizing the predictions by the features?

Thanks.

The difference is whether you are trying to get the linear element out of your training/prediction result (neutralizing your predictions), or whether you are trying to get the linear element out of your training data in the hope that your model then doesn't focus on that linear element at all (neutralizing the features to the target before training).
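A rough sketch in Python (my own illustration, using plain numpy arrays rather than the tournament DataFrames) of the two options:

import numpy as np

def neutralize_series(series, by, proportion=1.0):
    # remove the part of `series` that is linearly explained by the columns of `by`
    beta = np.linalg.pinv(by) @ series        # OLS coefficients via the pseudo-inverse
    return series - proportion * (by @ beta)  # residual = series minus its linear fit

rng = np.random.default_rng(0)
features = rng.random((100, 5))
target = rng.random(100)
predictions = rng.random(100)

# Option 1: neutralize the predictions against the features (after training).
neutral_preds = neutralize_series(predictions, features)

# Option 2: neutralize each feature against the target (before training), so a model
# trained on these columns cannot pick up that linear relationship with the target.
target_col = target.reshape(-1, 1)
neutral_feats = np.column_stack(
    [neutralize_series(features[:, i], target_col) for i in range(features.shape[1])]
)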

1 Like

Hey @mdo - I'm starting to think about using this, but combining it with the idea of caching with joblib (as per the tensorflow example by @jrb). Before I put pen to paper, I was wondering if you could advise on which metric I'd need to store alongside the era? I think the tensorflow example stores the era and the weights.

Cheers!

Can anyone explain to me the reasoning for pred[:,None]-model(feats) in the line below:

If I understand correctly, this is the error between our original model's predictions and the predictions of the feature-neutralization model. Why calculate the feature exposure of the error rather than the feature exposure of the predictions of the feature-neutralization model? Does this just ensure we don't drift too far from the original predictions?

Thanks in advance.

Neutralization is finding a linear model, model(feats), to subtract off from your predictions, predictions[:,None]. That line is measuring how much exposure remains after that subtraction. The linear model is initialized at 0 and then learned until the exposures measured by that line fall below the threshold. Make sense?
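If it helps, here is a minimal sketch of that loop (not the exact code from the post, just a plain numpy gradient loop illustrating the idea), assuming preds and feats are numpy arrays:

import numpy as np

def neutralize_with_learned_model(preds, feats, lr=0.1, max_exposure=0.10, n_iters=5000):
    preds = preds.reshape(-1, 1)
    weights = np.zeros((feats.shape[1], 1))           # linear model initialized at 0
    residual = preds.copy()
    for _ in range(n_iters):
        residual = preds - feats @ weights            # the pred[:, None] - model(feats) term
        # exposure = correlation between the residual and each feature
        exposures = np.array([np.corrcoef(residual.ravel(), feats[:, i])[0, 1]
                              for i in range(feats.shape[1])])
        if np.max(np.abs(exposures)) < max_exposure:  # stop once every exposure is small enough
            break
        # gradient step on the squared residual, nudging the linear model toward the fit
        weights -= lr * (-2.0 / len(preds)) * (feats.T @ residual)
    return residual.ravel()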

2 Likes

Ahhhh, that makes sense, thank you. I missed that you mention in the post that the model learns the amount to subtract from the original predictions to neutralize them; I thought it was learning a transformation that was feature neutral. Now I understand that, it makes perfect sense. Thanks for the help.

Does this neutralize function still work? I've played with it all day and it seems to spit out all manner of numbers: positive, negative, greater than 1. I would have expected it to return an array of transformed predictions between 0 and 1.

Here is a self-contained code block that I think shows this function doesn't work.

import numpy as np
import pandas as pd

def neutralize(df, target="predictions", by=None, proportion=1.0):
    if by is None:
        by = [x for x in df.columns if x.startswith('feature')]

    scores = df[target]
    exposures = df[by].values

    # constant column to make sure the series is completely neutral to exposures
    exposures = np.hstack((exposures, np.array([np.mean(scores)] * len(exposures)).reshape(-1, 1)))

    scores -= proportion * (exposures @ (np.linalg.pinv(exposures) @ scores.values))
    return scores / scores.std()


data = {'feature_1': [0.00, 0.25, 0.50, 0.75, 1.00],
        'feature_2': [0.25, 0.75, 0.50, 0.75, 0.75],
        'feature_3': [1.00, 0.75, 0.00, 0.75, 0.75],
        'feature_4': [0.25, 0.50, 0.25, 0.00, 0.50],
        'predictions':  [0.52, 0.66, 0.71, 0.98, 0.33]}
df = pd.DataFrame(data)

neutralize(df)

Output:

0   -1.953662
1   -0.781465
2   -1.172197
3   -2.344394
4    0.195366
Name: predictions, dtype: float64

An updated neutralize function is located in the example scripts: example-scripts/utils.py in the numerai/example-scripts repository on GitHub.

1 Like

I think that's correct – it is just subtracting a linear model, which is not bounded, so there is nothing preventing the result from ending up with negative values, etc. You have to rescale it again to get it back to [0, 1] if that's what you want.

2 Likes

It might be more principled to pass the outputs of the neutralization through a sigmoid function, which will rescale them to [0, 1]. In that case you would effectively be subtracting out a logistic regression model rather than an OLS model.
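For example, a quick sketch (my own numbers) of min-max rescaling versus the sigmoid option:

import numpy as np

neutral = np.array([-1.95, -0.78, -1.17, -2.34, 0.20])  # example neutralized scores

minmax = (neutral - neutral.min()) / (neutral.max() - neutral.min())  # exactly [0, 1]
sigmoid = 1 / (1 + np.exp(-neutral))                                  # squashed into (0, 1)

# both preserve the ordering of the scores, so rank-based metrics are unaffected
print(minmax)
print(sigmoid)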

1 Like

Okay, so I tried that neutralize function and now I'm more confused. The neutralize function from the example script returns a DataFrame of transformed features. Is that the data I'm now meant to predict on? Does it not matter that some of it is outside of the 0 to 1 range? Maybe that's the point: the data is now scaled so the features are neutralized? I really thought a neutralize function would return transformed predictions.

import pandas as pd
import numpy as np
import scipy as sp
import scipy.stats  # explicitly import the stats submodule so sp.stats is available

def neutralize(df,
               columns,
               neutralizers=None,
               proportion=1.0,
               normalize=True,
               era_col="era"):
    if neutralizers is None:
        neutralizers = []
    unique_eras = df[era_col].unique()
    computed = []
    # neutralize era by era so each era is treated independently
    for u in unique_eras:
        df_era = df[df[era_col] == u]
        scores = df_era[columns].values
        if normalize:
            # rank each column, shift by 0.5 to avoid 0 and 1, then gaussianize
            scores2 = []
            for x in scores.T:
                x = (sp.stats.rankdata(x, method='ordinal') - .5) / len(x)
                x = sp.stats.norm.ppf(x)
                scores2.append(x)
            scores = np.array(scores2).T
        exposures = df_era[neutralizers].values

        # subtract the OLS fit of the scores on the neutralizers (via the pseudo-inverse)
        scores -= proportion * exposures.dot(
            np.linalg.pinv(exposures.astype(np.float32), rcond=1e-6).dot(scores.astype(np.float32)))

        # standardize within the era
        scores /= scores.std(ddof=0)

        computed.append(scores)

    return pd.DataFrame(np.concatenate(computed),
                        columns=columns,
                        index=df.index)

data = {'era': [1,1,1,1,1],
        'feature_1': [0.00, 0.25, 0.50, 0.75, 1.00],
        'feature_2': [0.25, 0.75, 0.50, 0.75, 0.75],
        'feature_3': [1.00, 0.75, 0.00, 0.75, 0.75],
        'feature_4': [0.25, 0.50, 0.25, 0.00, 0.50],
        'predictions':  [0.52, 0.66, 0.71, 0.98, 0.33]}
df = pd.DataFrame(data)
df

columns = [c for c in df.columns if c.startswith("feature")]
neutralize(df,columns)

Output:

   feature_1  feature_2  feature_3  feature_4
0  -1.463366  -1.463366   1.463366  -0.598798
1  -0.598798   0.000000  -0.598798   0.598798
2   0.000000  -0.598798  -1.463366   0.000000
3   0.598798   0.598798   0.000000  -1.463366
4   1.463366   1.463366   0.598798   1.463366

Is that better? Or is that what other users do?

I’m not sure if it’s what other users do, but you could try it with one of your models to see if it improves things. Unfortunately the only way to make progress in ML is to just try things and see what works for you.

1 Like

So I got my head around the rescaling. I tried min-max scaling vs logistic scaling and there wasn't really a difference. I think I've still got a problem with my neutralizer, though.

You're getting neutralized features because you're passing the feature columns to the columns argument; whatever you pass there is what gets neutralized. Pass your prediction column and you will get a neutralized prediction column as your output. You should also pass the names of the features to be used as "neutralizers" to the neutralizers argument.

Your observation that the output is not scaled from 0 to 1 is expected behavior. You will need to scale the output yourself.

Example

tournament_data[f"preds_neutral"] = neutralize(
    df=tournament_data,
    columns=[PREDICTION_NAME],
    neutralizers=feature_cols,
    proportion=1.0,
    normalize=True,
    era_col=ERA_COL
)
tournament_data[PREDICTION_NAME] = tournament_data[f"preds_neutral"].rank(method='first',pct=True)
1 Like