Model Diagnostics: Feature Exposure

mdo · September 21, 2020, 2:53am

Here is a slightly different take on feature neutralization. Instead of fitting a linear model to your predictions and subtracting a proportion of it, we can fit a linear model that, when subtracted, brings your feature exposures below a chosen target. We set a maximum exposure and define a loss function such that, once it is minimized, every feature's exposure is less than or equal (in magnitude) to the minimum of its current exposure and the maximum desired exposure. So if some features have an exposure of 0.05 and you set a max exposure of 0.10, those features won’t necessarily decrease as they would with the current neutralization code. This lets you keep some of the smaller exposures that might be important while reducing your largest risks. Test it out and let me know what you think! Be warned, it’s not especially fast…
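To make the target and the loss concrete, here is a minimal pure-Python sketch of the same clamp-and-hinge arithmetic on three made-up exposures. The helper names (`clamp`, `relu`, `exposure_loss`) and the numbers are illustrative assumptions; the formula is meant to mirror the `loss` line in `reduce_exposure`, not replace it.

```python
def clamp(value, low, high):
    """Clip value into the closed interval [low, high]."""
    return max(low, min(high, value))


def relu(value):
    """Rectifier: negative inputs become 0."""
    return max(0.0, value)


def exposure_loss(current, target):
    """Sum of how far each exposure sits outside its target.

    Mirrors relu(relu(e) - relu(t)) + relu(relu(-e) - relu(-t)),
    applied per feature and summed.
    """
    return sum(
        relu(relu(e) - relu(t)) + relu(relu(-e) - relu(-t))
        for e, t in zip(current, target)
    )


max_exp = 0.10
start_exp = [0.05, 0.25, -0.18]  # hypothetical starting exposures

# Small exposures keep their current value as the target;
# large ones are clipped to +/- max_exp.
targ_exp = [clamp(e, -max_exp, max_exp) for e in start_exp]

start_loss = exposure_loss(start_exp, targ_exp)        # only the two large exposures contribute
done_loss = exposure_loss([0.03, 0.10, -0.10], targ_exp)  # every exposure is within its target
flip_loss = exposure_loss([-0.02, 0.10, -0.10], targ_exp)  # first exposure changed sign

print(targ_exp, start_loss, done_loss, flip_loss)
```

As far as I can tell, one consequence of this form is that an exposure which flips sign is also penalized (the last case above), so each exposure is pushed into the range between 0 and its target rather than just below it in magnitude.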

import numpy as np
import pandas as pd
import scipy.stats
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch.nn import Sequential

def exposures(x, y):
    x = x - x.mean(dim=0)
    x = x / x.norm(dim=0)
    y = y - y.mean(dim=0)
    y = y / y.norm(dim=0)
    return torch.matmul(x.T, y)

def reduce_exposure(prediction, features, max_exp):
    # linear model of features that will be used to partially neutralize predictions
    lin = Linear(features.shape[1],  1, bias=False)
    lin.weight.data.fill_(0.)
    model = Sequential(lin)
    optimizer = torch.optim.Adamax(model.parameters(), lr=1e-4)
    feats = torch.tensor(np.float32(features)-.5)
    pred = torch.tensor(np.float32(prediction))
    start_exp = exposures(feats, pred[:,None])
    # target exposure for each feature: keep the current exposure if its
    # magnitude is already <= max_exp, otherwise clip it to +/- max_exp
    targ_exp = torch.clamp(start_exp, -max_exp, max_exp)

    for i in range(100000):
        optimizer.zero_grad()
        # calculate feature exposures of current linear neutralization
        exps = exposures(feats, pred[:,None]-model(feats))
        # loss is positive when any exposures exceed their target
        loss = (F.relu(F.relu(exps)-F.relu(targ_exp)) + F.relu(F.relu(-exps)-F.relu(-targ_exp))).sum()
        print(f'       loss: {loss:0.7f}', end='\r')
        if loss < 1e-7:
            break
        loss.backward()
        optimizer.step()
    # build the outputs after the loop so they are defined even if the
    # iteration cap is reached before the loss falls below the tolerance
    neutralizer = [p.detach().numpy() for p in model.parameters()]
    neutralized_pred = pred[:,None]-model(feats)
    return neutralized_pred, neutralizer

def reduce_all_exposures(df, column, neutralizers=[],
                                     normalize=True,
                                     gaussianize=True,
                                     era_col="era",
                                     max_exp=0.1):
    unique_eras = df[era_col].unique()
    computed = []
    for u in unique_eras:
        print(u, end='\r')
        df_era = df[df[era_col] == u]
        scores = df_era[column].values
        exposure_values = df_era[neutralizers].values
        
        if normalize:
            scores2 = []
            for x in scores.T:
                x = (scipy.stats.rankdata(x, method='ordinal') - .5) / len(x)
                if gaussianize:
                    x = scipy.stats.norm.ppf(x)
                scores2.append(x)
            scores = np.array(scores2)[0]

        scores, neut = reduce_exposure(scores, exposure_values, max_exp)

        scores /= scores.std()

        # keep this era's own index so rows stay aligned even if df is not sorted by era
        computed.append(pd.DataFrame(scores.detach().numpy(), columns=column, index=df_era.index))

    return pd.concat(computed)


TOURNAMENT_NAME = "kazutsugi"
PREDICTION_NAME = f"prediction_{TOURNAMENT_NAME}"

## Get output of your model
# data[PREDICTION_NAME] = model.predict(data[feature_names])

# reduce feature exposure in each era to max_exp
data_rfe_10 = reduce_all_exposures(data,
                                   [PREDICTION_NAME],
                                   neutralizers=feature_names,
                                   era_col="era",
                                   max_exp=0.10)

# replace prediction with reduced feature exposure prediction and rescale to [0,1]
data[PREDICTION_NAME] = data_rfe_10[PREDICTION_NAME]
data[PREDICTION_NAME] -= data[PREDICTION_NAME].min()
data[PREDICTION_NAME] /= data[PREDICTION_NAME].max()
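
One way to sanity-check the result is to recompute the largest absolute feature exposure within an era and compare it with `max_exp`. The following is a rough NumPy sketch of that check on toy data; `max_feature_exposure` and the toy arrays are hypothetical names introduced here, and exposure is taken to be the Pearson correlation between each feature column and the prediction, as in `exposures` above.

```python
import numpy as np


def max_feature_exposure(features, prediction):
    """Largest absolute Pearson correlation between any feature column
    and the prediction vector (assumes no column is constant)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(prediction, dtype=float)
    # center and scale to unit norm so the dot product is a correlation
    x = x - x.mean(axis=0)
    y = y - y.mean()
    x = x / np.linalg.norm(x, axis=0)
    y = y / np.linalg.norm(y)
    return float(np.abs(x.T @ y).max())


# Toy era: 500 rows, 4 features on the 0..1 grid used in the tournament data,
# with a prediction that leans heavily on the first feature.
rng = np.random.default_rng(0)
toy_features = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=(500, 4))
toy_prediction = toy_features[:, 0] + 0.1 * rng.standard_normal(500)

toy_max_exposure = max_feature_exposure(toy_features, toy_prediction)
print(f"max exposure before any reduction: {toy_max_exposure:.3f}")
```

Running the same function on each era of the reduced predictions should give values at or just above `max_exp`, up to the loss tolerance and the final rescaling.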
