Better neutralization?

Hi,

I’ve tested full neutralization on many of my models, but it never did the trick. Performance suffered.
Which kind of makes sense: removing the linear dependence on all features is bound to cause a performance drop.

The new example model takes neutralization to the next level by neutralizing only against risky features, i.e. the features whose correlation with the target changes the most across eras. This is a more reasonable approach, with hopefully better results. Still, it removes the linear dependence on 50 features.

An improvement would be to neutralize only if

  • the feature is in the “risky” group (changing correlation with the target through eras)
  • the feature has exposure (high correlation with the target in the live era)

This further cuts down on neutralization, focusing on reducing feature exposure only where it is necessary for long-term performance.

An implementation of this method would look like this:

# per-era correlation of each feature with the target on the training data
all_feature_corrs = training_data.groupby('erano').apply(
    lambda d: d[features].corrwith(d['target']))
riskiest_features = get_biggest_change_features(all_feature_corrs, 50)

# absolute correlation of each feature with the prediction in the live era
live_data = predict_data[predict_data.era == 'eraX']
feature_corrs = live_data[features].corrwith(live_data['prediction_sum']).abs()
feature_corrs_mean = feature_corrs.mean()

# neutralize only against risky features whose exposure exceeds the average exposure
to_neutralize = feature_corrs[riskiest_features].sort_values()
to_neutralize = to_neutralize[to_neutralize > feature_corrs_mean].index.to_list()

predict_data["prediction"] = neutralize(
    df=predict_data,
    columns=["prediction"],
    neutralizers=to_neutralize,
    proportion=0.5,
    normalize=True)["prediction"]
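
neutralize here is presumably the helper from the example scripts. Roughly, per era it (optionally) gaussianizes the predictions and then subtracts proportion times their least-squares projection onto the neutralizer features. A simplified sketch of what it does, not the exact example-script code (assumes a unique DataFrame index):

import numpy as np
import pandas as pd
from scipy import stats

def neutralize(df, columns, neutralizers, proportion=1.0, normalize=True, era_col="era"):
    # Per era: gaussianize the scores, then remove `proportion` of their
    # linear dependence on the neutralizer features.
    out = []
    for _, df_era in df.groupby(era_col):
        scores = df_era[columns].values.astype(np.float64)
        if normalize:
            # rank-transform each column to a standard normal within the era
            scores = np.apply_along_axis(
                lambda x: stats.norm.ppf((stats.rankdata(x) - 0.5) / len(x)), 0, scores)
        exposures = df_era[neutralizers].values.astype(np.float64)
        # subtract the least-squares projection onto the neutralizer features
        scores = scores - proportion * exposures @ np.linalg.pinv(exposures) @ scores
        out.append(pd.DataFrame(scores / scores.std(ddof=0),
                                columns=columns, index=df_era.index))
    return pd.concat(out).reindex(df.index)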

What’s your intuition on this?


I think it’s a great idea.

How do you determine the second point? You don’t have the live targets, so you can’t compute that correlation when estimating exposure on live predictions. Also, could you elaborate on your function below? E.g., are you sorting by the standard deviation of the correlations?

OTOH, could it be that the riskier features are those that show little variation in the historical data but then suddenly change orientation in the live data?

There’s a similar idea near the end of: https://github.com/numerai/example-scripts/blob/master/analysis_and_tips.ipynb

Sorry, I meant high correlation with the prediction in the live era.
We don’t have targets obviously.

I guess it’s enough to neutralize against features which have a high effect on (high correlation with) the predictions.
I’m not 100% sure of this idea, but I’ll submit and see.

Feedback is welcome :)

“get_biggest_change_features” here is the same function you see in the advanced example script.
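
For reference, if I remember the example script right, that function splits the training eras into two halves and picks the n features whose mean correlation with the target changes most between the halves; roughly:

def get_biggest_change_features(all_feature_corrs, n):
    # all_feature_corrs: one row per era, one column per feature (corr with target)
    all_eras = all_feature_corrs.index.sort_values()
    h1_eras = all_eras[:len(all_eras) // 2]
    h2_eras = all_eras[len(all_eras) // 2:]
    h1_corr_means = all_feature_corrs.loc[h1_eras, :].mean()
    h2_corr_means = all_feature_corrs.loc[h2_eras, :].mean()
    # features whose mean correlation changed the most between the two halves
    corr_diffs = h2_corr_means - h1_corr_means
    worst_n = corr_diffs.abs().sort_values(ascending=False).head(n).index.to_list()
    return worst_n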

Feature importance would be even better!

Neutralize against features which

  • are risky (changing correlation with the target through eras)
  • have high feature importance in the model

If an unimportant feature changes correlation with the target, it won’t affect predictions much anyway.
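
A rough sketch of that variant, assuming a fitted tree model named model with a scikit-learn-style feature_importances_ attribute (in the same order as features) — both names are placeholders here:

import pandas as pd

# hypothetical: `model`, `features`, `all_feature_corrs` as defined earlier
importances = pd.Series(model.feature_importances_, index=features)
important_features = set(importances.sort_values(ascending=False).head(50).index)

riskiest_features = get_biggest_change_features(all_feature_corrs, 50)
# neutralize only against features that are both risky and important to the model
to_neutralize = [f for f in riskiest_features if f in important_features]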

I’m just trying to find ways to minimize the side effects of neutralization.
Neutralization is a great tool, but neutralizing everything kills the model.


Perhaps the intersection of:

  • Risky: top quantile of the std dev of the per-era correlations
  • High importance: top quantile of the mean (absolute) correlation

import numpy as np

def get_biggest_change_features(corrs, n, q=.75):
    # n is kept for signature compatibility with the example script but is unused;
    # the selection is driven by the quantile q.
    # keep features in the top quantile of mean (absolute) correlation ...
    corrs_mean = corrs.mean().abs().sort_values(ascending=False)
    quantile = np.quantile(corrs_mean, q=q)
    corrs_mean_q = corrs_mean[corrs_mean >= quantile]
    # ... and also in the top quantile of the std dev of per-era correlations
    corrs_std = corrs.std().sort_values(ascending=False)
    quantile = np.quantile(corrs_std, q=q)
    corrs_std_q = corrs_std[corrs_std >= quantile]
    # intersection of the two groups: risky and important
    worst_n = corrs_mean_q[corrs_mean_q.index.isin(corrs_std_q.index)].index.tolist()
    return worst_n
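
With the per-era training correlations computed above, the call would then look something like this (assuming the same all_feature_corrs DataFrame):

riskiest_features = get_biggest_change_features(all_feature_corrs, n=50, q=.75)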