I’ve tested full neutralization on many of my models, but it never did the trick. Performance suffered.
Which kind of makes sense: removing the linear dependence on all of the features is bound to cost some performance.
The new example model takes neutralization to the next level by neutralizing only against risky features: the features whose correlation with the target changes the most across eras. This is a more reasonable approach, with hopefully better results. Still, it removes the linear dependence on 50 features.
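For reference, the risky-feature selection boils down to a split-half comparison of the per-era correlations. A minimal sketch along the lines of the example scripts' helper (assuming `corrs` is a DataFrame of per-era feature/target correlations, eras as rows and features as columns):

```python
import pandas as pd

def get_biggest_change_features(corrs: pd.DataFrame, n: int) -> list:
    """Return the n features whose mean correlation with the target
    differs most between the first and second half of the eras."""
    all_eras = corrs.index.sort_values()
    h1_eras = all_eras[: len(all_eras) // 2]   # first half of eras
    h2_eras = all_eras[len(all_eras) // 2 :]   # second half of eras
    corr_diffs = corrs.loc[h2_eras].mean() - corrs.loc[h1_eras].mean()
    return corr_diffs.abs().sort_values(ascending=False).head(n).index.to_list()
```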
An improvement would be to neutralize only if
- the feature is in the “risky” group (its correlation with the target changes through the eras), and
- the feature has exposure (high correlation with the prediction in the live era).
This further cuts down on neutralization, reducing feature exposure only where it is necessary for long-term performance.
An implementation of this method would look like this:
```python
# per-era correlation of each feature with the target on training data
all_feature_corrs = training_data.groupby('erano').apply(
    lambda d: d[features].corrwith(d['target']))
riskiest_features = get_biggest_change_features(all_feature_corrs, 50)

# feature exposure in the live era: |corr(feature, prediction)|
live = predict_data[predict_data.era == 'eraX']
feature_corrs = live[features].corrwith(live['prediction_sum']).abs()
feature_corrs_mean = feature_corrs.mean()

# neutralize only the risky features whose live exposure is above the mean exposure
to_neutralize = feature_corrs[riskiest_features].sort_values()
to_neutralize = to_neutralize[to_neutralize > feature_corrs_mean].index.to_list()

predict_data["prediction"] = neutralize(
    df=predict_data,
    columns=["prediction"],
    neutralizers=to_neutralize,
    proportion=0.5,
    normalize=True)["prediction"]
```
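And since everything hinges on what `neutralize` does: per era, it regresses the scores on the selected features and subtracts a `proportion` of the linearly explained part. A sketch in the spirit of the example scripts' helper (assuming `df` is sorted by era), not a drop-in copy:

```python
import numpy as np
import pandas as pd
import scipy.stats

def neutralize(df, columns, neutralizers, proportion=1.0,
               normalize=True, era_col="era"):
    computed = []
    for _, df_era in df.groupby(era_col):
        scores = df_era[columns].values
        if normalize:
            # rank-transform each column, then map the ranks onto a standard normal
            scores = np.array([
                scipy.stats.norm.ppf(
                    (scipy.stats.rankdata(s, method="ordinal") - 0.5) / len(s))
                for s in scores.T]).T
        exposures = df_era[neutralizers].values
        # subtract the part of the scores explained linearly by the exposures
        scores = scores - proportion * exposures.dot(
            np.linalg.pinv(exposures).dot(scores))
        computed.append(scores / scores.std(ddof=0))
    return pd.DataFrame(np.concatenate(computed),
                        columns=columns, index=df.index)
```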
What’s your intuition on this?