Thanks for the write-up, Mike! Something has been bothering me about the relationship between the feature neutralization operation and the feature exposure metric. I’ve been trying to figure out how best to explain it, but I think I’ve got it now, and I would love to hear the thinking on your side of things.

The problem is that the two aren’t directly related: the feature neutralization operation is not actually a minimizer of the feature exposure metric. The easiest way to see this is to assume you have a model whose predictions have a correlation of 0.1 with every feature. Its feature exposure metric would be 0 and impossible to minimize further, while feature neutralization would still remove those correlations and could only increase the feature exposure metric. The feature exposure metric measures the dissimilarity of correlations across features, while the feature neutralization operation removes the correlations with the features themselves.

It’s not clear to me which one you actually want, and I could see arguments both ways. Very curious to hear your thoughts!
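To make the thought experiment concrete, here is a rough NumPy sketch on synthetic data. It rests on two assumptions of mine: that the feature exposure metric is the standard deviation of the per-feature correlations, and that neutralization subtracts the linear projection of the predictions onto the features. The helper names `feature_correlations` and `neutralize` are hypothetical, and the features are built to be orthonormal only so that the 0.1 correlations come out exact.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features, rho = 5000, 20, 0.1

# Synthetic setup (assumption): centered, mutually orthogonal, unit-norm columns.
raw = rng.standard_normal((n_rows, n_features + 1))
raw -= raw.mean(axis=0)
q, _ = np.linalg.qr(raw)
features, noise = q[:, :n_features], q[:, -1]

# Predictions built to have correlation rho with every single feature.
predictions = rho * features.sum(axis=1) + np.sqrt(1 - n_features * rho**2) * noise


def feature_correlations(preds, feats):
    """Pearson correlation of the predictions with each feature column."""
    return np.array([np.corrcoef(preds, feats[:, j])[0, 1]
                     for j in range(feats.shape[1])])


def neutralize(preds, feats, proportion=1.0):
    """Subtract `proportion` of the linear projection of preds onto feats."""
    return preds - proportion * feats @ (np.linalg.pinv(feats) @ preds)


corrs_before = feature_correlations(predictions, features)
corrs_after = feature_correlations(neutralize(predictions, features), features)

# Assumed metric: standard deviation of the per-feature correlations.
print("before: max |corr| =", np.abs(corrs_before).max(), " std =", corrs_before.std())
print("after:  max |corr| =", np.abs(corrs_after).max(), " std =", corrs_after.std())
```

In this idealized setup the standard-deviation metric is essentially zero before neutralization even though every correlation is 0.1, so the metric has nothing left to reward. Neutralization still changes the predictions by driving every correlation to about zero, which is the mismatch I’m describing.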