Liz Experiment Review Q1 2021: Generating Features and Applying Feature Neutralization

I am experimenting with iterative prediction. The idea is to run the whole preprocessing pipeline era-wise and iterate through the tournament data, which saves tons of memory :slight_smile:. Next I will try switching to era-wise training as well, which is very easy with xgboost and will allow an even broader feature space.
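A minimal sketch of what era-wise iteration could look like, assuming the data has an `era` column as in the tournament files; `preprocess` and `model` here are hypothetical placeholders for your own pipeline, and the toy demo uses a stand-in model:

```python
import numpy as np
import pandas as pd

def predict_era_wise(df, feature_cols, preprocess, model):
    """Preprocess and predict one era at a time, so only a single era's
    feature matrix is ever in memory at once."""
    preds = []
    for era, era_df in df.groupby("era", sort=False):
        X = preprocess(era_df[feature_cols].to_numpy())  # era-local preprocessing
        preds.append(pd.Series(model.predict(X), index=era_df.index))
    return pd.concat(preds)

# Toy demo with a stand-in model that predicts the row mean of the features.
class MeanModel:
    def predict(self, X):
        return X.mean(axis=1)

toy = pd.DataFrame({
    "era": ["era1", "era1", "era2", "era2"],
    "f1":  [0.0, 0.5, 1.0, 0.25],
    "f2":  [1.0, 0.5, 0.0, 0.75],
})
preds = predict_era_wise(toy, ["f1", "f2"], preprocess=lambda X: X, model=MeanModel())
```

The same loop structure works for era-wise training: instead of predicting per era, you would fit (or incrementally update) the model on each era's `X`.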


It’s all about finding the sweet spot between overfitting and maximising variance, which is arguably very narrow in a tournament like this.

You mean by … neural networks? As far as I know, NNs are the ones that can find relations between features (i.e. do the feature engineering themselves), given that you have enough data and model capacity.

But even so, any information you can provide as a prior will boost where the model ends up. As an example, you could match CNN performance on images with a vanilla NN (again, provided enough data and capacity), but the effort to get there would be immense compared to building in spatial information and reusing parts of the network (CNNs).

I can imagine CART ensembles finding weird sequences of decisions that amount to a sort of feature engineering, maybe…


OK, so considering the plots, feature neutralization seems to be detrimental overall. If I understand the other posts here reasoning about the usefulness of neutralization correctly, the main idea is to rebalance feature importance, which in turn should reduce overfitting and stabilize models in the live environment. But if maximizing payout is your objective, then “Urza”, the model without feature neutralization, shows the best live performance. So why bother with feature neutralization at all?

One hypothesis could be that, due to frequent retraining (assuming weekly model retraining), model drift is not a big problem, and overfitting is more beneficial than rebalancing feature importance. Another take could be that models with high FN also suffer from volatility in live data, so FN may not be as effective at stabilizing CORR and MMC as hoped.

What are your takes on this, and how would you improve FN so that CORR and MMC suffer less?
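For context, the neutralization being discussed is, roughly, subtracting the linear projection of the predictions onto the features. A minimal sketch (the `proportion` knob for partial neutralization is my assumption about how one might soften it, not something shown in the plots above):

```python
import numpy as np

def neutralize(preds, exposures, proportion=1.0):
    """Remove a fraction of the linear exposure of preds to the features.

    preds:     (n,) vector of raw predictions
    exposures: (n, k) feature matrix to neutralize against
    """
    # Least-squares projection of preds onto the column space of exposures.
    exposure = exposures @ (np.linalg.pinv(exposures) @ preds)
    neutral = preds - proportion * exposure
    # Rescale so downstream correlation metrics are comparable.
    return neutral / np.std(neutral)
```

With `proportion=1.0` the result is (up to rescaling) the residual of a linear regression of the predictions on the features, so it is linearly uncorrelated with every feature; smaller values would trade off that decorrelation against keeping more of the raw signal, which seems to be the tension the plots show.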

Weekly retraining isn’t particularly helpful given that the training data doesn’t change weekly. I’ll write up some more thoughts on FN soon.
