Liz Experiment Review Q1 2021: Generating Features and Applying Feature Neutralization

I am experimenting with iterative prediction. The idea is to run the whole preprocessing pipeline era-wise and iterate through the tournament data, which saves tons of memory :slight_smile:. Next I will try switching to era-wise training as well, which is very easy with xgboost and will allow an even broader feature space.
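A minimal sketch of what era-wise iteration could look like, assuming the data has an `era` column as in the tournament files; `preprocess` and `model` here are hypothetical placeholders for your own pipeline, and the toy demo uses a stand-in model:

```python
import numpy as np
import pandas as pd

def predict_era_wise(df, feature_cols, preprocess, model):
    """Preprocess and predict one era at a time, so only a single era's
    feature matrix is ever in memory at once."""
    preds = []
    for era, era_df in df.groupby("era", sort=False):
        X = preprocess(era_df[feature_cols].to_numpy())  # era-local preprocessing
        preds.append(pd.Series(model.predict(X), index=era_df.index))
    return pd.concat(preds)

# Toy demo with a stand-in model that predicts the row mean of the features.
class MeanModel:
    def predict(self, X):
        return X.mean(axis=1)

toy = pd.DataFrame({
    "era": ["era1", "era1", "era2", "era2"],
    "f1":  [0.0, 0.5, 1.0, 0.25],
    "f2":  [1.0, 0.5, 0.0, 0.75],
})
preds = predict_era_wise(toy, ["f1", "f2"], preprocess=lambda X: X, model=MeanModel())
```

The same loop structure works for era-wise training: instead of predicting per era, you would fit (or incrementally update) the model on each era's `X`.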


It’s all about finding the sweet spot between overfitting and maximising variance, which is arguably very narrow in a tournament like this.

You mean by … neural networks? As far as I know, NNs are the ones that can find relations between features (i.e. do the feature engineering themselves), given that you have enough data and model capacity.

But even so, any information you can provide as a prior will boost where the model ends up. As an example, you could match CNN performance on images with a vanilla NN (again, provided enough data and capacity), but the effort to get there would be immense compared to building in spatial information and reusing parts of the network (CNNs).

I can imagine CART ensembles finding weird sequences of decisions that amount to a sort of feature engineering, maybe…


OK, so considering the plots, feature neutralization seems to be detrimental overall. If I understand the other posts here reasoning about the usefulness of neutralization correctly, the main idea is to rebalance feature importance, which in turn should reduce overfitting and stabilize models in the live environment. But if maximizing payout is your objective, then “Urza”, the model without feature neutralization, shows the best live performance. So why bother with feature neutralization at all?

One hypothesis could be that, due to frequent retraining (assuming weekly model retraining), model drift is not a big problem, and overfitting is more beneficial than rebalancing feature importance. Another take could be that models with high FN also suffer from volatility in live data, so FN may not be as effective at stabilizing CORR and MMC as hoped.

What are your takes on this, and how would you improve FN so that CORR and MMC suffer less?
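For context, the neutralization being discussed is, roughly, subtracting the linear projection of the predictions onto the features. A minimal sketch (the `proportion` knob for partial neutralization is my assumption about how one might soften it, not something shown in the plots above):

```python
import numpy as np

def neutralize(preds, exposures, proportion=1.0):
    """Remove a fraction of the linear exposure of preds to the features.

    preds:     (n,) vector of raw predictions
    exposures: (n, k) feature matrix to neutralize against
    """
    # Least-squares projection of preds onto the column space of exposures.
    exposure = exposures @ (np.linalg.pinv(exposures) @ preds)
    neutral = preds - proportion * exposure
    # Rescale so downstream correlation metrics are comparable.
    return neutral / np.std(neutral)
```

With `proportion=1.0` the result is (up to rescaling) the residual of a linear regression of the predictions on the features, so it is linearly uncorrelated with every feature; smaller values would trade off that decorrelation against keeping more of the raw signal, which seems to be the tension the plots show.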

Weekly retraining isn’t particularly helpful given that the training data doesn’t change weekly. I’ll write up some more thoughts on FN soon.
