Why do linear models sometimes do so well?

rsmillie94 · March 23, 2021, 7:00pm

Hello!

If you saw the last OHwA, you might remember a brief mention of one-hot neutralisation as a way to avoid some of the potential problems with linear feature neutralisation. I’ve done some work on this, got something running, and will be publishing a write-up with code shortly.

While doing this I came across something interesting that I thought I would share. As part of one-hot neutralisation, I fit a model from a single feature to the predictions the example model generates on the tournament data. In essence, this takes the average prediction over all rows where the feature takes each of its values. For example, below you can see this model for feature_dexterity7.
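A least-squares fit from one-hot indicators of a single feature to a set of predictions gives exactly the per-value mean described above, because each indicator column only "sees" the rows in its own group. Below is a minimal stdlib-only sketch of that fit. The helper names `one_hot_fit` and `one_hot_neutralise` are hypothetical, and the neutralisation step (subtracting a proportion of the fitted values, by analogy with linear neutralisation) is an assumption for illustration, not the code promised above.

```python
from collections import defaultdict


def one_hot_fit(feature, preds):
    """Least-squares fit of preds on a one-hot encoding of feature.

    With one indicator column per distinct feature value, the least-squares
    solution is simply the mean prediction within each value group.
    Returns (fitted value for every row, {feature value: group mean}).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for f, p in zip(feature, preds):
        sums[f] += p
        counts[f] += 1
    group_mean = {f: sums[f] / counts[f] for f in sums}
    return [group_mean[f] for f in feature], group_mean


def one_hot_neutralise(feature, preds, proportion=1.0):
    """Hypothetical helper (assumption): subtract a proportion of the one-hot fit.

    With proportion=1.0, the result has zero mean within every feature value.
    """
    fitted, _ = one_hot_fit(feature, preds)
    return [p - proportion * f for p, f in zip(preds, fitted)]


if __name__ == "__main__":
    # Toy rows: feature values in Numerai-style buckets, arbitrary predictions.
    feature = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
    preds = [0.2, 0.4, 0.5, 0.7, 0.9, 0.3]
    _, means = one_hot_fit(feature, preds)
    print(means)
    print(one_hot_neutralise(feature, preds))
```

Unlike linear neutralisation, this removes whatever shape the predictions have against the feature, not just the straight-line component.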

Next, I decided to look at how this changes when you consider only eras where the feature is positively correlated with the target, and vice versa. For this I introduce everyone’s favourite feature, feature_intelligence3 (i3). Below you will see the same model applied to i3.
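One way to make the positive/negative split is to compute the feature-target correlation within each era and bucket the eras by its sign. This sketch assumes flat lists of era labels, feature values and targets, and uses Pearson correlation for simplicity (a rank correlation could be swapped in). `split_eras_by_corr_sign` is a hypothetical helper; the example model would then be retrained on just one of the returned era lists.

```python
from collections import defaultdict


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def split_eras_by_corr_sign(eras, feature, target):
    """Return (positive_eras, negative_eras) by sign of per-era correlation.

    Eras with exactly zero correlation (e.g. a constant feature) are left out.
    """
    by_era = defaultdict(lambda: ([], []))
    for era, f, t in zip(eras, feature, target):
        by_era[era][0].append(f)
        by_era[era][1].append(t)

    positive, negative = [], []
    for era, (fs, ts) in by_era.items():
        c = pearson(fs, ts)
        if c > 0:
            positive.append(era)
        elif c < 0:
            negative.append(era)
    return positive, negative


if __name__ == "__main__":
    # Two toy eras: the feature helps in era1 and hurts in era2.
    eras = ["era1"] * 3 + ["era2"] * 3
    feature = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
    target = [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
    print(split_eras_by_corr_sign(eras, feature, target))
```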

You can see that it has a very non-linear shape, which indicates that the example model has picked up a non-linear relationship between i3 and the target.

Next is the same chart, but with the example model trained only on eras where i3 is positively correlated with the target.

And here it is for a model trained only on eras where i3 is negatively correlated with the target.

Neither of these charts really resembles the overall chart, and when the correlation is negative the relationship does in fact look linear! Because i3 is usually negatively correlated with the target in the training eras, its coefficient in a linear model will be negative (you can verify this if you like). In eras where i3 is negatively correlated with the target, it is related to the target in a roughly linear way, so when the linear model is right about i3, it is ‘very right’, if that makes sense. The example model hedges its bets by letting the predicted value rise again at i3 = 1, reflecting the almost exponential shape of the target’s relationship with i3 in eras where the correlation is positive. This means that when the example model is correct about the direction of the correlation between i3 and the target, the linear model is also correct and will perform better, because it captures the relationship more closely. When both models are wrong, the linear model burns harder (as reflected by the volatility of linear models).
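The trade-off above can be illustrated with a toy example. Everything here is synthetic and hypothetical, not Numerai data: three "negative" eras where the target falls linearly with i3, one "positive" era where it rises steeply at i3 = 1 (modelled here as i3**4), a linear model fitted by ordinary least squares on the pooled eras, and a hand-picked "hedged" prediction shape that rises again at i3 = 1, loosely mimicking the example model’s curve. Scoring uses per-era Pearson correlation.

```python
# Toy illustration only: synthetic numbers, not Numerai data.
FEATURE_VALUES = [0.0, 0.25, 0.5, 0.75, 1.0]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical era types: target falls linearly with i3, or rises steeply at i3 = 1.
negative_era = [1.0 - x for x in FEATURE_VALUES]
positive_era = [x ** 4 for x in FEATURE_VALUES]

# Training data dominated by negative eras (3 negative, 1 positive).
train_eras = [negative_era] * 3 + [positive_era]
xs = FEATURE_VALUES * len(train_eras)
ys = [t for era in train_eras for t in era]
slope = ols_slope(xs, ys)  # comes out negative

# Linear model predictions (intercept omitted: it does not affect correlation).
linear_preds = [slope * x for x in FEATURE_VALUES]
# Hypothetical "hedged" shape: decreasing, but rising again at i3 = 1.
hedged_preds = [1.0, 0.75, 0.5, 0.25, 0.5]

print(f"fitted slope: {slope:+.3f}")
for name, era in [("negative era", negative_era), ("positive era", positive_era)]:
    print(f"{name}: linear={pearson(linear_preds, era):+.2f} "
          f"hedged={pearson(hedged_preds, era):+.2f}")
```

In this toy setup the fitted slope is negative, the linear model scores higher than the hedged shape in the negative era, and lower in the positive era: the ‘very right’ versus ‘burns harder’ pattern described above.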

That’s the piece, hope you enjoyed reading! I’m trying to work on my explanatory skills by writing posts here, so any questions, feedback or requests for clarification are welcome.

Happy modelling!

minou · March 24, 2021, 10:33am

Thanks! Please could you start labeling the axes on your charts, though? Even if it’s done in a paint program or in the text, doing so avoids confusion and helps if someone wants to copy one and, e.g., paste it into chat.

rsmillie94 · March 24, 2021, 10:48am

Yes, very good point. Thanks!

Related topics

Topic · Replies · Views · Activity
Feature Neutralisation & Autocorrelation Presentation (Data Science) · 5 · 3197 · June 15, 2022
Better neutralization? (Data Science) · 6 · 2349 · July 23, 2022
Model Diagnostics: Feature Exposure (Data Science) · 43 · 31189 · September 16, 2023
How to Safely Perform Feature Neutralization (Data Science) · 3 · 2705 · October 3, 2020
What exactly is neutralization? (Data Science) · 11 · 6754 · December 8, 2021