Why do linear models sometimes do so well?

Hello!

If you have seen the last OHwA you might remember a brief mention of one-hot neutralisation as a method to avoid some of the potential problems with linear feature neutralisation. I now have a working version and will be publishing a write-up, including code, shortly.

While doing this I came across something interesting that I thought I would share. As part of one-hot neutralisation, I fit a model from a single feature to the predictions generated by the example model on the tournament data. In essence, this takes the average of the predictions over all rows where the feature takes each of its values. For example, below you can see the model for feature_dexterity7.

dex7
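To make the "average prediction per feature value" idea concrete, here is a minimal sketch on made-up data. The column name `prediction`, the helper `one_hot_profile`, and the five feature levels are assumptions for illustration only, not the code I will be publishing. It also checks that a least-squares fit on one-hot columns (no intercept) gives the same numbers as the plain group means.

```python
import numpy as np
import pandas as pd


def one_hot_profile(df, feature, pred_col="prediction"):
    """Mean prediction at each distinct value of `feature` (hypothetical helper)."""
    return df.groupby(feature)[pred_col].mean().sort_index()


# Synthetic stand-in for tournament data: one feature with five assumed levels.
rng = np.random.default_rng(0)
levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
toy = pd.DataFrame({"feature_dexterity7": rng.choice(levels, size=1000)})
toy["prediction"] = (
    0.5
    + 0.02 * (toy["feature_dexterity7"] - 0.5)
    + rng.normal(0, 0.01, size=len(toy))
)

profile = one_hot_profile(toy, "feature_dexterity7")

# Same thing as a regression: least squares on one-hot columns, no intercept.
dummies = pd.get_dummies(toy["feature_dexterity7"]).to_numpy(dtype=float)
coef, *_ = np.linalg.lstsq(dummies, toy["prediction"].to_numpy(), rcond=None)

print(profile)
print(coef)
```

The fitted coefficients and the group means agree, which is why the one-hot model can be read as a simple lookup of the average prediction at each feature value.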

Next, I looked at how this changes when you restrict to eras where the feature is positively correlated with the target, and vice versa. For this purpose I introduce everyone’s favourite feature, feature_intelligence3 (i3). Below you will see the same model applied to i3.

i3all

You can see that it has a very non-linear shape, which indicates a non-linear relationship between i3 and the target.

Next you see the same chart but using the example model trained only on eras where i3 has a positive correlation with the target.

i3pos

And next, the same chart for eras where i3 has a negative correlation with the target.

i3neg
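For anyone who wants to try the era split themselves, a rough sketch follows. It assumes columns named `era` and `target`, uses Pearson correlation within each era (an assumption; a rank correlation may be more appropriate), and runs on synthetic data where the sign of the relationship is set by hand. The helper `split_eras_by_sign` is hypothetical.

```python
import numpy as np
import pandas as pd


def split_eras_by_sign(df, feature, target="target", era_col="era"):
    """Split rows into eras where corr(feature, target) is positive vs negative."""
    era_corr = pd.Series(
        {era: grp[feature].corr(grp[target]) for era, grp in df.groupby(era_col)}
    )
    pos_eras = era_corr[era_corr > 0].index
    neg_eras = era_corr[era_corr < 0].index
    pos = df[df[era_col].isin(pos_eras)]
    neg = df[df[era_col].isin(neg_eras)]
    return pos, neg, era_corr


# Synthetic eras: the first two have a positive relationship, the last two negative.
rng = np.random.default_rng(0)
levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
frames = []
for i, sign in enumerate([1, 1, -1, -1], start=1):
    x = rng.choice(levels, size=200)
    y = 0.5 + sign * 0.2 * (x - 0.5) + rng.normal(0, 0.1, size=200)
    frames.append(
        pd.DataFrame({"era": f"era{i}", "feature_intelligence3": x, "target": y})
    )
toy = pd.concat(frames, ignore_index=True)

pos, neg, era_corr = split_eras_by_sign(toy, "feature_intelligence3")
print(era_corr)
```

The per-value averaging can then be applied to `pos` and `neg` separately to get one chart for each group of eras.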

From these charts you can see that neither really resembles the overall chart. When there is a negative correlation, the relationship does in fact look linear! Because i3 is usually negatively correlated with the target in the training eras, the coefficient for i3 in a linear model will be negative (you can verify this if you like). In eras where i3 is negatively correlated with the target, it is related to the target in a linear way, so when the linear model is right about i3, it is ‘very right’, if that makes sense.

The example model hedges its bets by letting the predicted value increase when i3 = 1, to reflect the almost exponential shape of the relationship between i3 and the target when the correlation is positive. This means that when the example model is correct about the direction of the correlation between i3 and the target, the linear model is also correct, and it will perform better because it captures the relationship more closely. When the models are wrong, the linear model will burn harder (as reflected in the volatility of linear models).
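If you want to check the sign of the linear coefficient, something like the sketch below would do it. It runs on a toy dataset, not the real training data: I assume 7 of 10 eras have a negative linear relationship and 3 have a positive one, and all the numbers are invented. It fits a single-feature least-squares line across all eras and then looks at the per-era correlation of the linear prediction with the target.

```python
import numpy as np

rng = np.random.default_rng(1)
n_eras, rows = 10, 300
levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# Assumed toy setup: the first 7 eras are negative, the last 3 positive.
signs = np.array([-1] * 7 + [1] * 3)
x = rng.choice(levels, size=(n_eras, rows))
y = 0.5 + signs[:, None] * 0.1 * (x - 0.5) + rng.normal(0, 0.1, size=(n_eras, rows))

# Ordinary least squares with an intercept, pooled over all eras.
X = np.column_stack([np.ones(x.size), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
intercept, slope = coef

# Per-era correlation between the linear prediction and the target.
pred = intercept + slope * x
era_scores = np.array([np.corrcoef(pred[e], y[e])[0, 1] for e in range(n_eras)])

print("slope:", slope)
print("per-era correlation:", np.round(era_scores, 3))
```

In this toy setup the slope comes out negative, the linear prediction scores positively in the negative eras, and it scores negatively in the eras where the sign flips, which is the "burn" described above.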

That’s the piece, hope you enjoyed reading! I’m trying to work on my explanatory skills by writing posts here, so any questions, feedback, or requests for clarification are welcome :slight_smile:

Happy modelling!

Thanks :+1: Please could you start labeling axes on your charts though, even if it’s in a paint program or in the text; doing so avoids confusion and helps if someone wants to copy one and e.g. paste it into chat.

Yes, very good point. Thanks!