Which Model is Better?

Not quite following this “the model isn’t learning” stuff – the final result is the same as the simple mean of what?

I’m referring to the fact that, using RMSE as the loss, the validation loss does not converge or improve significantly compared to the loss you get using the simple mean as the prediction, i.e. rmse(y_true, model_predictions) ≈ rmse(y_true, [y_mean]*len(y_true)).

For the model to learn, you’d expect the loss to decrease (with a few swings here and there) towards zero, the ideal point, but it doesn’t. It just floats around.

Here’s an example of what I mean:

[LightGBM] [Info] Start training from score 0.500017
[1]	valid_0's l2: 0.0499796
Training until validation scores don't improve for 100 rounds
[2]	valid_0's l2: 0.0499781
[3]	valid_0's l2: 0.0499767
[4]	valid_0's l2: 0.0499762
[5]	valid_0's l2: 0.0499784
[6]	valid_0's l2: 0.0499737
[7]	valid_0's l2: 0.0499731
[8]	valid_0's l2: 0.0499751
[9]	valid_0's l2: 0.0499735
[10]	valid_0's l2: 0.0499729
[11]	valid_0's l2: 0.0499721
[12]	valid_0's l2: 0.0499703
[13]	valid_0's l2: 0.0499686
[14]	valid_0's l2: 0.0499695
[15]	valid_0's l2: 0.0499722
[16]	valid_0's l2: 0.0499722
[17]	valid_0's l2: 0.0499715
[18]	valid_0's l2: 0.0499683
[19]	valid_0's l2: 0.0499679
[20]	valid_0's l2: 0.0499659
[21]	valid_0's l2: 0.0499662
[22]	valid_0's l2: 0.0499669
[23]	valid_0's l2: 0.0499665
[24]	valid_0's l2: 0.0499663
[25]	valid_0's l2: 0.0499652
...
[124]	valid_0's l2: 0.0500266
[125]	valid_0's l2: 0.0500285
Early stopping, best iteration is:
[25]	valid_0's l2: 0.0499652

Compute the simple-mean baseline, score it, and compare it with the model:

from sklearn.metrics import mean_squared_error

# Baseline: predict the validation-set mean for every row.
simple_mean = validation_data[target].mean()
simple_preds = [simple_mean] * len(validation_data)
simple_rmse = mean_squared_error(validation_data[target], simple_preds)
print(f"simple_rmse: {simple_rmse}")

# sel_preds holds the model's predictions on the validation data.
model_rmse = mean_squared_error(validation_data[target], sel_preds)
print(f"model_rmse: {model_rmse}")

print('Difference:', "{:.2%}".format((simple_rmse - model_rmse) / simple_rmse))

simple_rmse: 0.04998767748475075
model_rmse: 0.04997043677304045
Difference: 0.03%
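
One side note on the snippet above: sklearn’s mean_squared_error returns the squared error (MSE) by default, which is why the printed values line up with the l2 metric in the training log. A minimal sketch of the same comparison using an actual RMSE, assuming the same validation_data, target, simple_preds and sel_preds as above:

import numpy as np
from sklearn.metrics import mean_squared_error

# Take the square root to get a true RMSE; the baseline-vs-model comparison is unchanged.
simple_rmse = np.sqrt(mean_squared_error(validation_data[target], simple_preds))
model_rmse = np.sqrt(mean_squared_error(validation_data[target], sel_preds))
print(f"simple RMSE: {simple_rmse:.5f}")
print(f"model RMSE:  {model_rmse:.5f}")
print("Difference: {:.2%}".format((simple_rmse - model_rmse) / simple_rmse))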

Ok, so you’re saying the results you get loss-wise are pretty much the same as if the model just outputted nothing but 0.5 for every row, eh? Got it. Although ranking those same results may tell a different story, i.e. couldn’t it be that it is ranking ok (keeping in mind +0.03 corr on val is very good) but with just a very small spread? (If you see no changes in training then probably not, but ranking is the game after all.)
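
A quick way to check that ranking angle directly is a rank correlation between predictions and target on the validation set. This is just a sketch reusing the validation_data, target and sel_preds names from the snippet above; a flat loss together with a positive rank correlation would mean the model does rank, just with a very small spread:

from scipy.stats import spearmanr

# Spearman rank correlation ignores the (tiny) spread of the predictions,
# so it tells you whether the ordering itself carries any signal.
rank_corr, _ = spearmanr(validation_data[target], sel_preds)
print(f"Spearman rank correlation on validation: {rank_corr:.4f}")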

I think that is perhaps because a large percentage of the stocks are at the 0.5 target level, which does very well with a mean guess. I haven’t given it too much consideration yet, but I was thinking of removing that group of neutral stocks to see how well the model does on the tails instead.
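
A rough sketch of that idea, assuming the same validation_data, target and sel_preds as above (and that sel_preds is aligned row-by-row with validation_data): drop the rows whose target is exactly 0.5 and re-score both the model and the mean baseline on the remaining tails.

import numpy as np
from sklearn.metrics import mean_squared_error

preds = np.asarray(sel_preds)
y_true = validation_data[target].values

# Keep only the non-neutral rows (the tails).
tails = y_true != 0.5
tails_true, tails_preds = y_true[tails], preds[tails]

model_rmse_tails = np.sqrt(mean_squared_error(tails_true, tails_preds))
mean_rmse_tails = np.sqrt(mean_squared_error(tails_true, np.full_like(tails_true, tails_true.mean())))
print(f"model RMSE on tails: {model_rmse_tails:.5f}")
print(f"mean  RMSE on tails: {mean_rmse_tails:.5f}")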

Yes, the actual score of the model, after ranking, is way better than the baseline one, but usually you’d expect a decent decrease in the loss for a reasonable number of rounds (the model is learning) and then an increase (it’s overfitting). This is one of the validation metric plots (as you can see, the model starts overfitting almost immediately):
[Screenshot: validation metric plot, 2022-01-27]
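
For reference, one way to reproduce that kind of plot with the native LightGBM API is to record the evaluation history with a callback and plot the valid_0 l2 curve. The train_data, features and params names below are placeholders standing in for whatever produced the log above (params assumed to use the regression objective with the default l2 metric):

import lightgbm as lgb
import matplotlib.pyplot as plt

train_set = lgb.Dataset(train_data[features], label=train_data[target])
valid_set = lgb.Dataset(validation_data[features], label=validation_data[target])

evals_result = {}  # filled in by the callback as evals_result['valid_0']['l2']
model = lgb.train(
    params,
    train_set,
    valid_sets=[valid_set],
    callbacks=[
        lgb.record_evaluation(evals_result),
        lgb.early_stopping(stopping_rounds=100),
    ],
)

# Plot the validation l2 per boosting round to see where (or whether) it ever dips.
plt.plot(evals_result['valid_0']['l2'])
plt.xlabel('boosting round')
plt.ylabel("valid_0 l2")
plt.show()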

It is also true that the target has “just” five values: [0.0, 0.25, 0.5, 0.75, 1.0]. In fact, I’ve also tried a couple of classification models, but without great results :wink:
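
In case it helps, here is a minimal sketch of one way to frame it as classification over the five levels and then turn the class probabilities back into an expected value that can be ranked. The train_data, features and target names are placeholders in the same spirit as above:

import numpy as np
from lightgbm import LGBMClassifier

levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# Map each target value to a class id 0..4 (assumes targets take exactly these five values).
y_train_cls = np.searchsorted(levels, train_data[target].values)

clf = LGBMClassifier(objective='multiclass', n_estimators=500)
clf.fit(train_data[features], y_train_cls)

# Expected value of the predicted class distribution: a continuous score you can rank.
proba = clf.predict_proba(validation_data[features])  # shape (n_rows, 5), columns ordered 0..4
class_preds = proba @ levels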
