Diagnostics are built to help us, but they can be very misleading.
My most successful model (nyuton_test8), which is ranked #39 at the time of writing, has such bad diagnostics that I almost threw it out at the beginning, and I didn’t start staking on it until recently.
Trust your CV!
I’m writing this post partly as a reminder to myself for when I see similar results with the new dataset.
To be fair, this model is an ensemble of models trained on CV folds, and its training data also includes the validation set. The diagnostics shown are for the only component that doesn’t include any validation data, which is probably a good approximation of the other components as well.
Another model (nyuton_test15) looks even worse than that, and it got 14 medals in its first 9 completed rounds…