Neural Network converging to three answers


So I’ve built a neural network, but its predictions are always worse than just putting 0.5 for all probabilities. Giving it more training cycles or running over more eras just worsens the problem, as it converges to predicting 0, 0.5, or 1 for all probabilities. I read up on log-loss evaluation and changed the targets from 0 and 1 to 0.4 and 0.6, which helped, but now it just converges to the 0.4, 0.5, and 0.6 values instead.

Does anyone have an explanation of this that I can read about? Or advice for how to solve it?
Also, I can’t seem to get my consistency high enough. How is that metric evaluated?

Thanks for the help


Is the performance on the training data extremely good while it’s failing on the validation data? If so, then you’re likely overfitting to the training data, and many of the obvious samples are converging to their true targets. You’ll need to do something to prevent this overfitting; the easiest fix in a neural network is probably adding dropout during training. You may also want to decrease the number of parameters in your network if it’s quite high, since a large parameter count lets the network learn more complicated functions of the data and generally leads to more overfitting.
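
To make the dropout suggestion concrete, here’s a minimal sketch in PyTorch; the feature count, layer width, and dropout rate are illustrative placeholders, not values anyone in this thread has tuned:

```python
import torch.nn as nn

n_features = 50  # placeholder for the actual feature count

model = nn.Sequential(
    nn.Linear(n_features, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes half the activations each training step
    nn.Linear(32, 1),
    nn.Sigmoid(),       # probability output, suitable for logloss / BCELoss
)

model.train()  # dropout is active while training
# ... fit with nn.BCELoss() ...
model.eval()   # dropout is disabled when predicting
```

The key point is the `train()`/`eval()` switch: dropout should only perturb activations during training, never at prediction time.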


So less is more in this case, and I’ll also add the dropout. Thanks for the advice.

What about the consistency metric? How do I improve that?


Consistency is the % of validation eras in which you score better than chance. So first off, by reducing overfitting in your model and improving your overall validation logloss, you’ll almost certainly improve your consistency. Beyond that, you may need to tweak your network’s parameters to predict better, or you could try oversampling training data from particularly hard-to-classify eras.
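
If it helps to see it concretely, here’s a rough sketch of how that metric can be computed; it assumes a pandas DataFrame named `val` with `era`, `target`, and `prediction` columns (illustrative names, not the official scoring code):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss

def consistency(val: pd.DataFrame) -> float:
    """% of validation eras whose logloss beats random guessing (-ln(0.5) ~ 0.693)."""
    chance = -np.log(0.5)
    per_era = val.groupby("era").apply(
        lambda g: log_loss(g["target"], g["prediction"], labels=[0, 1])
    )
    return 100 * (per_era < chance).mean()
```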

Some people train on the validation data, though I wouldn’t recommend that as then you no longer have an accurate metric of your performance on that data.


Question: should era be treated as a covariate (i.e. a feature) during training? Until now I have ignored it; am I wrong in doing so?


Not generally, no. You can use eras as divisions of the data over which to run cross-validation, but era shouldn’t be used directly as a feature, since we don’t have era information for test or live data points.
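
In case it’s useful, here’s a sketch of what that looks like with scikit-learn’s `GroupKFold`, using era only to define the folds; the arrays are placeholder data standing in for the actual training file:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholder data, not the real tournament file:
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))        # features (era NOT included)
y = rng.integers(0, 2, size=1000)      # binary targets
eras = rng.integers(1, 13, size=1000)  # era label per row, used only for splitting

gkf = GroupKFold(n_splits=5)
for train_idx, val_idx in gkf.split(X, y, groups=eras):
    X_tr, X_va = X[train_idx], X[val_idx]
    y_tr, y_va = y[train_idx], y[val_idx]
    # fit on (X_tr, y_tr), evaluate logloss on (X_va, y_va)
```

This way whole eras are held out together, which is closer to how the model will actually be scored.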


@daenris, thanks for answering.
I did nothing with the live data points. Do you use them for cross-validation on top of the rows designated as “validation” in the CSV file? At present my CV score is almost the same for the validation set and the leaderboard.
BTW, are you using NNs?


You need to submit predictions on the full tournament dataset, which includes validation, test, and live data. My point was that we don’t have era information for the test or live data, so using era as a feature during training isn’t going to work, because we won’t have that feature available when trying to predict.

Some of my models use neural nets, yes.


Just to add my $0.02: the best neural nets I have been able to come up with have one layer with < 50 nodes. Generally I tend to think enormous, deep neural networks perform best when you have both many variables and many observations. YMMV.
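
For concreteness, a network of that shape looks roughly like this in PyTorch; the input size and node count are illustrative placeholders, not tuned recommendations:

```python
import torch.nn as nn

n_features = 50  # placeholder for the actual feature count

small_net = nn.Sequential(
    nn.Linear(n_features, 40),  # single hidden layer, < 50 nodes
    nn.ReLU(),
    nn.Linear(40, 1),
    nn.Sigmoid(),               # probability output for logloss
)
```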