Are deep learning methods effective here?


I have been working in machine learning for several years, but I'm new here. At first I tried some models:
Logistic Regression: just as in the example
Linear SVC: seems no better than LR
Kernel SVC (poly/rbf): very slow to train, and I didn't wait for it to finish
Random Forest: not effective and easy to overfit

Then I tried XGBoost and got a better result than before (slight but noticeable). I believe it can be improved by hyperparameter tuning.

I have not tried ensembling yet.

Then I took a break and switched to deep learning methods: a feedforward neural network with 2 hidden layers. But it seems hard to train, and I didn't get a model that is clearly better than a random classifier. I wanted to make the model overfit first and then use dropout to reduce the overfitting, but I can't even get an overfitted model! The model doesn't seem to converge if I increase the number of hidden layers to 3. I just wonder: is it that I didn't tune my model well, or is a deep model not suitable here? Or is XGBoost simply better?
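To show what I mean by "overfit first": the usual sanity check is to see whether the network can memorize a tiny batch with no regularization at all. Here is a minimal numpy sketch of that check (the layer sizes, learning rate, and random data below are toy placeholders, not my actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))                 # tiny batch: 32 rows, 10 features
y = (rng.random(32) > 0.5).astype(float)      # random binary labels

# 2 hidden layers of 16 ReLU units, sigmoid output, no regularization.
W1 = rng.normal(0, 0.5, (10, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 16)); b2 = np.zeros(16)
W3 = rng.normal(0, 0.5, (16, 1));  b3 = np.zeros(1)

def forward(X):
    h1 = np.maximum(X @ W1 + b1, 0)
    h2 = np.maximum(h1 @ W2 + b2, 0)
    p = 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3).ravel()))
    return h1, h2, p

def bce(p, y):
    p = np.clip(p, 1e-7, 1 - 1e-7)            # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

start = bce(forward(X)[2], y)
lr = 0.1
for _ in range(1000):                         # full-batch gradient descent
    h1, h2, p = forward(X)
    d3 = (p - y)[:, None] / len(y)            # grad of mean BCE w.r.t. logits
    d2 = (d3 @ W3.T) * (h2 > 0)               # backprop through ReLU
    d1 = (d2 @ W2.T) * (h1 > 0)
    W3 -= lr * h2.T @ d3; b3 -= lr * d3.sum(0)
    W2 -= lr * h1.T @ d2; b2 -= lr * d2.sum(0)
    W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(0)

print("train log loss:", start, "->", bce(forward(X)[2], y))
```

If the training loss on this tiny batch refuses to drop well below ln(2) ≈ 0.693, the problem is in the training setup (learning rate, initialization, optimizer) rather than in regularization.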

I would really appreciate it if anyone shared their experience with such problems.


Do you mind sharing your deep learning code here so people can troubleshoot? My guess is you just need some more tuning.


I’m having success with deep learning methods, but you have to be careful with the parameter choices. It does seem odd that you aren’t able to overfit, as I have the opposite problem: I have to be very careful to prevent overfitting. I do have to use relatively shallow networks, though. Like ted said, if you share your code we might be able to help.


@ted @volker48 Thanks for your replies. It seems the code cannot be pasted here (maybe it’s too long), so I’ve added it to GitHub here.

And @volker48, would you please share some information about your model, such as the network structure and the number and size of the hidden layers? Thanks in advance.


A few things for you to try out:
First, switch the optimizer to SGD with a smaller learning rate, say 0.0001.
Add dropout layers.
Reduce the batch size.

Personally I like to work “one layer up” with Keras. Here is some “starter code” with the above suggestions; the model converges in roughly 10 epochs, but the resulting log loss probably won’t win you any NMRs :wink:
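For anyone who can’t open the starter, an illustrative sketch of a model along those lines (the feature count, layer sizes, and random data below are made up for the example, not from the actual starter):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.optimizers import SGD

# Hypothetical Numerai-style data: 50 features, binary target.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(1024, 50)).astype("float32")
y_train = (rng.random(1024) > 0.5).astype("float32")

model = Sequential([
    Input(shape=(50,)),
    Dense(128, activation="relu"),
    Dropout(0.5),                     # suggestion 2: dropout layers
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])

# Suggestion 1: plain SGD with a small learning rate.
model.compile(optimizer=SGD(learning_rate=1e-4), loss="binary_crossentropy")

# Suggestion 3: a small batch size (epochs kept short for the sketch).
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
```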


@nanoix9, in your original post you mentioned overfitting the model first and then using dropout to reduce that overfitting. Do you by chance have any papers demonstrating how well this methodology works on other datasets? I would think it is similar to adding noise between layers as you train, or after your model converges: you are perturbing the model, so it can ‘wiggle’ to a different place in the search space and hopefully jump out of the “overfitting zone”. So potentially this should work…?
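For reference, noise between layers is a one-liner in Keras via `GaussianNoise`, which perturbs activations during training only and is the identity at inference. A minimal sketch (the sizes are arbitrary):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, GaussianNoise

model = Sequential([
    Input(shape=(50,)),
    Dense(64, activation="relu"),
    GaussianNoise(0.1),               # adds N(0, 0.1) noise at training time only
    Dense(64, activation="relu"),
    GaussianNoise(0.1),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
```

Because the noise layers are inactive at inference, two `predict` calls on the same inputs give identical outputs.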


Ted hit most of the suggestions I was going to provide. For the most part I use shallow networks of 2-3 layers with fewer than 200 units each. As Ted said, sticking to small batch sizes and small learning rates helps.


When I’m focusing on consistency I also use batchnorm and dropout. I monitor consistency during training and checkpoint my models at each epoch. I wrote this Keras callback to track consistency during training.
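The linked callback is the real thing; as a rough illustration only, a simplified version might look like this (assuming “consistency” means the fraction of validation eras whose log loss beats the random-guess baseline ln(2), as on Numerai; the era handling here is a hypothetical sketch, not my exact code):

```python
import numpy as np
from sklearn.metrics import log_loss
from tensorflow.keras.callbacks import Callback

class ConsistencyMonitor(Callback):
    """After each epoch, compute log loss per validation era and record
    the fraction of eras that beat the ln(2) random baseline."""

    def __init__(self, x_val, y_val, eras):
        super().__init__()
        self.x_val = x_val
        self.y_val = np.asarray(y_val)
        self.eras = np.asarray(eras)      # one era label per validation row
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        preds = self.model.predict(self.x_val, verbose=0).ravel()
        losses = [
            log_loss(self.y_val[self.eras == e], preds[self.eras == e],
                     labels=[0, 1])       # labels= handles single-class eras
            for e in np.unique(self.eras)
        ]
        consistency = float(np.mean(np.array(losses) < np.log(2)))
        self.history.append(consistency)
        print(f"epoch {epoch}: consistency {consistency:.2%}")
```

Pass an instance in `model.fit(..., callbacks=[ConsistencyMonitor(x_val, y_val, eras)])` and combine it with a checkpoint callback to keep the epoch with the best consistency.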


Thanks for taking the time to give this advice. I don’t have a paper for that idea; I just think it might work, but I haven’t tested it yet. It’s just a rough thought and might not pan out. I’ll give you more feedback once I’ve tried it in an experiment.


Great, thanks for sharing your experience and code!


Agree about noise between layers. It helped me in the same situation.