Self-supervised learning on pseudo labels


I would like to share one of my new experiments. I tried pre-training a NN on pseudo labels: I took the predictions of my ensemble and trained the model on them. To my surprise, it achieves a higher validation CORR than models trained on the training data.

What I did:

  • get predictions on the tournament data (test set)
  • cut out the validation part, so the validation rows stay unseen
  • minmax scale the predictions
  • train NN on the “new” dataset
  • fine-tune on training set
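
A minimal sketch of the steps above, not the exact setup used here. Everything in it is an assumption made for illustration: the data is synthetic, `ensemble_pred` stands in for real ensemble predictions, `is_validation` is a hypothetical flag for the validation rows, and `TinyMLP` is a small NumPy network rather than the real model. The training targets are scaled to [0, 1] on the assumption that the real targets already live in that range, which would be the reason for min-max scaling the pseudo labels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
true_w = rng.normal(size=n_features)  # hidden signal, synthetic data only

def make_split(n):
    """Synthetic features in [0, 1] with a noisy linear target (assumption)."""
    X = rng.random((n, n_features))
    y = X @ true_w + rng.normal(scale=0.5, size=n)
    return X, y

def minmax_scale(v):
    """Rescale a vector to the range [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

X_train, y_train = make_split(500)
y_train = minmax_scale(y_train)           # mimic targets that live in [0, 1]
X_tourn, y_tourn_hidden = make_split(2000)

# Step 1: predictions on the tournament data (stand-in for a real ensemble).
ensemble_pred = X_tourn @ true_w + rng.normal(scale=0.2, size=len(X_tourn))

# Step 2: cut out the validation rows (hypothetical flag: first 500 rows).
is_validation = np.arange(len(X_tourn)) < 500
X_val, y_val = X_tourn[is_validation], y_tourn_hidden[is_validation]
X_pseudo = X_tourn[~is_validation]

# Step 3: min-max scale the remaining predictions to get pseudo labels.
y_pseudo = minmax_scale(ensemble_pred[~is_validation])

class TinyMLP:
    """One-hidden-layer tanh network trained with full-batch gradient descent."""

    def __init__(self, n_in, n_hidden=16, seed=0):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = r.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, epochs, lr):
        n = len(y)
        for _ in range(epochs):
            h = np.tanh(X @ self.W1 + self.b1)
            err = h @ self.W2 + self.b2 - y            # d(0.5 * MSE) / d(output), up to 1/n
            g_h = np.outer(err, self.W2) * (1 - h**2)  # backprop through tanh
            self.W2 -= lr * (h.T @ err) / n
            self.b2 -= lr * err.mean()
            self.W1 -= lr * (X.T @ g_h) / n
            self.b1 -= lr * g_h.mean(axis=0)
        return self

# Step 4: pre-train on the pseudo-labelled dataset.
model = TinyMLP(n_features).fit(X_pseudo, y_pseudo, epochs=2000, lr=0.1)
val_corr_pretrain = np.corrcoef(model.predict(X_val), y_val)[0, 1]

# Step 5: fine-tune on the training set with a smaller learning rate (assumption).
model.fit(X_train, y_train, epochs=500, lr=0.02)
val_corr = np.corrcoef(model.predict(X_val), y_val)[0, 1]

print("validation corr after pre-training:", val_corr_pretrain)
print("validation corr after fine-tuning: ", val_corr)
```

Keeping the validation rows out of the pseudo-labelled set is what keeps the validation score meaningful: the network never sees those rows, not even through the ensemble's predictions on them.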

Validation score is great and the first live results are also promising.
I guess good-quality predictions on the test set are key to this exercise.
The new dataset gives great validation CORR even without fine-tuning on the training set.
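
For anyone who wants to run the same check, here is a rough sketch of a validation CORR computation. It assumes CORR means the per-era correlation between percentile-ranked predictions and the target, averaged over eras; the function names, the `eras` array and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np

def rank_pct_first(values):
    """Percentile ranks in (0, 1]; ties are broken by position in the array."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks / len(values)

def per_era_corr(preds, target, eras):
    """Mean over eras of the Pearson correlation between ranked predictions and the target."""
    scores = []
    for era in np.unique(eras):
        mask = eras == era
        scores.append(np.corrcoef(target[mask], rank_pct_first(preds[mask]))[0, 1])
    return float(np.mean(scores))

# Toy data: 4 eras of 50 rows each, targets drawn from five buckets (assumption).
rng = np.random.default_rng(0)
eras = np.repeat(np.arange(4), 50)
target = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=eras.size)
preds = target + rng.normal(scale=0.5, size=eras.size)

print(per_era_corr(preds, target, eras))
```

Because only the ranks of the predictions enter the score, the min-max scaling of the pseudo labels changes what the network is trained on, not how its predictions are scored.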

Have a great day!