I would like to share one of my new experiments. I tried pre-training a NN on pseudo-labels: I took the predictions of my ensemble and trained the model on them. To my surprise, it achieves a higher validation CORR than models trained on the training data.
What I did:
- get predictions on the tournament data (test set)
- cut out the validation part
- min-max scale the predictions
- train NN on the “new” dataset
- fine-tune on training set
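The steps above can be sketched roughly as follows. This is a minimal NumPy illustration under several assumptions, not my actual setup: `TinyMLP` is a hypothetical one-hidden-layer network trained with plain gradient descent, `ensemble_preds` and `is_validation` are hypothetical inputs standing in for the ensemble output and the validation mask, and the synthetic data exists only so the snippet runs. Min-max scaling is assumed here to bring the pseudo-labels into the same [0, 1] range as the real targets.

```python
import numpy as np

def minmax_scale(x):
    """Rescale a 1-D array to [0, 1] (the min-max scaling step)."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

class TinyMLP:
    """Hypothetical one-hidden-layer regressor, full-batch gradient descent on 0.5 * MSE."""

    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, epochs=200, lr=0.05):
        # Calling fit again continues from the current weights, which is what
        # makes the later fine-tuning call start from the pre-trained model.
        n = len(y)
        for _ in range(epochs):
            h = np.tanh(X @ self.W1 + self.b1)
            err = h @ self.W2 + self.b2 - y          # prediction error per row
            dh = np.outer(err, self.W2) * (1.0 - h ** 2)  # backprop through tanh
            self.W2 -= lr * (h.T @ err / n)
            self.b2 -= lr * err.mean()
            self.W1 -= lr * (X.T @ dh / n)
            self.b1 -= lr * dh.mean(axis=0)
        return self

def pseudo_label_pretrain(X_train, y_train, X_test, ensemble_preds, is_validation):
    # Cut out the validation rows so they never leak into pre-training.
    keep = ~is_validation
    X_pseudo = X_test[keep]
    y_pseudo = minmax_scale(ensemble_preds[keep])
    model = TinyMLP(X_train.shape[1])
    # Pre-train on the "new" pseudo-labelled dataset.
    model.fit(X_pseudo, y_pseudo, epochs=300, lr=0.1)
    # Fine-tune on the real training targets with a smaller learning rate.
    model.fit(X_train, y_train, epochs=100, lr=0.02)
    return model

# Synthetic stand-in data, for illustration only.
rng = np.random.default_rng(0)
n_feat = 5
true_w = rng.normal(size=n_feat)
X_train = rng.random((400, n_feat))
y_train = minmax_scale(X_train @ true_w)
X_test = rng.random((600, n_feat))
ensemble_preds = X_test @ true_w + rng.normal(0.0, 0.1, 600)  # fake ensemble output
is_validation = np.zeros(600, dtype=bool)
is_validation[:200] = True

model = pseudo_label_pretrain(X_train, y_train, X_test, ensemble_preds, is_validation)
val_preds = model.predict(X_test[is_validation])
```

The choice to fine-tune with fewer epochs and a lower learning rate is also an assumption; the idea is to keep most of what was learned from the pseudo-labels while nudging the model toward the real targets.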
The validation score is great, and the first live results are also promising.
I guess good-quality predictions on the test set are key to this exercise.
The new dataset gives great validation CORR even without fine-tuning on the training set.
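For comparing models, validation CORR is assumed here to be the per-era score Numerai has used: within each era, rank the predictions (ties broken by position), take the Pearson correlation of those ranks with the target, then average over eras. The function names below are hypothetical and this is only a sketch of that reading, not a reproduction of the official scorer.

```python
import numpy as np

def era_corr(preds, target):
    # Rank predictions with ties broken by position (like pandas method="first"),
    # scaled to (0, 1], then Pearson-correlate the ranks with the target.
    ranks = (np.argsort(np.argsort(preds, kind="stable"), kind="stable") + 1) / len(preds)
    return np.corrcoef(ranks, target)[0, 1]

def mean_validation_corr(preds, target, eras):
    # Score each era separately, then average the per-era correlations.
    scores = [era_corr(preds[eras == e], target[eras == e]) for e in np.unique(eras)]
    return float(np.mean(scores))
```

Scoring per era rather than over the whole validation set keeps a few large or unusual eras from dominating the number.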
Have a great day!