Self supervised learning on pseudo labels

nyuton · May 21, 2021, 6:53am

Hi,

I would like to share one of my new experiments. I tried pre-training a NN on pseudo labels: I took the predictions of my ensemble and trained the model on them. To my surprise, it achieves a higher validation CORR than models trained on the training data.

What I did:

get predictions on the tournament data (test set)
cut out the validation part
minmax scale the predictions
train NN on the “new” dataset
fine-tune on training set
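
A minimal sketch of the steps above, not the exact setup used here. Assumptions: "cut out the validation part" is read as dropping the validation rows from the tournament data, a plain linear model trained by gradient descent stands in for the NN, all data is random, and the names `build_pseudo_label_set` and `fit_linear` are made up for illustration.

```python
import numpy as np

def build_pseudo_label_set(features, ensemble_preds, is_validation):
    """Steps 1-3: drop validation rows, then min-max scale the predictions."""
    keep = ~np.asarray(is_validation, dtype=bool)
    preds = np.asarray(ensemble_preds, dtype=float)[keep]
    lo, hi = preds.min(), preds.max()
    if hi == lo:
        raise ValueError("predictions are constant; cannot min-max scale")
    pseudo_labels = (preds - lo) / (hi - lo)
    return np.asarray(features, dtype=float)[keep], pseudo_labels

def fit_linear(X, y, w=None, lr=0.1, epochs=200):
    """Stand-in for the NN: gradient descent on MSE, warm-started from w if given."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    if w is None:
        w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = 2.0 * Xb.T @ (Xb @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)

# Fake tournament data with fake ensemble predictions (assumption: random data).
X_tourn = rng.random((200, 5))
ens_preds = X_tourn @ np.array([0.3, -0.2, 0.1, 0.0, 0.4]) + 0.05 * rng.standard_normal(200)
is_val = np.arange(200) < 50  # pretend the first 50 rows are the validation era(s)

# Steps 1-3: build the "new" dataset from the pseudo labels.
X_pseudo, y_pseudo = build_pseudo_label_set(X_tourn, ens_preds, is_val)

# Step 4: pre-train on the pseudo labels.
w_pre = fit_linear(X_pseudo, y_pseudo)

# Step 5: fine-tune on the real training set with a smaller learning rate.
X_train = rng.random((100, 5))
y_train = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=100)
w_final = fit_linear(X_train, y_train, w=w_pre, lr=0.01, epochs=20)
```

The fine-tuning call reuses the pre-trained weights and takes only a few small steps, which is the usual way to keep what was learned in the pre-training stage.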

Validation score is great and the first live results are also promising.
I guess good quality predictions on the test set are key to this exercise.
The new dataset gives great validation corr even without fine-tuning on the training set.

Have a great day!

sunkay · October 5, 2021, 1:58am

Pseudo labels would be very different from the original labels even if you min-max scale them.

I found this in rocket chat:

I think binning the pseudo labels into [0, 0.25, 0.5, 0.75, 1.0] would be better, and I will start my own experiments too.
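
A rough sketch of what that binning could look like, assuming the pseudo labels are already min-max scaled to [0, 1] and that each one is simply snapped to the nearest of the five target values. The name `bin_pseudo_labels` is made up, and nearest-value snapping is only one option; quantile-based bins that match the real label distribution might work better.

```python
import numpy as np

# The five values used by the original labels.
BINS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

def bin_pseudo_labels(scaled_preds):
    """Snap each scaled prediction to the nearest value in BINS."""
    scaled = np.clip(np.asarray(scaled_preds, dtype=float), 0.0, 1.0)
    # Distance from every prediction to every bin value, then pick the closest.
    nearest = np.abs(scaled[:, None] - BINS[None, :]).argmin(axis=1)
    return BINS[nearest]

binned = bin_pseudo_labels([0.02, 0.3, 0.49, 0.8, 0.97])
print(binned)
```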

sirbradflies · October 5, 2021, 5:55am

Hi Nyuton, I only just saw this post, but thanks for sharing it.
Any thoughts on why that happened? Is the trend continuing, by the way?

It seems that you generated a synthetic dataset from the test features and that this helps the overall training. I still struggle to see how test predictions from a model trained on the training dataset could carry any extra information that would improve model performance.
I guess however this is more of a philosophical question about synthetic data itself…
