Numerai's Meta Model definition


I was wondering how Numerai selects the models used to construct its meta model.

In the beginning, they evaluated data scientists on a portion of the “prediction dataset” whose true results were known only to Numerai.

With the new tournament, the leaderboard is heavily overfitted because many people train on the validation dataset. At the same time, the outcomes of the whole live dataset are unknown to both the data scientists and Numerai.

So: how is Numerai currently selecting the models to use in its meta model?



The “test” data in the tournament file. Labels are known to Numerai but not to us.



Ouch, I forgot about test :slight_smile:

It would be nice if, at each round’s resolution, they also told us which models were selected for the meta model and the meta model’s overall logloss.



It was my understanding that they tell you who was in the meta model. Look at a resolved tournament and it says “### participants, ### controlling capital”. I figure “controlling capital” means in the meta model.



Nope. As far as I can tell, the number of people controlling capital is based on the validation dataset. In fact, if you look at a resolved round, you will see many people with logloss > -ln(0.5) (about 0.693, the logloss of always predicting 0.5) who are still counted as controlling capital.
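
For anyone wondering where the -ln(0.5) threshold comes from, here is a minimal sketch, assuming the tournament scores with standard binary logloss. The `log_loss` helper and the labels and predictions are made up for illustration; they are not Numerai's scoring code or data.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary logloss; predictions are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels, purely illustrative.
labels = [1, 0, 1, 1, 0, 0]

# Always predicting 0.5 gives exactly -ln(0.5), whatever the labels are.
baseline = log_loss(labels, [0.5] * len(labels))

# Hypothetical models: one leaning the right way, one leaning the wrong way.
slightly_right = log_loss(labels, [0.6, 0.4, 0.6, 0.6, 0.4, 0.4])
slightly_wrong = log_loss(labels, [0.4, 0.6, 0.4, 0.4, 0.6, 0.6])

print(f"baseline (all 0.5): {baseline:.4f}")
print(f"-ln(0.5):           {-math.log(0.5):.4f}")
print(f"slightly right:     {slightly_right:.4f}")
print(f"slightly wrong:     {slightly_wrong:.4f}")
```

So a logloss above -ln(0.5) means the model did worse on that data than submitting 0.5 for every row, which is why it looks odd to see such models counted as controlling capital.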