How to solve the Numerai Tournament / lecture by Marcos Lopez de Prado

nyuton · August 23, 2021, 9:26am

Hi,

Since I started selling my predictions, people have been asking what I do. @ageonsen posted a great lecture by Marcos Lopez de Prado a few months ago on how to solve the Numerai tournament, with fairly detailed steps. I follow those steps!
Seeing the current burn rate, I guess it didn’t get the attention it needed.

Ageonsen is now #1 and I have been winning medals every week since I started following those instructions.
I might simply have been lucky in the recent eras, but this lecture is certainly valuable, for beginners and experts alike.

You can read it here:

Have fun!

profricecake · August 27, 2021, 3:31pm

Hi @nyuton - thanks for sharing.

The slides are a great starting point and outline a very sensible approach. But of course they leave implementation completely up to the reader. I’m curious what other references you found useful for getting into the specifics of topics like, say, feature engineering. Is the textbook one of them? Are there others of note?

Thanks again!

nyuton · August 27, 2021, 5:15pm

Sure, the textbook is great!
It was eye-opening about the value of random forests. My best performing models are random forests. I stopped experimenting with NNs afterwards.

platemort · August 31, 2021, 12:59pm

Each week the era of the numerai_tournament_data.csv “test” column increases by one. How then can an era be a one month period?

liz · August 31, 2021, 9:03pm

great point, thanks!

nyuton · September 1, 2021, 6:37am

We start one era every week. There are 4 overlapping open eras.
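A tiny sketch of what "4 overlapping open eras" implies, in case it helps. It assumes (my simplification, not an official definition) that era k opens in week k and stays open for 4 weeks; the names `ERA_SPAN_WEEKS`, `open_eras`, `eras_overlap` and `non_overlapping` are made up for illustration.

```python
from typing import List

# Assumption for illustration: era k opens in week k and stays open 4 weeks.
ERA_SPAN_WEEKS = 4


def open_eras(week: int, span: int = ERA_SPAN_WEEKS) -> List[int]:
    """Eras that are still open during `week` (weeks and eras count from 1)."""
    first = max(1, week - span + 1)
    return list(range(first, week + 1))


def eras_overlap(a: int, b: int, span: int = ERA_SPAN_WEEKS) -> bool:
    """Two weekly eras share part of their window if they start < span weeks apart."""
    return abs(a - b) < span


def non_overlapping(eras: List[int], span: int = ERA_SPAN_WEEKS) -> List[int]:
    """Keep every `span`-th era from a sorted run of consecutive eras."""
    return eras[::span]


print(open_eras(10))                       # [7, 8, 9, 10]
print(non_overlapping(list(range(1, 13))))  # [1, 5, 9]
```

Under that assumption, taking every 4th era gives a subsample whose windows do not overlap.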

platemort · September 1, 2021, 3:02pm

I agree with that, but his paper repeatedly refers to the eras as monthly, like here “Train set: 120 months (eras)” on page 6. Shouldn’t that be 120 weeks?

nyuton · September 1, 2021, 3:59pm

Oh, true. In the training set the eras are not overlapping. Yet…
From September 8 we get the full dataset with overlapping eras.

javiermoral · January 25, 2023, 6:41pm

Hi nyuton, thanks for sharing. I am wondering if and how you apply stationarity tests. It makes no sense to apply a test directly to the variables, since each era reflects the same time period for all assets. Aggregating by era, calculating the mean, and then applying tests (e.g. Augmented Dickey-Fuller) seems too simple.

andralienware · January 26, 2023, 3:34am

I think that’s exactly what he means, though applied not necessarily to the variables themselves but to each variable’s correlation with the target.
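A minimal sketch of that idea: one correlation per feature per era, which gives a time series per feature. The column names (`era`, `target`, `feature_a`, `feature_b`) and the helper `per_era_feature_corr` are placeholders on toy data, not the real Numerai schema.

```python
import numpy as np
import pandas as pd


def per_era_feature_corr(df, feature_cols, target_col="target", era_col="era"):
    """Return a DataFrame indexed by era with one column per feature, holding
    the Spearman correlation of that feature with the target inside the era."""
    cols = list(feature_cols) + [target_col]

    def _one_era(group):
        # Spearman correlation equals the Pearson correlation of the ranks.
        ranked = group[cols].rank()
        return ranked[list(feature_cols)].corrwith(ranked[target_col])

    return df.groupby(era_col)[cols].apply(_one_era)


# Toy data: feature_a tracks the target, feature_b is pure noise.
rng = np.random.default_rng(0)
frames = []
for era in range(1, 6):
    target = rng.random(200)
    frames.append(pd.DataFrame({
        "era": era,
        "feature_a": target + 0.1 * rng.standard_normal(200),
        "feature_b": rng.random(200),
        "target": target,
    }))
toy = pd.concat(frames, ignore_index=True)

corr_by_era = per_era_feature_corr(toy, ["feature_a", "feature_b"])
print(corr_by_era.round(3))
```

Each column of `corr_by_era` is then a series over eras that a stationarity test could be run on.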

javiermoral · January 26, 2023, 12:32pm

I’ve aggregated by era, computed the Spearman correlation within each era, built a per-era correlation series for each feature (v4.1), and then applied the Augmented Dickey-Fuller test to each series. Result: not a single null hypothesis rejected. All features are stationary following this approach.

andralienware · January 26, 2023, 2:58pm

It may also be more important if you are competing in the signals competition.

f58c · November 7, 2023, 10:35pm

thanks for the paper!
