How to solve the Numerai Tournament / lecture by Marcos Lopez de Prado

Hi,

Since I started selling my predictions, people have been asking what I do. @ageonsen posted a great lecture by Marcos Lopez de Prado a few months ago on how to solve the Numerai tournament, with fairly detailed steps. I follow those steps!
Seeing the current burn rate, I guess it didn’t get the attention it deserved.

Ageonsen is now #1, and I have been winning medals every week since I started following those instructions.
I might simply have been lucky in the recent eras, but this lecture is certainly valuable, for beginners and experts alike.

You can read it here:

Have fun!


Hi @nyuton - thanks for sharing.

The slides are a great starting point and outline a very sensible approach. But of course they leave implementation completely up to the reader. I’m curious what other references you found useful for getting into the specifics of topics like, say, feature engineering. Is the textbook one of them? Are there others of note?

Thanks again!

Sure, the textbook is great!
It was eye-opening about the value of random forests. My best-performing models are random forests; I stopped experimenting with NNs afterwards.
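
For anyone curious, a minimal sketch of what such a random forest setup might look like on the Numerai data is below; the file names, column names and hyperparameters are my own assumptions, not the exact configuration behind the models mentioned above:

```python
# Minimal sketch: fit a random forest on the Numerai training data.
# File names, column names and hyperparameters are assumptions, not the exact
# setup used in the post above.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

train = pd.read_csv("numerai_training_data.csv")
feature_cols = [c for c in train.columns if c.startswith("feature")]

model = RandomForestRegressor(
    n_estimators=200,     # number of trees
    max_depth=5,          # shallow trees to limit overfitting on noisy targets
    max_features="sqrt",  # per-split feature subsampling, as in a classic random forest
    n_jobs=-1,
    random_state=0,
)
model.fit(train[feature_cols], train["target"])

# Predict on the tournament data and write a submission-style file
tournament = pd.read_csv("numerai_tournament_data.csv")
tournament["prediction"] = model.predict(tournament[feature_cols])
tournament[["id", "prediction"]].to_csv("predictions.csv", index=False)
```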


Each week the era in the “test” rows of numerai_tournament_data.csv increases by one. How, then, can an era be a one-month period?

Great point, thanks!

We start one era every week. There are 4 overlapping open eras.


I agree with that, but his paper repeatedly refers to the eras as monthly, like here “Train set: 120 months (eras)” on page 6. Shouldn’t that be 120 weeks?

Oh, true. In the training set the eras are not overlapping. Yet…
From September 8 we get the full dataset with overlapping eras.

Hi nyuton, thanks for sharing. I am wondering if and how you apply stationarity tests. It makes no sense to apply the tests directly to the variables, since each era reflects the same time period for all assets. Aggregating by eras, calculating the mean, and then applying a test (e.g. Augmented Dickey-Fuller) seems too simple.

I think that’s actually exactly what he means, except applied not to the variables themselves but to their correlation with the target.


I’ve aggregated by eras, computed the Spearman correlation of each feature (v4.1) with the target in each era, built a series of those correlations per feature, and then applied the Augmented Dickey-Fuller test to each series. Result: not a single null hypothesis rejected. All features are stationary following this approach.
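
For reference, a minimal sketch of that procedure (per-era Spearman correlation of each feature with the target, then an Augmented Dickey-Fuller test on each correlation series) could look like the following; the file name, column names and the 5% threshold are assumptions:

```python
# Sketch of the procedure described above: for each feature, build the series of
# per-era Spearman correlations with the target, then run an ADF test on it.
# File name, column names ("era", "target", "feature_*") and the 5% threshold
# are assumptions.
import pandas as pd
from scipy.stats import spearmanr
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("numerai_training_data.csv")
feature_cols = [c for c in df.columns if c.startswith("feature")]

pvalues = {}
for col in feature_cols:
    # One Spearman correlation per era between this feature and the target
    corr_by_era = df.groupby("era").apply(
        lambda g: spearmanr(g[col], g["target"])[0]
    )
    # ADF null hypothesis: the series has a unit root (i.e. is non-stationary)
    pvalues[col] = adfuller(corr_by_era.dropna())[1]

pvalues = pd.Series(pvalues)
print("features with ADF p-value < 0.05:", (pvalues < 0.05).sum(), "of", len(pvalues))
```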


It may also be more important if you are competing in the signals competition.

thanks for the paper!