Better LGBM Params, Signals V2 Data, and Reducing Signals Churn

Another point to consider: churn is not only user-induced; there is also Numerai-induced churn.

This is a plot of the week-to-week symmetric difference in tickers (len(set(era_tickers) ^ set(last_era_tickers))), v1 data:

As you can see, in some weeks the churn exceeds 1000 tickers (over ~20%) on the Numerai side alone. This means that if the 10% churn limit is introduced, it doesn't matter how low your own churn is in those weeks: Numerai will basically say “WE decided to pivot, but YOU are the one who will get rejected for the whole week”.
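The per-week counts can be computed over a whole dataset with the same expression, len(set(era_tickers) ^ set(last_era_tickers)). A minimal sketch, assuming the data is available as (era, ticker) pairs whose era labels sort chronologically (e.g. ISO dates); the function name weekly_ticker_churn is hypothetical and not part of any Numerai tooling:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def weekly_ticker_churn(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """For each era after the first, return the size of the symmetric
    difference between its ticker set and the previous era's ticker set.

    ASSUMPTION: rows are (era, ticker) pairs and era labels sort
    chronologically as strings, e.g. "2021-01-08".
    """
    # Group tickers by era; duplicates within an era collapse into the set.
    tickers_by_era: Dict[str, Set[str]] = defaultdict(set)
    for era, ticker in rows:
        tickers_by_era[era].add(ticker)

    eras: List[str] = sorted(tickers_by_era)
    churn: Dict[str, int] = {}
    # Compare each era with the one immediately before it.
    for last_era, era in zip(eras, eras[1:]):
        # Same expression as in the post: len(set(era_tickers) ^ set(last_era_tickers))
        churn[era] = len(tickers_by_era[era] ^ tickers_by_era[last_era])
    return churn
```

Note that with this metric a ticker replaced by another counts twice: once for the removal and once for the addition.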

Numerai-side churn also penalizes narrower submissions. If you target a specific market (e.g. US tech stocks) and submit, say, fewer than 500 tickers, you might exceed the limit simply because Numerai adds or removes 74 tickers in an average week: with 500 tickers, 10% is only 50 tickers, so you are practically guaranteed to hit the 10% churn mark.

As a suggestion, Numerai's own churn could be incorporated into the churn threshold. For example, if Numerai's side introduces 4% churn in a given week, the end-user churn limit for that week could be 14% (10% average churn limit + 4% Numerai churn).
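The suggested adjustment can be sketched as below. Everything here is an illustrative assumption: the function names are hypothetical, churn is measured on ticker sets (as in the symmetric-difference expression above), which may not match how Numerai actually defines churn, and percentages are normalised by the previous week's set size.

```python
from typing import Set


def churn_fraction(prev: Set[str], curr: Set[str]) -> float:
    """Week-to-week symmetric difference as a fraction of last week's set.

    ASSUMPTION: normalising by the previous week's size is one possible
    choice; the post does not say how a churn percentage is computed.
    """
    if not prev:
        raise ValueError("previous week's ticker set is empty")
    return len(prev ^ curr) / len(prev)


def adjusted_churn_limit(prev_universe: Set[str], curr_universe: Set[str],
                         base_limit: float = 0.10) -> float:
    """Hypothetical rule: base limit plus the Numerai-side churn that week."""
    return base_limit + churn_fraction(prev_universe, curr_universe)


def passes_churn_check(prev_submission: Set[str], curr_submission: Set[str],
                       prev_universe: Set[str], curr_universe: Set[str],
                       base_limit: float = 0.10) -> bool:
    """True if the user's churn stays within the Numerai-adjusted limit."""
    user_churn = churn_fraction(prev_submission, curr_submission)
    limit = adjusted_churn_limit(prev_universe, curr_universe, base_limit)
    return user_churn <= limit
```

For instance, under these assumptions a universe that changes by 4% raises the limit to 14%, so a submission whose own churn is 12% fails a flat 10% limit but passes the adjusted one.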

Full stats (week-to-week symmetric difference in tickers, v1 data):

mean       73.964444
std        49.387679
min        16.000000
25%        56.000000
50%        65.000000
75%        79.000000
max      1111.000000
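The stats above look like the output of pandas' Series.describe() without its count row. A stdlib-only sketch that produces the same rows from a list of weekly churn values, assuming the sample standard deviation and linearly interpolated quartiles that describe() uses by default; churn_summary is a hypothetical name:

```python
import statistics
from typing import Dict, Sequence


def churn_summary(weekly_churn: Sequence[int]) -> Dict[str, float]:
    """Summary statistics in the same row order as the stats above.

    statistics.stdev is the sample standard deviation (ddof=1), and
    quantiles(method="inclusive") interpolates linearly between data
    points, matching pandas' Series.describe() defaults.
    """
    if len(weekly_churn) < 2:
        raise ValueError("need at least two weekly churn values")
    q1, q2, q3 = statistics.quantiles(weekly_churn, n=4, method="inclusive")
    return {
        "mean": statistics.mean(weekly_churn),
        "std": statistics.stdev(weekly_churn),
        "min": min(weekly_churn),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(weekly_churn),
    }
```

A single week like the 1111-ticker maximum pulls the mean well above the median, which is why the median (65) is a better picture of a typical week than the mean (~74).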