Better LGBM Params, Signals V2 Data, and Reducing Signals Churn

Isn’t it weekly turnover, not daily? In any case, that’s all fair. My main point is simply that it may make sense to bring the threshold down at a slightly slower pace and watch what happens to the metamodel’s churn: because the metamodel blends models with widely varying churn, you may reach your target turnover (even 5%) with a better metamodel at a higher constituent threshold than you expect. The other point is that it likely makes more sense to penalize churn in payouts directly, though I understand the tradeoff in ease of implementation.
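
To make that concrete, here is a minimal sketch of what a direct churn penalty in payouts could look like. The churn measure (1 minus the rank correlation between consecutive submissions) and the `churn_penalty` weight are illustrative assumptions, not an actual payout rule:

```python
import numpy as np
from scipy.stats import spearmanr

def churn(prev_preds: np.ndarray, curr_preds: np.ndarray) -> float:
    """Week-over-week churn: 1 minus the rank correlation between
    consecutive submissions, aligned on the same tickers."""
    rho, _ = spearmanr(prev_preds, curr_preds)
    return 1.0 - rho

def penalized_payout(stake: float, corr: float,
                     prev_preds: np.ndarray, curr_preds: np.ndarray,
                     churn_penalty: float = 0.5) -> float:
    """Hypothetical payout that scales down smoothly with churn
    instead of hard-thresholding submissions."""
    c = churn(prev_preds, curr_preds)
    return stake * corr * max(0.0, 1.0 - churn_penalty * c)
```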

1 Like

I would think it’s daily turnover, since they rebalance and trade daily now. We had the same issue on Quantopian back in the day. Some people had amazing-looking backtests but with 50% or higher turnover (usually some type of mean reversion on a small universe), which isn’t tradable even in a small portfolio (just because someone got filled at a certain price for a certain size doesn’t mean you would have traded at that price). Trading is expensive, and not usually because of commission, clearing fees, borrowing costs, etc. It’s because of slippage, and if you trade in size you can easily move the market, so the alpha you thought you had wasn’t actually real. On Quantopian I always aimed for less than 5% turnover and as large a universe as possible, and they licensed 8 of my strategies; I doubt they licensed any of the high-turnover ones. Same thing here, I’d say: high Corr is fairly useless if turnover is also high.
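
For anyone unfamiliar with the number, turnover here is the usual portfolio measure: one common definition is half the sum of absolute weight changes between consecutive rebalances, which is what a figure like “less than 5%” refers to. A minimal sketch with made-up weights:

```python
import pandas as pd

def daily_turnover(weights: pd.DataFrame) -> pd.Series:
    """One-way daily turnover: half the L1 distance between consecutive days' weights.

    `weights` is indexed by date, one column per asset, each row summing to 1.
    """
    turnover = weights.diff().abs().sum(axis=1).div(2)
    return turnover.iloc[1:]  # first row has no previous day to compare against

# toy example with two assets over three days
w = pd.DataFrame(
    {"AAA": [0.60, 0.50, 0.55], "BBB": [0.40, 0.50, 0.45]},
    index=pd.date_range("2024-01-01", periods=3),
)
print(daily_turnover(w))  # 0.10, then 0.05
```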

Can’t wait to get back into Signals again! If my v5.0 Tournament models ever finish training, that is… :slight_smile:

2 Likes

Hey all, thank you so much for the comments. I considered all of your arguments, took a second look at the data, and decided the following:

  1. Set the threshold to 15% and remove the gradual reduction of the threshold. Our goal is to drastically reduce churn of the Signals MM, and a low threshold is the only way to ensure this. Removing the gradual reduction of the threshold just simplifies and clarifies the incentive mechanism.
  2. Avoid errors in the API; instead, set the stake to 0 for the round and add website features to explain why a user’s stake was set to 0 for that round (see the sketch after this list). This avoids breaking pipelines and lets high-churn users keep submitting instead of getting errors. It also allows the hedge fund to make intelligent decisions about churn and user filtering.
  3. Move the release date to September 20 to give everyone a bit more time. I know one month is still short notice, but this is a non-negotiable mandate: we must crush Signals churn by the end of September.
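
For illustration only, a rough sketch of what that check could look like on the scoring side; the churn definition (1 minus the rank correlation between consecutive submissions) and all names here are hypothetical, not the actual pipeline:

```python
from scipy.stats import spearmanr

CHURN_THRESHOLD = 0.15  # 15%, per point 1 above

def weekly_churn(prev_preds, curr_preds) -> float:
    """Churn between consecutive submissions aligned on the same tickers."""
    rho, _ = spearmanr(prev_preds, curr_preds)
    return 1.0 - rho

def effective_stake(stake: float, prev_preds, curr_preds) -> float:
    """If churn exceeds the threshold, zero the stake for the round
    instead of rejecting the submission with an API error."""
    if weekly_churn(prev_preds, curr_preds) > CHURN_THRESHOLD:
        return 0.0
    return stake
```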

I think these changes to the threshold and the release plan will simplify the feature for you all and solve a few of the problems raised in these comments. I’ll formally post a new forum post with the churn release plan and send an email to notify all Signals users.

Thanks for the input,
ark

5 Likes

I just started looking at the Signals 5 validation file data; I thought I could get something from it as I shift to deep learning procedures for Signals, away from the haphazard mess I had before…
So I took a look at the correlations in the “target” variable. The full-week delay between eras kills that as well; here’s a plot of week-to-week correlations over roughly the same period you used in the plots above:
[Plot: week-to-week correlations of the Signals target (Signals050Churn)]

So that’s going to cause problems. I haven’t looked at the other targets yet.
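
For anyone who wants to reproduce that kind of plot, a minimal sketch is below; the column names (`era`, `ticker`, `target`) follow the Signals validation file layout, but the read path and the use of Pearson correlation are assumptions:

```python
import pandas as pd

# validation = pd.read_parquet("signals/validation.parquet")  # illustrative path

def weekly_target_correlations(validation: pd.DataFrame) -> pd.Series:
    """Correlation of the target between each era and the next,
    computed over the tickers present in both eras."""
    wide = validation.pivot_table(index="ticker", columns="era", values="target")
    eras = sorted(wide.columns)
    corrs = {
        curr: wide[prev].corr(wide[curr])  # pandas drops tickers missing in either era
        for prev, curr in zip(eras, eras[1:])
    }
    return pd.Series(corrs).sort_index()
```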

Fwiw, I did run correlations of 20-day returns with delays of 1 to 5 days on just the American stocks used in the validation file. That looks like this:
[Plot: churn of 20-day returns at 1- to 5-day delays (SignalsChurn1Small)]
with the one-day delay in blue at the bottom, moving up to the five-day delay in green at the top.

The mean and std of the churn (over 2020 to now) with delays of 1 to 5 days are:

| delay (days) | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| mean | 0.068265 | 0.13125 | 0.1908 | 0.24776 | 0.30186 |
| std | 0.027267 | 0.049722 | 0.067465 | 0.082164 | 0.094161 |

The kernel-smoothed density distributions of the churn look like this:
[Plot: kernel-smoothed density of churn at 1- to 5-day delays (SignalsChurn1Ksd)]
with the same color scheme.
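
In case anyone wants to reproduce the delay experiment, this is roughly how I would set it up; 20-day forward returns from daily prices and a rank-based churn are assumptions about the method, not necessarily exactly what produced the plots above:

```python
import pandas as pd

def churn_by_delay(prices: pd.DataFrame, horizon: int = 20, max_delay: int = 5) -> pd.DataFrame:
    """Churn of ranked `horizon`-day forward returns at delays of 1..max_delay days.

    `prices` is a date-indexed DataFrame with one column per ticker.
    """
    fwd = prices.shift(-horizon) / prices - 1.0   # forward horizon-day returns
    ranks = fwd.rank(axis=1, pct=True)            # cross-sectional ranks (Spearman-style)
    churn = pd.DataFrame({
        d: 1.0 - ranks.corrwith(ranks.shift(d), axis=1)  # 1 - corr with ranks d days earlier
        for d in range(1, max_delay + 1)
    })
    return churn.dropna(how="all")

# usage: c = churn_by_delay(prices); c.mean() and c.std() give a table like the one above,
# and c.plot(kind="kde") the smoothed densities
```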

Now TBH, I think the trick here in training might be to adapt the target response by looking at the variation in a given ticker’s return over some number of periods? Maybe that’s in the other validation targets, IDK. But it’s interesting to think about…
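
As a toy version of that idea, here is one way to smooth the target per ticker over a few eras; this is purely an illustration of the “variation over some number of periods” thought, and the column names are assumptions:

```python
import pandas as pd

def smoothed_target(validation: pd.DataFrame, window: int = 4) -> pd.Series:
    """Per-ticker rolling mean of the target over the last `window` eras.

    Assumes era labels sort chronologically.
    """
    validation = validation.sort_values(["ticker", "era"])
    return validation.groupby("ticker")["target"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
```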

2 Likes

Just a wee thought here - several years ago Numerai expanded the Val, Train, and Live files for the Tournament from a few hundred features to over a thousand, I suspect by tacking on feature data from 4 or so previous days (or weeks?). I suspect that because, when I looked at the correlations of the target with the features, similar patterns repeated every 210 bins or so. I wrote about that back in 2021:

Anyway, I think I might try something like that with the Signals data; it might help stabilize the predictions. That should keep me out of trouble for a while :laughing:
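
If it helps anyone else, this is the kind of transformation I have in mind: tacking lagged copies of each feature onto the current row, per ticker. Column names and the number of lags are assumptions:

```python
import pandas as pd

def add_lagged_features(df: pd.DataFrame, feature_cols: list, n_lags: int = 4) -> pd.DataFrame:
    """Append each feature shifted back 1..n_lags eras within each ticker."""
    df = df.sort_values(["ticker", "era"]).copy()
    grouped = df.groupby("ticker")[feature_cols]
    for lag in range(1, n_lags + 1):
        lagged = grouped.shift(lag)
        lagged.columns = [f"{col}_lag{lag}" for col in feature_cols]
        df = df.join(lagged)
    return df
```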