Signal Miner: Find Unique Alpha & Beat the Benchmark


Revolutionizing Staking: Aligning users and the fund through unique models.

:snake: What is Signal Miner?

Signal Miner is a fully automated model mining framework designed to generate models that outperform Numerai’s benchmark models in terms of correlation and Sharpe ratio. Instead of staking on pre-existing models, this tool helps you discover your own unique alpha, which has a better chance of producing positive MMC (Meta Model Contribution).

:bulb: Why use Signal Miner?

  • Unique Alpha: Avoids the trap of staking on common, overused models.
  • Better Payouts: Unique signals increase your expected returns compared to generic staking.
  • Automated Discovery: Efficiently scans a search space for high-performance models using a scalable, asynchronous approach.

:inbox_tray: Quick Start: Install & Run

Clone the repo and set up your environment. Instructions are available at the GitHub project.


:fire: How It Works

:bulb: The core workflow:

  1. Define a Benchmark Model: This is what your models will aim to outperform.
  2. Launch Model Mining: Explore a grid of hyperparameters asynchronously.
  3. Monitor Performance: Track model evaluations across cross-validation folds.
  4. Compare to the Benchmark: Identify models that exceed performance thresholds.
  5. Export Winning Models: Save the best models for staking or further tuning.

:trophy: Defining a Benchmark Model

benchmark_cfg = {
    "colsample_bytree": 0.1,
    "max_bin": 5,
    "max_depth": 5,
    "num_leaves": 15,
    "min_child_samples": 20,
    "n_estimators": 2000,
    "reg_lambda": 0.0,
    "learning_rate": 0.01,
    "target": 'target'  # Using the first target for simplicity
}
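
For context, here is a minimal sketch of how a config like this can be turned into a model. The parameter names follow LightGBM's scikit-learn API; train_df and feature_cols are illustrative placeholders, not necessarily what the notebook uses.

from lightgbm import LGBMRegressor

cfg = dict(benchmark_cfg)        # copy so we can pop keys LightGBM doesn't know
target_col = cfg.pop("target")   # "target" names the column to predict, not a model param

model = LGBMRegressor(**cfg)
# model.fit(train_df[feature_cols], train_df[target_col])   # train_df / feature_cols: your data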

:rocket: Launch Mining

start_mining()

Once mining is started, models will be trained and evaluated in the background.

Check Progress Anytime:

check_progress()

Progress: 122.0/2002 (6.09%)
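
Under the hood this amounts to a sum over the bookkeeping array; a minimal sketch, assuming each finished (split, configuration) job writes a 1 into the done_splits memory-mapped array shown further down:

import numpy as np

def check_progress(done_splits):
    """Report how many (split, configuration) jobs have finished."""
    done = float(np.sum(done_splits))       # assumes finished jobs are marked with 1
    total = done_splits.shape[0]
    print(f"Progress: {done}/{total} ({100 * done / total:.2f}%)")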

:bar_chart: Visualizing Cross-Validation Splits

To ensure proper evaluation, the framework implements time-series cross-validation with an embargo period:

Here, training and test sets are sequentially split to mimic live trading conditions—a crucial step for avoiding data leakage.
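
A minimal sketch of what an embargoed, sequential split can look like (the era handling, number of splits, and embargo length here are assumptions, not the repo's exact values):

import numpy as np

def embargoed_splits(eras, n_splits=2, embargo=13):
    """Yield (train_eras, test_eras) pairs with `embargo` eras dropped between
    the end of training and the start of testing to avoid leakage."""
    unique_eras = np.array(sorted(set(eras)))
    fold_size = len(unique_eras) // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        test_start = train_end + embargo
        test_end = min(test_start + fold_size, len(unique_eras))
        yield unique_eras[:train_end], unique_eras[test_start:test_end]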


:chart_with_upwards_trend: Mining Results: Past vs. Future Performance

Since yesterday, I’ve been running Signal Miner to evaluate 70+ models out of 1000, and we already see many models outperforming the benchmark on both validation and test datasets. :rocket:

Below is a scatter plot showing how models that performed well in validation (past) also tended to do well in test (future).

:bar_chart: Sharpe Ratio: Validation vs. Test

:mag_right: Key Insights:

  • The red dot represents the benchmark model.
  • While the top validation model wasn’t the best in test, we found several models that outperformed the benchmark in both.
  • Positive Correlation: The best validation models tended to be among the best in test as well.
  • If the scatter plot looked random (a cloud of points), it would suggest the model selection process is noise—but instead, we see a clear upward trend.
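
For anyone reproducing the plot, a sketch along these lines is enough (the res_df column names and the "benchmark" row label are assumptions based on this thread):

import matplotlib.pyplot as plt

plt.scatter(res_df["validation_shp"], res_df["test_shp"], label="mined models")
plt.scatter(res_df.loc["benchmark", "validation_shp"],
            res_df.loc["benchmark", "test_shp"], color="red", label="benchmark")
plt.xlabel("Validation Sharpe (past)")
plt.ylabel("Test Sharpe (future)")
plt.legend()
plt.show()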

:loudspeaker: Goal: Find a model that beats the benchmark in both correlation & Sharpe ratio. Still mining! :pick: :snake:


:chart_with_upwards_trend: Scaling Behavior

The results of this whole process can also be viewed as a function of the maximum number of trees in the search space.
For this experiment, I set n_estimators=2000—but early results suggest that increasing this value improves overall performance.

This hints at a scaling law, an idea that has come up in community discussions before.


:handshake: Join the Experiment!

This is an open-source project, and everyone is welcome to:
:heavy_check_mark: Run their own mining experiments
:heavy_check_mark: Contribute improvements (PRs welcome!)
:heavy_check_mark: Share results & insights

:rocket: Ready to try? Head over to Signal Miner on GitHub and start mining unique alpha today!

:snake: :pick: Let’s Make Staking Great Again! :rocket:


Consider me thoroughly impressed (though still a bit skeptical—hopefully I’m wrong as is often the case). I’ll definitely give it a try. Thanks for sharing and for the excellent write-up, readme, and model miner notebook!


Thank you @joakim !

:snake: Alright party people, day 2 of mining: I have currently processed a total of 112 models (not that many!), and now I have a model that objectively beats the benchmark on both corr and Sharpe.


Also, an interesting thing has emerged on this plot.

The benchmark model has arguably the largest generalization error of any model in my field of random models. This means that, for some reason, it performed very well on the validation set and considerably worse on the test set. The generalization error here is worse than for a randomly selected model. Why?

One way to understand it is to say this benchmark model is overfit to the validation set. Its high validation Sharpe corresponds to a lower test Sharpe than any of the randomized models so far. You would have to be very unlucky to have picked that model. :wink: :snake:
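
One simple way to quantify this is the gap between validation and test Sharpe (column names are assumptions):

res_df["generalization_gap"] = res_df["validation_shp"] - res_df["test_shp"]
print(res_df.sort_values("generalization_gap", ascending=False).head())  # worst generalizers first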


Very nice work, thanks for sharing


I have to say very nice work indeed and props for providing the code so swiftly! :slight_smile:
I would still be interested in some comparison of “discovered” model predictions to benchmark model predictions (for uniqueness). This could be a simple correlation of the two or an MMC calculation. Maybe someone else has an even better idea? My hunch is that the new models’ performance is still highly correlated with the benchmark model’s, and that the new model is just better at exploiting the same patterns.
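
For what it's worth, a first-pass uniqueness check could be as simple as correlating the two prediction vectors (a sketch, not something the repo does today; mined_preds and benchmark_preds are per-row predictions on the same data):

from scipy.stats import spearmanr

corr, _ = spearmanr(mined_preds, benchmark_preds)
print(f"Spearman correlation with benchmark predictions: {corr:.3f}")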


Yes, certainly it seems a requested feature is more metrics to compare. It is straightforward to put any metric you like in there. Thanks for the support! The code for this actually grew out of a project I did for my doctoral work. It went into a small part of one chapter of my thesis, but I thought the conclusion was profound. I applied the logic to Numerai’s data and it helped me start seeing the problem in a new light.

Unfortunately, what happened in a previous project was that the validation vs. test scatter plot was like a round ball, zero correlation, and indeed OOS live performance was very spotty and random. Of course I didn’t produce this scatter plot until the end of the project.

What is awesome about Numerai’s data set is that we can usually get a nice positive correlation here, which we see. Of course it depends on the model and what you’re doing with feature selection, etc.

Here is a snapshot of the best model so far…


Just checking in on day 2 of mining. So far I still haven’t unearthed a better model than my previous one, in terms of both corr and Sharpe, but there are now many models which beat the benchmark Sharpe on both validation and test. What does this mean? They scored better on validation (so we would have chosen them over the benchmark, based on validation metrics), and they also ended up scoring better on the test set (OOS, in the future).

I’ve exported my best model, so far, and uploaded it to my first mining spot here.

:snake:

:snake:

Also, I added a section to the Readme about hardware, with some tips for smashing large data sets: GitHub - jefferythewind/signal_miner: Numerai Signal Miner


I don’t think the notebook will run without errors on Windows or macOS due to how they handle multiprocessing differently from Linux. At least I wasn’t able to run it on my MacBook without changing it to a .py script with a main function running everything, and the multiprocessing functions at top level. And then when I run it I always run out of memory (64GB) :). My desktop is similar to yours (PopOS 22.04 with AMD Threadripper 2 and 128GB of RAM) and I plan to try it with double the swap file. I’m assuming you use CPU when mining?


Thank you, @joakim. I am not surprised to hear that. On my system I had tried to package this whole thing into its own module. For some reason, just putting all the variables that are currently in the global scope of the Jupyter notebook into a class messes up how the multiprocessing and data exchange work between the processes and the memory-mapped files.

I will emphasize somewhere that, in its current form, this only works when run from the Jupyter notebook (on Linux).
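
For anyone who wants to try it on macOS/Windows anyway, the usual workaround is the script structure @joakim describes, roughly (a sketch only, not yet supported in the repo):

# run_mining.py -- structure needed on spawn-based platforms (macOS/Windows):
# worker functions at module top level, everything else behind a main guard
# so child processes can re-import this file safely.
from multiprocessing import Pool

def evaluate_config(cfg):
    # placeholder: train/evaluate one configuration and return its metrics
    return cfg

def main():
    configurations = [{"n_estimators": 100}, {"n_estimators": 200}]  # dummy grid
    with Pool(processes=2) as pool:
        results = pool.map(evaluate_config, configurations)
    print(results)

if __name__ == "__main__":
    main()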

And yes, I use a CPU for this currently. This whole thing should hopefully be extended to use more model types and more architectures, so I welcome you to give your best/fastest model a try. In a previous version I had tried to get this working with Murky’s GPU code. It did not work in a straightforward manner; I had to abandon multiprocessing, I think.



Finally started mining, woohoo!

Have you tried to implement saving progress e.g. in an SQLite DB, with what models were found, and performance on validation and test, etc? If not, I might try to see if I can add that so one could stop mining and restart where left off, as it’s difficult to do anything else while mining. :slight_smile:


Hi @joakim . Great progress!

So if you notice, this line controls how many processes run concurrently in the job pool.

pool = Pool(processes=2)

So the reason you’re seeing all the jobs taken up on your computer is that LightGBM is designed to use as many processors as are available. In order to use fewer resources, you can pass the n_jobs parameter to the LightGBM model to cap the number of CPU threads the model will use. Currently that code is in signal_miner.py. I will work on a way to pipe that parameter through from the notebook.
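
Until that parameter is piped through from the notebook, the idea is roughly this (a sketch; the exact spot in signal_miner.py may differ):

from lightgbm import LGBMRegressor

# Two knobs control total CPU usage:
#   1) Pool(processes=2) -> how many configurations train concurrently
#   2) n_jobs            -> how many threads each LightGBM model may use
cfg = {"n_estimators": 100, "learning_rate": 0.01}   # any mined configuration
model = LGBMRegressor(n_jobs=4, **cfg)               # ~ 2 workers x 4 threads = 8 cores in use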

About starting/stopping mining: currently all results are saved to the 2 memory-mapped files, so that is already working like a database.

# Prepare memory-mapped files
os.makedirs("results", exist_ok=True)
mmapped_array = np.memmap(
    os.path.join("results", "test_mmapped_array.dat"),
    dtype='float16', mode='w+', shape=(len(data), len(configurations))
)
done_splits = np.memmap(
    os.path.join("results", "test_done_splits.dat"),
    dtype='float16', mode='w+', shape=(len(all_splits) * len(configurations))
)

In a previous version I also saved the configurations locally, so you could restart a previously unfinished run. Besides the results, you also need the list of configurations specific to a particular run. I noticed I forgot to carry this over to the new version. I will put that back in.

The trick there is to use a unique name for each mining run, and to make the code so it doesn’t overwrite past work.
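
Something along these lines should do the trick (a sketch of the idea, not the repo's current code; the run name is made up):

import json, os

# `configurations` is the grid already built in the notebook (the same list the memmaps are sized by)
run_name = "mining_run_01"                    # unique per mining run
run_dir = os.path.join("results", run_name)
os.makedirs(run_dir, exist_ok=False)          # fail loudly rather than overwrite past work

with open(os.path.join(run_dir, "configurations.json"), "w") as f:
    json.dump(configurations, f)              # save the grid so an interrupted run can resume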

Great recommendations, look for an update coming later today.


Signal Miner update: I’ve now processed over 310 randomized configurations, and we have 3 models that are beating the benchmark on both corr and Sharpe.

Seriously interesting looking alpha here, with 3 different targets.

This plot is starting to fill out, burying the benchmark deeper in the field.


Extremely slow progress here (I’m searching a wider space), but at least it looks like I have a decent benchmark model.



Hi @joakim, that’s great, but something appears not to be working correctly if you’re not getting more blue dots on your scatter plot. If you are still just using 2 splits, you should have 36/2 = 18 completed models, and so there should be 18 additional blue dots showing up on your scatter plot.


Thanks. You might have to push an update to the notebook, as the one on GitHub shows it had mined 4 models, but they also don’t show up on the scatter plot.


Also, ‘eval_shp’ and ‘train_shp’ I think should be ‘test_shp’ and ‘validation_shp’ respectively.

My res_df only contains the benchmark model unfortunately so I’ll stop mining for now, hoping that you’ll push an update soon. :slight_smile:


:rocket: Signal Miner Update: Beating the High-Bar Benchmark :trophy:

Hey everyone,

It’s time for a major update on Signal Miner! We’ve been hard at work refining our approach, running massive models, and pushing our computational limits to uncover optimal signal mining parameters.

:fire: What’s New?

:heavy_check_mark: Fixed Scatter Plot Bug

The Validation vs. Test Sharpe Scatter Plot is one of the most important visualizations for evaluating parameter performance. However, we discovered that the validation axis was actually displaying the whole-dataset Sharpe instead of the correct validation Sharpe. This has been fixed—so now you can fully trust this critical plot!

:bar_chart: Introducing the High-Bar Benchmark

We’ve added a benchmark reference model, which represents the real challenge to beat. This model has seriously strong performance:

  • Validation Sharpe: 2.44
  • Test Sharpe: 1.69

Here are its parameters:

benchmark_cfg = {
    "colsample_bytree": 0.1,
    "max_bin": 5,
    "max_depth": 10,
    "num_leaves": 2**10,
    "min_child_samples": 10000,
    "n_estimators": 30_000,
    "reg_lambda": 0.0,
    "learning_rate": 0.001
}

:rocket: Expanded Search Grid

We’re pushing the search space further, with models now going up to:

  • 30K trees :deciduous_tree:
  • Max depth: 16
  • Up to 2048 leaves :leaves:
  • Min child samples: 15,000

These are BIG MODELS—the kind of setups that can only be explored with time and patience.

:checkered_flag: The Race to Beat the Benchmark

I’ve been running configurations non-stop for over a week (or two?)—only 20 configurations fully evaluated so far. BUT, I already found one model that outperforms the benchmark on test Sharpe, and several others in the same ballpark.

With just 20 models tested, we’re already close to beating the benchmark. At this rate, my estimate is that by 100 models, we’ll have a clear winner—a setup that dominates on both validation and test Sharpe.

Stay tuned for more updates as this scatter plot fills out and we beat the benchmark!

Get all the latest code at the GitHub project: GitHub - jefferythewind/signal_miner: Numerai Signal Miner


That’s really cool! Have you staked any of the models? I’m finding that some of my models that perform well don’t do well on live data. But some models that don’t perform well on the validation set have done well on live data. So far, I haven’t found a consistent way to develop models that beat the benchmark models, although some do: Numerai


All of my models use this technique to some extent, from Signals, to crypto, to the main tournament. Recently I have been mining for deep models, up to 30K trees, and progress has been slow. I updated the slot below with a pretty competitive model a few weeks ago that took between 1 and 2 months to find; you can see the inflection point in the results from when I switched to the newer model. It was a pretty obvious upgrade from the first model, which was mined in about a week or so. So the longer you mine, the better your models will be. That’s what this framework is all about: deciding what will work in live.

Signal Miner Update. :snake:

So it’s time to retire my current run, which has been going for 2 months now. Using the CPU LightGBM model, I was only able to mine 93 models from this big grid of parameters.

param_dict = {
    'colsample_bytree': list(np.linspace(0.001, 1, 100)),
    'reg_lambda': list(np.linspace(0, 100_000, 10000)),
    'learning_rate': list(np.linspace(0.00001, 0.3, 1000, dtype='float')),
    'max_bin': list(np.linspace(2, 5, 4, dtype='int')),
    'max_depth': list(np.linspace(5, 16, 12, dtype='int')),
    'num_leaves': list(np.linspace(4, 2048, 2044, dtype='int')),
    'min_child_samples': list(np.linspace(1, 15000, 15000, dtype='int')),
    'n_estimators': list(np.linspace(10, max_trees, max_trees - 10, dtype='int')),
    'target': targets,
}

We plot these on the Sharpe ratio plot, comparing past to future performance, and indicate the benchmark model with a red star.

Evaluation

An interesting thing to notice, which is very common, is that the best-performing model on the validation set is not the best-performing model on the test set. Even the benchmark outperforms on validation (past), while other models ended up performing better in the future.

The overall objective was to find a model that outperformed on both folds of the data. The speed I was running here was just too slow, and I don’t have that much time. 93 models? We should be doing this in a day or less. However, even these models are producing MMC. A few weeks ago I put a small ensemble of these models online, and they have been doing well. In the signal miner code, you can export ensembles easily with this line.

to_export = res_df.sort_values('whole_shp').iloc[-10:].index.tolist() #can be a list to ensemble

Here I export an ensemble of 10 models, based on which performed best in Sharpe on the whole data set.
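
When staking, the exported models can then be blended with a simple rank-average, e.g. (a sketch; models, live_df, and feature_cols are placeholder names):

import pandas as pd

preds = pd.DataFrame({i: m.predict(live_df[feature_cols]) for i, m in enumerate(models)})
ensemble = preds.rank(pct=True).mean(axis=1)   # rank-average the member predictions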

Next Iteration

I’ve recently learned that a lot of progress has been made in GPU-based GBMs that run super fast compared to the CPU-based LightGBM, which in the past had been more competitive. CatBoost runs super fast, and I will spin up a new version of the signal miner with CatBoost on GPU. As always, we will get fresh new alpha.
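
For reference, CatBoost’s GPU mode is just a constructor flag, so dropping it into the miner’s worker should look something like this (a sketch, assuming catboost is installed and a CUDA device is visible; the parameter values are placeholders):

from catboost import CatBoostRegressor

model = CatBoostRegressor(
    task_type="GPU",       # train on the GPU instead of the CPU
    devices="0",           # which CUDA device(s) to use
    iterations=30_000,
    learning_rate=0.001,
    depth=10,
    verbose=False,
)
# model.fit(train_df[feature_cols], train_df[target_col])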

I’ve also developed a fresh new GPU-powered GBM called WarpGBM, in collaboration with @fraulty. It is already comparable in speed to LightGBM, XGBoost and CatBoost on the GPU, but we are aiming for the #1 position in that category. The simplicity of our code is going to enable next-level customization and warp-speed domain generalization.

Stay tuned.
