Signal Miner: Find Unique Alpha & Beat the Benchmark

Revolutionizing Staking: Aligning users and the fund through unique models.

:snake: What is Signal Miner?

Signal Miner is a fully automated model mining framework designed to generate models that outperform Numerai’s benchmark models in terms of correlation and Sharpe ratio. Instead of staking on pre-existing models, this tool helps you discover your own unique alpha, which has a better chance of producing positive MMC (Meta Model Contribution).

:bulb: Why use Signal Miner?

  • Unique Alpha: Avoids the trap of staking on common, overused models.
  • Better Payouts: Unique signals increase your expected returns compared to generic staking.
  • Automated Discovery: Efficiently scans a search space for high-performance models using a scalable, asynchronous approach.

:inbox_tray: Quick Start: Install & Run

Clone the repo and set up your environment. Instructions are available in the GitHub project.


:fire: How It Works

:bulb: The core workflow:

  1. Define a Benchmark Model: This is what your models will aim to outperform.
  2. Launch Model Mining: Explore a grid of hyperparameters asynchronously.
  3. Monitor Performance: Track model evaluations across cross-validation folds.
  4. Compare to the Benchmark: Identify models that exceed performance thresholds.
  5. Export Winning Models: Save the best models for staking or further tuning.

:trophy: Defining a Benchmark Model

benchmark_cfg = {
    "colsample_bytree": 0.1,
    "max_bin": 5,
    "max_depth": 5,
    "num_leaves": 15,
    "min_child_samples": 20,
    "n_estimators": 2000,
    "reg_lambda": 0.0,
    "learning_rate": 0.01,
    "target": 'target'  # Using the first target for simplicity
}
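
For readers who want to poke at a configuration outside the framework, here is a minimal sketch of turning one into a model. I am assuming every key except "target" maps directly onto an LGBMRegressor parameter; the real training loop lives in the repo.

import numpy as np
import lightgbm as lgb

# Assumption: all keys except "target" are LGBMRegressor parameters
params = {k: v for k, v in benchmark_cfg.items() if k != "target"}
model = lgb.LGBMRegressor(**params)

# Stand-in data; in the real pipeline X is the Numerai feature matrix
# and y = data[benchmark_cfg["target"]]
X, y = np.random.rand(500, 20), np.random.rand(500)
model.fit(X, y)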

:rocket: Launch Mining

start_mining()

Once mining is started, models will be trained and evaluated in the background.

Check Progress Anytime:

check_progress()

Progress: 122.0/2002 (6.09%)

:bar_chart: Visualizing Cross-Validation Splits

To ensure proper evaluation, the framework implements time-series cross-validation with an embargo period:

Here, training and test sets are sequentially split to mimic live trading conditions—a crucial step for avoiding data leakage.
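
For anyone curious what such a split can look like in code, here is a rough sketch of an embargoed, expanding-window splitter. The fold count and embargo length are illustrative, not the exact values the framework uses.

import numpy as np

def embargoed_splits(n_eras, n_folds=4, embargo=4):
    # Expanding-window splits: train on everything up to a cut-off,
    # skip `embargo` eras, then test on the following block.
    edges = np.linspace(0, n_eras, n_folds + 2, dtype=int)[1:]
    for i in range(n_folds):
        train_end = edges[i]
        test_start, test_end = train_end + embargo, edges[i + 1]
        if test_start < test_end:
            yield np.arange(0, train_end), np.arange(test_start, test_end)

for train_idx, test_idx in embargoed_splits(n_eras=60):
    print(f"train eras 0-{train_idx[-1]}, test eras {test_idx[0]}-{test_idx[-1]}")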


:chart_with_upwards_trend: Mining Results: Past vs. Future Performance

Since yesterday I’ve been running Signal Miner, and with 70+ of 1000 models evaluated so far, we already see many models outperforming the benchmark on both the validation and test datasets. :rocket:

Below is a scatter plot showing how models that performed well in validation (past) also tended to do well in test (future).

:bar_chart: Sharpe Ratio: Validation vs. Test

:mag_right: Key Insights:

  • The red dot represents the benchmark model.
  • While the top validation model wasn’t the best in test, we found several models that outperformed the benchmark in both.
  • Positive Correlation: The best validation models tended to be among the best in test as well.
  • If the scatter plot looked random (a cloud of points), it would suggest the model selection process is noise—but instead, we see a clear upward trend.
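
If you want to reproduce this kind of plot from your own mining run, something like the following works. The score arrays below are random stand-ins; in practice you would pull the per-model Sharpe ratios out of your results files.

import numpy as np
import matplotlib.pyplot as plt

# Stand-in scores; replace with the validation/test Sharpe of each mined model
val_sharpe = np.random.normal(1.0, 0.3, 70)
test_sharpe = 0.6 * val_sharpe + np.random.normal(0.0, 0.2, 70)
benchmark_val, benchmark_test = 1.2, 0.7

plt.scatter(val_sharpe, test_sharpe, alpha=0.6, label="mined models")
plt.scatter([benchmark_val], [benchmark_test], color="red", label="benchmark")
plt.xlabel("Validation Sharpe")
plt.ylabel("Test Sharpe")
plt.legend()
plt.show()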

:loudspeaker: Goal: Find a model that beats the benchmark in both correlation & Sharpe ratio. Still mining! :pick: :snake:


:chart_with_upwards_trend: Scaling Behavior

This entire process can also be viewed as a function of the number of trees allowed in the search space.
For this experiment I capped n_estimators at 2000, but early results suggest that increasing this value improves overall performance.

This hints at a scaling law, an idea that has come up in community discussions before.
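
Once results exist for several tree budgets, checking for this is as simple as grouping the mined configurations by n_estimators. The numbers below are made up, just to show the shape of the comparison.

import pandas as pd

# One row per mined configuration (columns and values are illustrative)
results = pd.DataFrame({
    "n_estimators": [500, 500, 2000, 2000, 4000],
    "test_sharpe":  [0.61, 0.74, 0.78, 0.83, 0.88],
})
print(results.groupby("n_estimators")["test_sharpe"].max())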


:handshake: Join the Experiment!

This is an open-source project, and everyone is welcome to:
:heavy_check_mark: Run their own mining experiments
:heavy_check_mark: Contribute improvements (PRs welcome!)
:heavy_check_mark: Share results & insights

:rocket: Ready to try? Head over to Signal Miner on GitHub and start mining unique alpha today!

:snake: :pick: Let’s Make Staking Great Again! :rocket:

4 Likes

Consider me thoroughly impressed (though still a bit skeptical—hopefully I’m wrong as is often the case). I’ll definitely give it a try. Thanks for sharing and for the excellent write-up, readme, and model miner notebook!

1 Like

Thank you @joakim !

:snake: Alright party people, day 2 of mining: I have currently processed a total of 112 models (not that many!), and I now have a model that objectively beats the benchmark on both corr and Sharpe.


Also, an interesting thing has emerged on this plot.

The benchmark model arguably has the largest generalization error of any model in my field of random models. This means that, for some reason, this model showed very good performance on the validation set and considerably worse performance on the test set. The generalization error here is worse than for a randomly selected model. Why?

One way to understand it is to say this benchmark model is overfit to the validation set. Its high validation Sharpe pairs with a lower test Sharpe than any of the randomized models so far. You would have to be very unlucky to have picked that model. :wink: :snake:
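
To make that concrete, the generalization error I am talking about is just the per-model gap between validation and test Sharpe, something like:

import numpy as np

def generalization_error(val_sharpe, test_sharpe):
    # Gap between validation (in-sample) and test (out-of-sample) Sharpe; larger = worse generalization
    return np.asarray(val_sharpe) - np.asarray(test_sharpe)

# Made-up numbers for illustration: a benchmark-style gap vs. two random-model gaps
print(generalization_error(1.2, 0.6))
print(generalization_error([1.0, 0.9], [0.85, 0.8]))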

1 Like

Very nice work, thanks for sharing

1 Like

I have to say very nice work indeed and props for providing the code so swiftly! :slight_smile:
I would still be interested in some comparison of “discovered” model predictions to benchmark model predictions (for uniqueness). This could be a simple correlation of the two or an MMC calculation. Maybe someone else has an even better idea? My hunch is that the new model’s performance is still highly correlated with the benchmark model’s, and that the new model is just better at exploiting the same patterns.
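
Even something as simple as this would already be informative, assuming both sets of predictions are aligned on the same validation rows (the arrays below are stand-ins):

import numpy as np
import pandas as pd

# Stand-in predictions; in practice these come from the mined model and the benchmark
new_preds = pd.Series(np.random.rand(1000))
benchmark_preds = pd.Series(np.random.rand(1000))

# Rank correlation of the two prediction vectors: a high value would suggest the "new"
# alpha is mostly re-expressing the benchmark rather than something unique
print(new_preds.corr(benchmark_preds, method="spearman"))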

1 Like

Yes, it certainly seems a requested feature is more metrics for comparison. It is straightforward to put any metric you like in there. Thanks for the support! The code for this actually grew out of a project I did for my doctoral work. It went into a small part of one chapter of my thesis, but I thought the conclusion was profound. I applied the logic to Numerai’s data and it helped me start seeing the problem in a new light.

Unfortunately, what happened in a previous project was that the validation vs. test scatter plot was like a round ball with zero correlation, and indeed OOS live performance was very spotty and random. Of course, I didn’t produce that scatter plot until the end of the project.

What is awesome about Numerai’s data set is that we can usually get a nice positive correlation here, which we see. Of course it depends on the model and what you’re doing with feature selection, etc.

Here is a snapshot of the best model so far…

1 Like

Just checking in on day 2 of mining. So far I still haven’t unearthed a better model than my previous one in terms of both corr and Sharpe, but there are now many models which beat the benchmark Sharpe on both validation and test. What does this mean? They scored better on validation (so we would have chosen them over the benchmark based on validation metrics), and then they also ended up scoring better on the test set (OOS, in the future).

I’ve exported my best model, so far, and uploaded it to my first mining spot here.

https://numer.ai/signalminer_1

:snake:

:snake:

Also, I added a section to the Readme about hardware, with some tips for smashing large data sets: GitHub - jefferythewind/signal_miner: Numerai Signal Miner

1 Like

I don’t think the notebook will run without errors on Windows or macOS, due to how they handle multiprocessing differently from Linux. At least, I wasn’t able to run it on my MacBook without converting it to a .py script with a main function running everything and the multiprocessing functions at the top level. And then, when I run it, I always run out of memory (64GB) :). My desktop is similar to yours (PopOS 22.04 with an AMD Threadripper 2 and 128GB of RAM) and I plan to try it with double the swap file. I’m assuming you use the CPU when mining?
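
For reference, the shape that eventually ran for me on macOS was roughly this: worker function at module level, everything else behind a main guard, and the spawn start method made explicit (names here are placeholders, not the actual functions in the repo).

from multiprocessing import get_context

def train_one(cfg):
    # placeholder for the real per-configuration training/evaluation
    return cfg["id"]

if __name__ == "__main__":
    configs = [{"id": i} for i in range(8)]
    with get_context("spawn").Pool(processes=2) as pool:
        results = pool.map(train_one, configs)
    print(results)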

2 Likes

Thank you, @joakim. I am not surprised to hear that. On my system I had tried to package this whole thing into its own module. For some reason, just moving the variables that are currently in the notebook’s global scope into a class messes up how the multiprocessing and data exchange work between the processes and the memory-mapped files.

I will note prominently somewhere that, right now, this only works in its current form: run from the Jupyter notebook, on Linux.

And yes, I use a CPU for this currently. This whole thing should hopefully be extended to more model types and more architectures, so I welcome you to give your best/fastest model a try. In a previous version I had tried to get this working with Murky’s GPU code. It did not work in a straightforward manner; I had to abandon multiprocessing, I think.

1 Like


Finally started mining, woohoo!

Have you tried to implement saving progress e.g. in an SQLite DB, with what models were found, and performance on validation and test, etc? If not, I might try to see if I can add that so one could stop mining and restart where left off, as it’s difficult to do anything else while mining. :slight_smile:
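
Roughly what I have in mind, with the table layout purely illustrative:

import os
import sqlite3

os.makedirs("results", exist_ok=True)
con = sqlite3.connect(os.path.join("results", "mining_progress.db"))
con.execute("""
    CREATE TABLE IF NOT EXISTS evaluations (
        config_id   INTEGER PRIMARY KEY,
        params      TEXT,      -- JSON-encoded hyperparameters
        val_sharpe  REAL,
        test_sharpe REAL,
        done        INTEGER DEFAULT 0
    )
""")
con.commit()
con.close()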

1 Like

Hi @joakim . Great progress!

So if you notice, this line will control how many concurrent processes are run at the same time in the job pool.

pool = Pool(processes=2)

So the reason you’re seeing all your cores taken up is that LightGBM is designed to use as many processors as are available. In order to use fewer resources, you can pass the n_jobs parameter to the LightGBM model to set the maximum number of CPU threads it will use. Currently that code is in signal_miner.py. I will work on a way to pipe that parameter through from the notebook.
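
For example, something along these lines (the exact plumbing in the repo may differ):

import lightgbm as lgb

# Cap each model at 4 threads, so 2 concurrent workers use roughly 8 cores in total
model = lgb.LGBMRegressor(n_estimators=2000, learning_rate=0.01, n_jobs=4)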

About starting/stopping mining: currently all results are saved to the two memory-mapped files, so that is already working like a database.

import os
import numpy as np

# Prepare memory-mapped files (data, configurations, and all_splits come from earlier in the notebook)
os.makedirs("results", exist_ok=True)
mmapped_array = np.memmap(
    os.path.join("results", "test_mmapped_array.dat"),
    dtype='float16', mode='w+', shape=(len(data), len(configurations))
)
done_splits = np.memmap(
    os.path.join("results", "test_done_splits.dat"),
    dtype='float16', mode='w+', shape=(len(all_splits) * len(configurations))
)

In a previous version I also saved the configurations locally, so you could restart a previously unfinished run. Besides the results, you also need the list of configurations specific to a particular run. I noticed I forgot to carry this over to the new version; I will put that back in.

The trick there is to use a unique name for each mining run, and to make sure the code doesn’t overwrite past work.
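
The rough shape of what I am planning (file and variable names are illustrative, not final):

import os
import json

run_name = "run_001"                       # pick a unique name per mining run
configurations = [{"n_estimators": 2000, "learning_rate": 0.01}]  # stand-in; the real list is built in the notebook

os.makedirs("results", exist_ok=True)
cfg_path = os.path.join("results", f"{run_name}_configurations.json")

# Persist the configuration list next to the result memmaps so a run can be resumed later,
# and never overwrite a file from a previous run
if not os.path.exists(cfg_path):
    with open(cfg_path, "w") as f:
        json.dump(configurations, f)

# When resuming, reopen the result memmaps with mode="r+" instead of "w+"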

Great recommendations, look for an update coming later today.

1 Like

Signal Miner update: I’ve now processed over 310 randomized configurations, and we have 3 models that beat the benchmark on both corr and Sharpe.

Seriously interesting looking alpha here, with 3 different targets.

This plot is starting to fill out, burying the benchmark deeper in the field.

1 Like

Extremely slow progress here (I’m searching a wider space), but it looks like I at least have a decent benchmark model.
