What makes a good model for Numerai Signals?

Hi, Numerai Community!
I'm sharing my modeling approach and some insights from it, because Signals is entering a new phase.
My best-performing model is habakan_x, which reached the top of the leaderboard for 1-year return (strictly, the top 1-year-return model is habakan, which has submitted almost the same predictions as habakan_x).
However, from monitoring MMC and IC, I don't think this model is good for TC.
So sharing it costs me little, and I think it is good for the community.
This content was prepared after a discussion with katsu-san. (Thanks @katsu1110 for your advice and discussion!)


  • habakan_x adds only the following pipeline steps to katsu's Kaggle starter notebook (model: katsu1110_edelgard)
    • statistical denoising of data
    • binning
  • Comparing live performance between habakan_x and katsu1110_edelgard
    • The two models' ICs were highly correlated, but their Corrs less so
      • Neutralization appears to be influenced by denoising and binning
    • Analysis of Corr and IC for each model
      • habakan_x shows a lower correlation between Corr and IC than katsu1110_edelgard
    • habakan_x has quite low TC, while katsu1110_edelgard's is high

About my model: habakan_x
There is no need to explain my model in detail, because habakan_x is based on the Signals starter notebook published by katsu (model: katsu1110_edelgard).
habakan_x only adds the following pipeline steps to katsu1110_edelgard:

  • Denoising data using basic statistical outlier detection
  • binning (9 bins, for no particular reason)
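A minimal sketch of the binning step, assuming each feature is split into 9 equal-frequency bins with pandas `qcut` over the whole column. Whether the actual pipeline bins over the whole column or per date is an assumption here, and the feature names and data are made up.

```python
import numpy as np
import pandas as pd

N_BINS = 9  # habakan_x uses 9 bins


def bin_features(df: pd.DataFrame, features: list, n_bins: int = N_BINS) -> pd.DataFrame:
    """Replace each feature with its equal-frequency bin index (0 .. n_bins - 1)."""
    binned = df.copy()
    for feature in features:
        # labels=False returns integer bin codes; duplicates="drop" guards against
        # repeated quantile edges on features with many tied values
        binned[feature] = pd.qcut(df[feature], q=n_bins, labels=False, duplicates="drop")
    return binned


# hypothetical example data: two made-up technical features for 900 rows
rng = np.random.default_rng(0)
df = pd.DataFrame({"rsi_14": rng.normal(size=900), "mom_20": rng.exponential(size=900)})
binned = bin_features(df, ["rsi_14", "mom_20"])
print(binned["rsi_14"].value_counts().sort_index().tolist())  # about 100 rows in each of the 9 bins
```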

This approach works reasonably well for the Signals task, which involves noisy data.
In fact, this model performed better than my other models that add unique features.
Because I wanted to compare it with katsu1110_edelgard, the features, classifier and hyperparameters are kept the same.

Comparing habakan_x and katsu1110_edelgard
Comparing the two models (same features and classifier) gives some interesting insights.
The data for this analysis comes from the models' live performance.

The table below summarizes each model's metrics.

Model                 Corr Mean   Corr Sharpe   MMC Mean   IC Mean
habakan_x             0.0247      1.1160        0.0177     -0.0090
katsu1110_edelgard    0.0152      0.5704        0.0105     -0.0178

These metrics show that habakan_x has better Corr than katsu1110_edelgard, but IC is the more interesting metric to focus on.
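Corr Sharpe in the table is presumably the mean of the per-round Corr divided by its standard deviation. A minimal sketch under that assumption (sample standard deviation; the input values are invented, not the live scores behind the table):

```python
import numpy as np


def summarize(per_round_corr):
    """Return (mean, Sharpe) of per-round scores, with Sharpe = mean / standard deviation."""
    scores = np.asarray(per_round_corr, dtype=float)
    mean = scores.mean()
    sharpe = mean / scores.std(ddof=1)  # sample std; Numerai's exact convention is an assumption
    return mean, sharpe


# hypothetical per-round Corr values, not real live scores
corr_mean, corr_sharpe = summarize([0.03, 0.01, 0.05, -0.01, 0.04])
print(round(corr_mean, 4), round(corr_sharpe, 4))
```

If Sharpe is indeed mean / std, the table implies a per-round Corr standard deviation of roughly 0.0247 / 1.1160 ≈ 0.022 for habakan_x and 0.0152 / 0.5704 ≈ 0.027 for katsu1110_edelgard, so habakan_x's higher Sharpe would come from both a higher mean and steadier scores.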

Analysis of IC
Figure 1 shows the IC of both models over rounds 289 to 307.

Figure 1: IC over rounds, habakan_x (top) and katsu1110_edelgard (bottom)

Qualitatively, the IC trajectories are very similar.
Figure 2 plots each metric (Corr and IC) of the two models against each other, together with a correlation analysis.

Figure 2: habakan_x vs. katsu1110_edelgard, Corr (left) and IC (right)

The R² for IC is over 0.89, but the R² for Corr is only 0.69.
The IC result makes sense, because the mean correlation between the two models' validation predictions is 0.89.
Given this, the neutralization applied to Signals may be sensitive even to preprocessing such as denoising or binning.
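To make the neutralization point concrete, here is a toy sketch. It assumes (this is our reading, not something stated in the post) that IC is the rank correlation of predictions with the raw target, while Corr is measured against a target neutralized to some risk factors. The factors, the target, and the `neutralize` helper are all hypothetical; Numerai's actual neutralization is not reproduced here.

```python
import numpy as np
import pandas as pd


def neutralize(values: np.ndarray, exposures: np.ndarray) -> np.ndarray:
    """Remove the part of `values` explained linearly by `exposures` (least-squares residual)."""
    X = np.column_stack([np.ones(len(values)), exposures])  # include an intercept
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    return values - X @ beta


def rank_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman-style correlation: Pearson correlation of the ranks."""
    ra = pd.Series(a).rank().to_numpy()
    rb = pd.Series(b).rank().to_numpy()
    return float(np.corrcoef(ra, rb)[0, 1])


rng = np.random.default_rng(42)
n_stocks = 2000
factors = rng.normal(size=(n_stocks, 3))   # hypothetical risk-factor exposures
alpha = rng.normal(size=n_stocks)          # hypothetical stock-specific component
raw_target = factors @ np.array([0.5, -0.3, 0.2]) + alpha
neutral_target = neutralize(raw_target, factors)

# a prediction that mostly tracks one factor
pred = factors[:, 0] + 0.1 * rng.normal(size=n_stocks)
ic = rank_corr(pred, raw_target)        # against the raw target
corr = rank_corr(pred, neutral_target)  # against the factor-neutral target
print(round(ic, 3), round(corr, 3))
```

In this toy setup the prediction gets a clearly positive `ic` but a `corr` near zero. Anything that changes a model's factor exposure, such as denoising or binning, can therefore move Corr more than IC.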

Relationship between IC and Corr for each model
The experiment above analyzed the relationship between the two models.
The next one analyzes Corr against IC within the same model.

Figure 3: Corr vs. IC, habakan_x (left) and katsu1110_edelgard (right)

The R² between IC and Corr is 0.323 for katsu1110_edelgard, but only 0.065 for habakan_x.
It is difficult to draw conclusions from this result alone, because these R² values depend on latent variables such as the proportion of alpha in the predictions and the neutralization factors.
But it is very strange that such a difference arises from denoising and binning alone.
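Assuming the R² values in Figure 3 come from an ordinary least-squares line with a single regressor, R² equals the squared Pearson correlation between the per-round Corr and IC series. A minimal sketch (the per-round values are hypothetical, not the live scores):

```python
import numpy as np


def r_squared(x, y) -> float:
    """R^2 of a simple linear fit y ~ a + b*x, which equals the squared Pearson correlation."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return float(r ** 2)


# hypothetical per-round Corr and IC values for one model (not real scores)
corr_by_round = [0.02, 0.05, -0.01, 0.03, 0.00]
ic_by_round = [0.01, 0.00, -0.02, 0.02, -0.01]
print(r_squared(corr_by_round, ic_by_round))
```

Because this R² is symmetric in its two inputs, it does not matter whether Corr or IC is placed on the x-axis.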

Analysis of TC
Monitoring TC, there is a large difference in TC rank:
habakan_x: 2713
katsu1110_edelgard: 234

Figure 4: TC over rounds, habakan_x (top) and katsu1110_edelgard (bottom)

The cumulative TC of the two models seems to trend in opposite directions.
I have not analyzed this deeply, but TC also appears to change with denoising or binning.
Honestly, I'm not too bothered by this result, because I saw this model as "not unique, but performs well".
If anything, I'm glad the result turned out to be informative 😂

I think it is good that TC is tied to Numerai's actual performance.
But it is also important to share feedback from the results of Signals modeling.
Does the analysis above make sense to Numerai competitors and supporters?
I hope this thread becomes a chance to share Signals ideas with the community.
Thank you!


Hello habakan san,

Awesome analysis as usual.

If the data removed by denoising (or binning) was actually important for TC,
then finding that out would be a success in picking out the essence of TC.

And IC seems to come mainly from the data that survives denoising (the majority).

To improve TC, is there any way to give more weight to, or focus on,
the data that would be removed by denoising during training?

Sorry if my comment is off the mark. I'm a terrible ossan (middle-aged guy) novice,
and I don't know about denoising at all.

(If you could give me a reference or an example of denoising (Japanese is OK), that would be great.)
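One way to try the weighting idea raised above, sketched under assumptions: instead of dropping the rows that a 1% to 99% quantile filter would remove, keep them and give them a larger sample weight. The helper name and the weight value 2.0 are hypothetical; many learners, for example scikit-learn and LightGBM estimators, accept such weights through a `sample_weight` argument to `fit`.

```python
import numpy as np
import pandas as pd


def outlier_weights(df: pd.DataFrame, features: list, lower: float = 0.01,
                    upper: float = 0.99, outlier_weight: float = 2.0) -> np.ndarray:
    """Per-row weights: rows a quantile filter would drop get `outlier_weight`, the rest 1.0."""
    is_outlier = np.zeros(len(df), dtype=bool)
    for feature in features:
        lo_q = df[feature].quantile(lower)
        hi_q = df[feature].quantile(upper)
        # a row counts as an outlier if it falls outside the range on any feature
        is_outlier |= ((df[feature] <= lo_q) | (df[feature] >= hi_q)).to_numpy()
    return np.where(is_outlier, outlier_weight, 1.0)


# hypothetical data: 1000 rows, two made-up features
rng = np.random.default_rng(2)
df = pd.DataFrame({"f1": rng.normal(size=1000), "f2": rng.normal(size=1000)})
weights = outlier_weights(df, ["f1", "f2"])
print(int((weights > 1.0).sum()), "rows up-weighted")
```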


Thanks for the comments!! And your insights on Twitter are always helpful to me!

Unfortunately, my model (habakan_x) has lower TC (rank 2713) than katsu1110_edelgard (rank 234).
So denoising or binning may be bad for TC.

My denoising approach keeps only the data between the 1% and 99% quantiles of each feature.
Here is a sample script.

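upper_thresh = 0.99
lower_thresh = 0.01

# df: DataFrame of training data; features: list of feature column names
feature_df = df.copy()
for feature in features:
    # thresholds come from the full df, so each feature is judged on its own distribution
    upper_q = df[feature].quantile(upper_thresh)
    lower_q = df[feature].quantile(lower_thresh)
    # keep only rows strictly inside the 1%-99% range (works for any column name, unlike query())
    in_range = (feature_df[feature] > lower_q) & (feature_df[feature] < upper_q)
    feature_df = feature_df[in_range]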

If you want to explore more denoising techniques, it may be good to survey anomaly detection or outlier detection methods.
This approach is very simple, but it is based on the ideas behind the Mahalanobis distance and Hotelling's T^2.
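The quantile filter treats each feature independently. For comparison, a multivariate filter in the Mahalanobis distance / Hotelling's T^2 spirit scores each row by its distance from the feature mean, scaled by the feature covariance. A minimal sketch, using an empirical 99% quantile cutoff (an arbitrary choice here) instead of a formal T^2 control limit; the helper name and data are hypothetical.

```python
import numpy as np
import pandas as pd


def mahalanobis_filter(df: pd.DataFrame, features: list, keep_quantile: float = 0.99) -> pd.DataFrame:
    """Drop rows whose squared Mahalanobis distance is above the `keep_quantile` quantile."""
    X = df[features].to_numpy(dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudo-inverse tolerates collinear features
    # squared distance per row: d_i^2 = (x_i - mu)^T S^-1 (x_i - mu)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    threshold = np.quantile(d2, keep_quantile)
    return df[d2 <= threshold]


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f1", "f2", "f3"])  # hypothetical features
filtered = mahalanobis_filter(df, ["f1", "f2", "f3"])
print(len(df), "->", len(filtered))
```

Unlike the per-feature filter, this can flag a row whose individual values are each unremarkable but whose combination is unusual given the correlations between features.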

But I think binning has more influence on Corr, MMC, or TC, because the denoising above changes only a small amount of data.
Also, TC is calculated using only a selection of the predictions, likely something like TB200 (top and bottom 200), not all of them.
I think getting low TC and high Corr means that the correlation comes from predictions that are not used in TC.
So binning may be removing the signal that contributes to TC.
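A toy illustration of why binning could matter if TC depends mainly on the extremes. It assumes a TB200-style selection of the 200 highest and 200 lowest predictions (the real TC computation is more involved and is not reproduced here) and a hypothetical universe of 5,000 stocks. After a 9-bin qcut, the top bin holds about 556 stocks with identical values, so the ordering that would pick the top 200 from among them is gone.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_stocks = 5000                              # hypothetical universe size
pred = pd.Series(rng.normal(size=n_stocks))  # hypothetical raw predictions


def top_bottom(pred: pd.Series, k: int = 200) -> pd.Index:
    """Index labels of the k lowest and k highest predictions (a TB-k selection)."""
    order = pred.sort_values()
    return order.index[:k].append(order.index[-k:])


# after a 9-bin qcut, every stock in the top bin shares the same value
binned = pd.qcut(pred, q=9, labels=False)
top_bin_size = int((binned == binned.max()).sum())

selected = top_bottom(pred)
print(len(selected), top_bin_size)  # 400 names selected; the top bin alone holds about 556 stocks
```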


Thank you very much for your response and example.
Your thorough explanation is very logical and makes sense even for an ossan novice.

I can think of two things that are lost through binning and denoising.

One is outliers, and the other is the distribution (the nonlinear component).

Both binning and denoising discard outliers, in a sense.
Also, if the binning is like qcut() and its labels are equally spaced,
the distribution information of the original data would be lost.

In the classic tournament, the data is binned and its distribution is Gaussian.
So some distribution information seems to remain there, but some might still be discarded.

Perhaps such distribution information could complement something that is lost in the classic data.
(though signals and classic targets are different and things are not so simple…)

To verify this, it might be good to compare different types of binning (ones that retain the distribution and ones that don't).
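One way to set up that comparison: two hypothetical binning variants, both built on `pd.qcut`. The first uses equally spaced labels, which keeps only rank information. The second labels each bin with the mean of the values in it, one possible way (an assumption, not a method from this thread) to keep part of the original distribution.

```python
import numpy as np
import pandas as pd


def bin_uniform(x: pd.Series, n_bins: int) -> pd.Series:
    """qcut with equally spaced labels 0..n_bins-1: only rank information survives."""
    return pd.qcut(x, q=n_bins, labels=False).astype(float)


def bin_keep_distribution(x: pd.Series, n_bins: int) -> pd.Series:
    """qcut, but label each bin with the mean of its members, keeping the distribution's shape."""
    codes = pd.qcut(x, q=n_bins, labels=False)
    return x.groupby(codes).transform("mean")


rng = np.random.default_rng(3)
x = pd.Series(rng.exponential(size=9000))  # hypothetical right-skewed feature
uniform_levels = np.sort(bin_uniform(x, 9).unique())
kept_levels = np.sort(bin_keep_distribution(x, 9).unique())
print(np.diff(uniform_levels))  # every gap is 1: the skew is gone
print(np.diff(kept_levels))     # gaps widen toward the right tail
```

Both variants produce the same bin membership and the same ordering, so any difference in model results would come from the spacing of the labels alone.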


Binning does change the distribution, as you say, and it also reduces cardinality.
I tried changing the number of bins, but anywhere from 5 to 20 bins seems to make no difference to TC.
I have not tried more bins than that.
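To see how much ordering a given bin count keeps, one can compare the cardinality and the rank correlation between a raw feature and its binned version. A small sketch on hypothetical normal data; it says nothing about TC itself, only about how much of the ranking survives binning.

```python
import numpy as np
import pandas as pd


def rank_corr(a: pd.Series, b: pd.Series) -> float:
    """Spearman-style correlation: Pearson correlation of average ranks (tied values share a rank)."""
    return float(np.corrcoef(a.rank(), b.rank())[0, 1])


rng = np.random.default_rng(5)
x = pd.Series(rng.normal(size=10_000))  # hypothetical raw feature

results = {}
for n_bins in (5, 9, 20):
    binned = pd.qcut(x, q=n_bins, labels=False)
    # cardinality after binning, and how much of the original ordering survives
    results[n_bins] = (int(binned.nunique()), rank_corr(x, binned))
    print(n_bins, results[n_bins][0], round(results[n_bins][1], 4))
```

For k equal-frequency bins this correlation is close to sqrt(1 - 1/k^2), already about 0.98 at k = 5, so most of the overall ordering survives even at 5 bins. What binning removes is mainly the ordering within each bin.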


I just noticed that my Signals models also use a 7-bin qcut, so their distribution information was completely lost.

Looking at the code I wrote earlier,
I am not using katsu-san's starter, but I am using similar technical features.

I also noticed that my model's TC is poor, so I will build some models with a different type of binning (keeping some distribution information) and see if TC improves.

I will let you know if I get any results.

Thank you for your valuable insights!
