Overview
Signals V1 data is out now and includes 24 features and updated tickers.
This dataset contains:
- Factors, which are similar to the factors which the target is neutral to. Use these to determine if your model is sufficiently unique, use them to neutralize your predictions, or use them as additional features in your dataset.
- Some starter features, which are relatively simple classic quant features constructed from returns series.
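As a rough illustration of what "neutralize your predictions" can mean, the sketch below removes the part of a prediction vector that is linearly explained by a set of factor columns. It is a generic least-squares residual computed with Gram-Schmidt, not necessarily the exact procedure Numerai uses; the function name, the `proportion` parameter, and the sample values are all made up for the example.

```python
import math

def neutralize(preds, factors, proportion=1.0):
    """Remove the part of `preds` explained linearly by `factors`.

    preds: list of floats, one per stock.
    factors: list of columns, each a list of floats as long as preds.
    proportion: 1.0 removes the full exposure, 0.0 leaves preds unchanged.
    """
    # Build an orthonormal basis for the factor columns (modified Gram-Schmidt).
    basis = []
    for col in factors:
        v = list(col)
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # skip columns that add no new direction
            basis.append([x / norm for x in v])

    # Subtract the projection of preds onto each basis vector.
    out = list(preds)
    for b in basis:
        dot = sum(x * y for x, y in zip(preds, b))
        out = [o - proportion * dot * y for o, y in zip(out, b)]
    return out

# Hypothetical values standing in for predictions and two factor columns.
preds = [0.1, 0.4, 0.35, 0.8]
factor_a = [-1.0, 0.0, 0.5, 1.0]
factor_b = [0.2, -0.3, 0.9, -0.1]
neutral = neutralize(preds, [factor_a, factor_b])
```

After full neutralization the result has (numerically) zero dot product with every factor column, which is also a quick way to check how much of a signal is just factor exposure.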
There is a new column called numerai_ticker, which is the stock’s ticker and country of execution (as a two-letter ISO code). We also include composite FIGI tickers when available.
Legacy Signals data has been a mess of different sources, but V1 data has been built with our modern data pipelines and is now available via the data API.
import pandas as pd
from numerapi import NumerAPI

# Download the latest live universe from the data API, then load it.
napi = NumerAPI()
napi.download_dataset("signals/v1.0/live.parquet", "live.parquet")
pd.read_parquet("live.parquet")
This dataset comes with a new example model, a small LGBM trained on the 24 features in this dataset.
Breaking Changes
There are breaking changes from the legacy dataset:
- Removed old ticker columns
- Renamed friday_date to date
- Renamed targets
- Default target change
If you have Bloomberg tickers, you can replace the exchange code with the ISO country code to map to numerai_ticker. See the updated example notebook for an example.
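A minimal sketch of that mapping is below. It assumes both ticker formats are "SYMBOL CODE" separated by a single space, which this post does not spell out, and the exchange-to-country table is a small illustrative sample rather than Numerai's actual mapping. The function name and the table are hypothetical.

```python
# Partial, illustrative mapping from Bloomberg exchange codes to ISO country
# codes. These entries are assumptions for the sketch, not Numerai's table.
EXCHANGE_TO_COUNTRY = {
    "US": "US",
    "LN": "GB",
    "GR": "DE",
    "JP": "JP",
    "CN": "CA",
}

def bloomberg_to_numerai_ticker(bloomberg_ticker):
    """Swap the trailing exchange code for an ISO country code.

    Returns None when the ticker has no exchange code or the code is unknown.
    """
    symbol, _, exchange = bloomberg_ticker.strip().rpartition(" ")
    country = EXCHANGE_TO_COUNTRY.get(exchange.upper())
    if not symbol or country is None:
        return None
    return f"{symbol} {country}"
```

Returning None for unknown codes makes unmapped tickers easy to spot and drop, instead of silently passing through a ticker that will not match the universe.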
The targets have slight name changes to match the conventions of our other datasets. For example, target_20d_factor_feat_neutral is now target_factor_feat_neutral_20.
The default target is now target_factor_feat_neutral_20. The legacy default targets (target_4d and target_20d) have been renamed to target_eleven_{length}.
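The column renames can be sketched as a small helper. The rule for auxiliary targets is inferred from the single example given (move the `{n}d` horizon to a numeric suffix), so treat it as an assumption and check the result against the actual column names. The legacy default targets are deliberately left untouched here because their new names depend on the `{length}` placeholder. The helper name is hypothetical.

```python
import re

# Matches legacy names shaped like target_20d_factor_feat_neutral.
_LEGACY_TARGET = re.compile(r"^target_(\d+)d_(.+)$")

def rename_legacy_column(name):
    """Map a legacy column name to its V1 name, or return it unchanged."""
    if name == "friday_date":
        return "date"
    match = _LEGACY_TARGET.match(name)
    if match:
        horizon, body = match.groups()
        return f"target_{body}_{horizon}"
    # Anything else (including target_4d and target_20d) is left as-is.
    return name
```

With pandas, the function can be passed straight to `df.rename(columns=rename_legacy_column)`.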
What do you need to do?
To prepare for this update, at minimum you will need to make the following changes:
Live submissions
- Update your pipeline to use data from the data API.
- The live.parquet file contains the latest universe with numerai_ticker and composite_figi tickers (along with v1 features).
Model training
- Update your pipeline to map numerai_ticker to the ticker of your choosing for historical data.
- Update your pipeline to read the date column instead of friday_date.
- If your pipeline uses any of the auxiliary targets, update the target names to the new convention.
- If your pipeline uses the old default target, be aware that the default target has changed.
You can refer to our new example notebook to see what this looks like now.
V1 Features
The new features are:
feature_adv_20d_factor
feature_beta_factor
feature_book_to_price_factor
feature_country
feature_dividend_yield_factor
feature_earnings_yield_factor
feature_exchange_code
feature_growth_factor
feature_impact_cost_factor
feature_market_cap_factor
feature_momentum_12w_factor
feature_momentum_26w_factor
feature_momentum_52w_factor
feature_momentum_52w_less_4w_factor
feature_ppo_60d_130d_country_ranknorm
feature_ppo_60d_90d_country_ranknorm
feature_price_factor
feature_rsi_130d_country_ranknorm
feature_rsi_60d_country_ranknorm
feature_rsi_90d_country_ranknorm
feature_trix_130d_country_ranknorm
feature_trix_60d_country_ranknorm
feature_value_factor
feature_volatility_factor
Features with {n}(d|w) in the name (for example, feature_adv_20d_factor) are time-series features computed over n days or n weeks.
Features with country_ranknorm in the name are grouped by country, then ranked, then gaussianized.
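The group, rank, gaussianize sequence can be sketched as follows. This is one common way to do it, not a statement of how the features were actually built: the `(rank - 0.5) / n` quantile convention and breaking ties by input order are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist

def country_ranknorm(values, countries):
    """Rank values within each country, then map ranks to normal quantiles."""
    # Collect row positions per country.
    groups = defaultdict(list)
    for idx, country in enumerate(countries):
        groups[country].append(idx)

    out = [0.0] * len(values)
    normal = NormalDist()
    for idxs in groups.values():
        ordered = sorted(idxs, key=lambda i: values[i])
        n = len(ordered)
        for rank, i in enumerate(ordered, start=1):
            # (rank - 0.5) / n keeps the quantile strictly inside (0, 1).
            out[i] = normal.inv_cdf((rank - 0.5) / n)
    return out

# Made-up raw values for three US stocks and two JP stocks.
normed = country_ranknorm(
    [3.0, 1.0, 2.0, 10.0, 20.0],
    ["US", "US", "US", "JP", "JP"],
)
```

Because ranking happens inside each country, the large JP values do not push the US stocks toward the bottom of the distribution.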
Features with factor in the name refer to risk factors that most of the targets are neutral to.
PPO, RSI and TRIX are examples of technical indicators.
PPO is a percentage price oscillator that compares shorter and longer moving averages as a ratio.
RSI is the relative strength index, usually used as an overbought/oversold indicator.
TRIX is a triple exponential moving average indicator, usually used as a momentum or reversal feature.
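For readers who have not met these indicators, here is a textbook-style sketch of each in plain Python. The exact construction behind the V1 features is not described in this post, so the smoothing choices are assumptions: EMAs use a smoothing factor of 2 / (span + 1), and RSI uses simple averages over the window rather than Wilder smoothing.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def ppo(prices, short_span, long_span):
    """Percentage price oscillator: 100 * (EMA_short - EMA_long) / EMA_long."""
    short, long_ = ema(prices, short_span), ema(prices, long_span)
    return [100.0 * (s - l) / l for s, l in zip(short, long_)]

def rsi(prices, window):
    """RSI over the last `window` price changes, using simple averages."""
    changes = [b - a for a, b in zip(prices[-window - 1:-1], prices[-window:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:  # no down moves in the window
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def trix(prices, span):
    """Percent change of a triple-smoothed EMA, one value per step."""
    triple = ema(ema(ema(prices, span), span), span)
    return [100.0 * (b - a) / a for a, b in zip(triple, triple[1:])]
```

By analogy with the feature names, `ppo(prices, 60, 130)` and `rsi(prices, 60)` would use daily prices with the windows taken from the name, though that reading of the names is itself an assumption.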
momentum_52w_less_4w refers to the one-year return of a stock excluding the last 4 weeks.
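A small worked example of that definition, assuming a list of weekly closing prices with the most recent week last. The function name and the choice of weekly closes are assumptions made for illustration.

```python
def momentum_less_recent(weekly_prices, total_weeks=52, skip_weeks=4):
    """Return from `total_weeks` ago up to `skip_weeks` ago."""
    if len(weekly_prices) < total_weeks + 1:
        raise ValueError("need at least total_weeks + 1 weekly prices")
    start = weekly_prices[-(total_weeks + 1)]  # price 52 weeks ago
    end = weekly_prices[-(skip_weeks + 1)]     # price 4 weeks ago
    return end / start - 1.0

# Flat for 48 weeks, then a jump in the final 4 weeks.
prices = [100.0] * 49 + [150.0] * 4
skipping_recent = momentum_less_recent(prices)
including_recent = momentum_less_recent(prices, skip_weeks=0)
```

Here the jump falls entirely inside the excluded 4 weeks, so the feature sees a flat year while the plain 52-week return does not.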
Minor Target Updates
There are now 20-day and 60-day versions of all targets.
Ticker Updates
The new data has two tickers: numerai_ticker and composite_figi.
You can still submit any of the currently accepted tickers. So we now accept all of the following:
- cusip
- sedol
- bloomberg_ticker
- composite_figi
- numerai_ticker
You’ll see that we only have composite FIGI tickers going back to September 2022. If anyone knows where to find historic FIGI tickers, please reach out. This also means that we cannot accept composite_figi for diagnostics.
Legacy Data Deprecation and New Submission Formats
All Signals legacy data is now deprecated and will no longer be available as of March 30, 2024. Anything that does not have the signals/v1.0 prefix in the data API will be discontinued.
Anything not in the data API will also be discontinued. For example, any hardcoded S3 URLs like https://numerai-signals-public-data.s3-us-west-2.amazonaws.com/universe/latest.csv will not be available.
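One way to audit a pipeline against this rule is to filter the dataset paths it uses by prefix. The helper below is a hypothetical sketch; apart from signals/v1.0/live.parquet, the paths in the example are invented.

```python
SUPPORTED_PREFIX = "signals/v1.0/"

def supported_datasets(dataset_names):
    """Keep only dataset paths under the signals/v1.0 prefix."""
    return [name for name in dataset_names if name.startswith(SUPPORTED_PREFIX)]

# Only the first path appears in this post; the others are invented.
used_by_pipeline = [
    "signals/v1.0/live.parquet",
    "signals/some_legacy_file.csv",
    "universe/latest.csv",
]
still_available = supported_datasets(used_by_pipeline)
```

If your numerapi version exposes a dataset listing call such as `NumerAPI().list_datasets()`, its output could be passed through the same filter.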
Along with this deprecation, please note the updates to submissions and diagnostics formats. The legacy formats will still be accepted, so updating your submission format is not required.
Live Submissions
- The current live submission header format is: [(cusip|sedol|bloomberg_ticker|ticker), data_type, friday_date, signal]
- Submissions have been updated to only require two columns: [(cusip|sedol|bloomberg_ticker|composite_figi|numerai_ticker), signal]
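A two-column live submission can be produced with nothing more than the standard library. The sketch below assumes the file is a plain CSV with a header row; the function name and the sample tickers and signal values are made up.

```python
import csv
import io

LIVE_ID_COLUMNS = {
    "cusip", "sedol", "bloomberg_ticker", "composite_figi", "numerai_ticker",
}

def build_live_submission(signals, id_column="numerai_ticker"):
    """Render a {ticker: signal} mapping as a two-column CSV string."""
    if id_column not in LIVE_ID_COLUMNS:
        raise ValueError(f"unsupported id column: {id_column}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([id_column, "signal"])
    for ticker, signal in signals.items():
        writer.writerow([ticker, signal])
    return buffer.getvalue()

csv_text = build_live_submission({"AAPL US": 0.7, "VOD GB": 0.3})
```

Writing `csv_text` to a file gives something ready to upload; with pandas, `df[["numerai_ticker", "signal"]].to_csv(path, index=False)` produces the same layout.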
Diagnostics
- The current diagnostics header format is: [(cusip|sedol|bloomberg_ticker|ticker), data_type, friday_date, signal]
- Diagnostics have been updated to only require three columns: [(cusip|sedol|bloomberg_ticker|numerai_ticker), date, signal]
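Before uploading a diagnostics file, a quick client-side check of the header can catch the most likely mistakes, such as using composite_figi or forgetting the date column. This is a hypothetical sanity check based only on the format listed above, not a reproduction of the server-side validation.

```python
DIAGNOSTICS_ID_COLUMNS = {"cusip", "sedol", "bloomberg_ticker", "numerai_ticker"}

def check_diagnostics_header(header):
    """Return a list of problems with a diagnostics header (empty if OK)."""
    problems = []
    ids = [c for c in header if c in DIAGNOSTICS_ID_COLUMNS]
    if len(ids) != 1:
        problems.append("expected exactly one ticker column")
    if "composite_figi" in header:
        problems.append("composite_figi is not accepted for diagnostics")
    for required in ("date", "signal"):
        if required not in header:
            problems.append(f"missing column: {required}")
    return problems
```

For example, `check_diagnostics_header(["numerai_ticker", "signal"])` reports the missing date column, which the two-column live format does not need but diagnostics does.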