Signals V2 “Cosmic” Data

The Signals V2 “Cosmic” dataset is officially released. Download it here.

Similar to the V5 Atlas dataset for the Numerai tournament, the Cosmic release focuses on universe expansion, covering even more stocks than the Atlas dataset. We also substantially improved the numerai_ticker column and the country feature column.

In the Atlas forum post, we showed that v5 models perform much better than v4 models solely because of the larger universe. The same holds for Signals V2 data. Here are the diagnostics for V1 example validation predictions scored against the V1 dataset:

And here are the diagnostics for V2 example predictions scored against the V2 dataset:

You can see that the mean and Sharpe increase for 3 of the primary metrics, with no change to the model or training parameters, just a larger universe. Note: while the churn of our example model does increase, we believe the increase is negligible, and churn is relatively easy to control.
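The "mean" and "Sharpe" in the diagnostics are summaries of per-era scores: the mean of the per-era scores, and that mean divided by their standard deviation. A minimal sketch, assuming a DataFrame with hypothetical columns `era`, `prediction` and `target`, and using plain per-era Spearman rank correlation as a simplified stand-in for Numerai's actual scoring functions (which are not reproduced here):

```python
import pandas as pd


def per_era_rank_corr(df: pd.DataFrame, era_col: str = "era",
                      pred_col: str = "prediction",
                      target_col: str = "target") -> pd.Series:
    """Spearman correlation within each era, computed as Pearson correlation of ranks."""
    def rank_corr(group: pd.DataFrame) -> float:
        return group[pred_col].rank().corr(group[target_col].rank())

    return df.groupby(era_col)[[pred_col, target_col]].apply(rank_corr)


def mean_and_sharpe(per_era_scores: pd.Series) -> tuple[float, float]:
    """Mean of per-era scores and mean / std (sample std, ddof=1, the pandas default)."""
    mean = per_era_scores.mean()
    std = per_era_scores.std()
    return mean, mean / std


# Toy data: era "a" is perfectly ranked, era "b" only partially.
df = pd.DataFrame({
    "era": ["a"] * 3 + ["b"] * 3,
    "prediction": [1, 2, 3, 1, 2, 3],
    "target": [1, 2, 3, 1, 3, 2],
})
scores = per_era_rank_corr(df)     # a: 1.0, b: 0.5
mean, sharpe = mean_and_sharpe(scores)
```

Holding the model fixed and only widening the universe changes which rows land in each era, which is why both numbers can move without any change to training.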

Below, you can see that the ticker universe has greatly expanded: in recent years it has grown from around 5000 stocks to well over 6000. That is an increase of at least 20% in every era of the last decade:
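The per-era comparison behind that chart can be reproduced roughly as follows: count distinct tickers per era in each dataset version and take the percentage change, e.g. (6000 - 5000) / 5000 = 20%. A sketch only; the era column name `date` is an assumption, not something stated in this post:

```python
import pandas as pd


def tickers_per_era(df: pd.DataFrame, era_col: str = "date",
                    ticker_col: str = "numerai_ticker") -> pd.Series:
    """Number of distinct tickers in each era."""
    return df.groupby(era_col)[ticker_col].nunique()


def pct_increase(old_counts: pd.Series, new_counts: pd.Series) -> pd.Series:
    """Per-era percentage increase, restricted to eras present in both versions."""
    both = pd.concat({"old": old_counts, "new": new_counts}, axis=1).dropna()
    return 100.0 * (both["new"] - both["old"]) / both["old"]


# Toy example: one era with 5 tickers in the old data and 6 in the new data.
v1 = pd.DataFrame({"date": ["2024-01-05"] * 5, "numerai_ticker": list("ABCDE")})
v2 = pd.DataFrame({"date": ["2024-01-05"] * 6, "numerai_ticker": list("ABCDEF")})
increase = pct_increase(tickers_per_era(v1), tickers_per_era(v2))
```

Taking `increase.min()` over the eras of interest gives the smallest per-era gain.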

We’ve added several new countries to the dataset, including Chile, China, Colombia, Egypt, India, Morocco, Peru, Qatar, Russia, and UAE. We also fixed several consistency issues with our ticker and country columns. Countries now have consistent codes throughout the entire dataset, and notations such as share class are either made consistent or removed altogether if unnecessary.
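With consistent codes, each ticker should carry a single country code across the whole dataset. A minimal check along those lines, assuming the country feature sits in a column named `country` (a hypothetical name) next to `numerai_ticker`:

```python
import pandas as pd


def inconsistent_country_tickers(df: pd.DataFrame, ticker_col: str = "numerai_ticker",
                                 country_col: str = "country") -> list:
    """Tickers that appear with more than one distinct country code."""
    n_codes = df.groupby(ticker_col)[country_col].nunique()
    return sorted(n_codes[n_codes > 1].index)


def tickers_per_country(df: pd.DataFrame, ticker_col: str = "numerai_ticker",
                        country_col: str = "country") -> pd.Series:
    """Distinct tickers per country code, largest first."""
    return df.groupby(country_col)[ticker_col].nunique().sort_values(ascending=False)


# Toy data: ticker "A" is coded two different ways.
df = pd.DataFrame({
    "numerai_ticker": ["A", "A", "B", "B", "C"],
    "country": ["US", "USA", "JP", "JP", "JP"],
})
```

On data with consistent codes, `inconsistent_country_tickers` should return an empty list; here it flags `A`.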

Signals V2 data will be used for Signals submissions starting Dec 3, 2024. Our records indicate this will not significantly affect the community, since we do not require full coverage of the Signals universe.
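Because full coverage is not required, one way to prepare for the switch is to drop predictions for tickers outside the V2 universe and check what fraction of the universe you still cover. A sketch only; the function name and the `prediction` column are hypothetical, and this does not reproduce Numerai's own submission validation:

```python
import pandas as pd


def restrict_to_universe(predictions: pd.DataFrame, universe,
                         ticker_col: str = "numerai_ticker"):
    """Keep predictions whose ticker is in the universe; also return the covered fraction."""
    universe_set = set(universe)
    kept = predictions[predictions[ticker_col].isin(universe_set)]
    coverage = kept[ticker_col].nunique() / len(universe_set) if universe_set else 0.0
    return kept.reset_index(drop=True), coverage


preds = pd.DataFrame({
    "numerai_ticker": ["A US", "B US", "X US"],
    "prediction": [0.1, 0.5, 0.9],
})
kept, coverage = restrict_to_universe(preds, ["A US", "B US", "C US", "D US"])
```

Here `X US` is dropped and half of the four-ticker universe is covered.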

Signals V2 data will be used for Signals scores in rounds starting on or after Jan 1, 2025.

Furthermore, with this release we are officially deprecating Signals V0, V1, and the ticker map files. These files have been de-listed from the website, but users can continue downloading them via the API until Jan 1, 2025. They will no longer be updated each round, and on Jan 1, 2025 they will be removed, historical files included, and will no longer be available for download. We are unable to provide alternatives.


What is the process for switching the CSV files below over to the new V2 dataset? Will the V2 files be at a different S3 path or the same one?

# Base URL and CSV files used with the older Signals data
AWS_BASE_URL = 'https://numerai-signals-public-data.s3-us-west-2.amazonaws.com'
SIGNALS_UNIVERSE = f'{AWS_BASE_URL}/latest_universe.csv'
SIGNALS_TICKER_MAP = f'{AWS_BASE_URL}/signals_ticker_map_w_bbg.csv'
SIGNALS_TARGETS = f'{AWS_BASE_URL}/signals_train_val_bbg.csv'

Dear Numer.ai,

I am writing to suggest the creation of an ETF (Exchange-Traded Fund) based on your hedge fund. This would allow smaller investors, including the Numer.ai community, to invest in your fund and potentially benefit from its success.

I believe this would be a mutually beneficial opportunity, providing greater access to your fund while also expanding your investor base.

Thank you for considering this suggestion. I look forward to hearing your thoughts.

Sincerely,
Yan

You can get the universe from the live.parquet file. We are unable to provide equivalents for the ticker map files.
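Rebuilding something like the old latest_universe.csv from live.parquet could look like the sketch below. It assumes the file has already been downloaded (e.g. via the Numerai API) and read with `pandas.read_parquet`; this thread does not say whether `numerai_ticker` is a column or an index level in that file, so the function handles both. The function name and output layout are hypothetical:

```python
import pandas as pd


def universe_from_live(live: pd.DataFrame, ticker_col: str = "numerai_ticker") -> pd.DataFrame:
    """Return a one-column, sorted, de-duplicated ticker universe."""
    if ticker_col in live.columns:
        tickers = live[ticker_col]
    else:
        # Fall back to an index level with that name (raises KeyError if absent).
        tickers = pd.Series(live.index.get_level_values(ticker_col))
    universe = tickers.dropna().drop_duplicates().sort_values()
    return pd.DataFrame({ticker_col: universe.to_numpy()})


# Usage (local file names are assumptions):
# live = pd.read_parquet("live.parquet")
# universe_from_live(live).to_csv("latest_universe.csv", index=False)
live = pd.DataFrame({"numerai_ticker": ["B US", "A US", "B US", None]})
universe = universe_from_live(live)
```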

I am currently using the opensignals package to pull data from Yahoo. Is anyone working on updating this package to work with V2 data? I am looking at the new live.parquet file, and without a yahoo column I am unsure how to correctly pull data from Yahoo.
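Without a yahoo column, one workaround is to build Yahoo symbols yourself. The sketch below assumes, without confirmation from this thread, that a numerai_ticker is a symbol followed by a space and a Bloomberg-style country code (for example `AAPL US`). The suffix table is a hypothetical, deliberately tiny example you would need to extend and verify; tickers it cannot map come back as `None`:

```python
from typing import Mapping, Optional

# Hypothetical, incomplete example table: trailing code of a numerai_ticker
# -> Yahoo Finance symbol suffix. Extend and verify before relying on it.
EXAMPLE_YAHOO_SUFFIXES = {"US": "", "LN": ".L", "JP": ".T"}


def to_yahoo_symbol(numerai_ticker: str,
                    suffixes: Mapping[str, str] = EXAMPLE_YAHOO_SUFFIXES) -> Optional[str]:
    """Map 'SYMBOL CC' to a Yahoo symbol, or None if the format or code is unknown."""
    parts = numerai_ticker.strip().rsplit(" ", 1)
    if len(parts) != 2:
        return None  # no trailing country code to look up
    symbol, code = parts
    suffix = suffixes.get(code)
    if suffix is None:
        return None  # code not covered by the table
    return symbol + suffix


mapped = {t: to_yahoo_symbol(t) for t in ["AAPL US", "7203 JP", "XYZ ZZ"]}
# {'AAPL US': 'AAPL', '7203 JP': '7203.T', 'XYZ ZZ': None}
```

A one-to-one table is the simplest design but will not cover countries with several exchanges or any share-class notation, so unmapped tickers are surfaced as `None` rather than guessed.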