Mapping of feature names from legacy to new

mic · December 22, 2021, 2:17am

Here is an unofficial mapping:

mizimno/numerai/blob/main/feature_mapping.csv

legacy_feature,new_feature
feature_charisma1,feature_revitalizing_dashing_photomultiplier
feature_charisma2,feature_unco_terefah_thirster
feature_charisma3,feature_vestmental_hoofed_transpose
feature_charisma4,feature_unsparred_scarabaeid_anthologist
feature_charisma5,feature_intended_involute_highbinder
feature_charisma6,feature_recidivism_petitory_methyltestosterone
feature_charisma7,feature_acerb_venusian_piety
feature_charisma8,feature_terrific_epigamic_affectivity
feature_charisma9,feature_headhunting_unsatisfied_phenomena
feature_charisma10,feature_jiggish_tritheist_probity
feature_charisma11,feature_whitened_remanent_blast
feature_charisma12,feature_glyptic_unrubbed_holloway
feature_charisma13,
feature_charisma14,feature_descendent_decanal_hon
feature_charisma15,feature_synoptic_botryose_earthwork
feature_charisma16,feature_desiderative_commiserative_epizoa
feature_charisma17,feature_nucleophilic_uremic_endogen
feature_charisma18,feature_questionable_diplex_caesarist
feature_charisma19,feature_sudsy_polymeric_posteriority

This file has been truncated.

Numerai said that the new features aren’t exactly the same as the old features, so these are the closest matches. Features that don’t exist in the new data set are left blank.
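If you want to translate a saved list of legacy feature names (for example, a feature subset you picked on the old data) into the new names, a minimal sketch using Python's standard `csv` module could look like the code below. It assumes the file keeps the `legacy_feature,new_feature` header shown above and treats a blank `new_feature` as "no counterpart in the new data". `load_mapping` and `translate_features` are hypothetical helper names, not part of any Numerai tooling.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO


def load_mapping(f: TextIO) -> Dict[str, Optional[str]]:
    """Read legacy_feature,new_feature rows; a blank new_feature becomes None."""
    mapping: Dict[str, Optional[str]] = {}
    for row in csv.DictReader(f):
        new = (row["new_feature"] or "").strip()
        mapping[row["legacy_feature"].strip()] = new or None
    return mapping


def translate_features(legacy: Iterable[str],
                       mapping: Dict[str, Optional[str]]) -> List[str]:
    """Map legacy names to new names, dropping features with no counterpart."""
    out = []
    for name in legacy:
        new = mapping.get(name)
        if new is not None:
            out.append(new)
    return out


# A few rows from the excerpt above, read from a string here; with the real
# file you would pass open("feature_mapping.csv", newline="") instead.
sample = """legacy_feature,new_feature
feature_charisma1,feature_revitalizing_dashing_photomultiplier
feature_charisma13,
feature_charisma14,feature_descendent_decanal_hon
"""
mapping = load_mapping(io.StringIO(sample))
print(translate_features(
    ["feature_charisma1", "feature_charisma13", "feature_charisma14"], mapping))
# ['feature_revitalizing_dashing_photomultiplier', 'feature_descendent_decanal_hon']
```

Dropping blanks silently is a design choice; if your model expects a fixed column set you may prefer to log or fail on legacy features that map to `None`.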

The mapping is reverse engineered from the datasets themselves. I have no knowledge of the feature engineering Numerai did to create the features.

ml_is_lyf · December 22, 2021, 7:05pm

This is great, thanks! How did you determine this?

mic · December 23, 2021, 6:51am

The pairs were chosen by maximum average correlation (correlation computed within each era, then averaged across eras), with a tie-break whenever a feature was selected in more than one pair. From memory, I think only one tie-break was needed, for charisma72: it resulted in feature_acerb_venusian_piety being paired with its second choice, charisma7, instead, since its correlation with the two was nearly the same.

For validation, I spot-checked a sample of pairs to make sure they looked right, but you might want to check a pair or two yourself before you go to town with it.

mic · April 11, 2022, 1:47am

Updated with mapping for v3 to v4.

mic · April 11, 2022, 3:12am

Sorry, I just corrected some duplicated mappings; please re-download if you're using it.
