Mapping of feature names from legacy to new

Here is an unofficial mapping:

Numerai said that the new features aren’t exactly the same as the old features, so these are the closest matches. Features that don’t exist in the new data set are left blank.

The mapping is reverse-engineered from the datasets themselves. I have no knowledge of the feature engineering Numerai did to create the features.


This is great, thanks! How did you determine this?

The pairs were selected as those with the maximum average correlation (averaged across eras), with a tie-break if a feature was selected in multiple pairs. From memory, I think only one tie-break was needed, for charisma72. It resulted in feature_acerb_venusian_piety being paired with its second choice, charisma7, instead, which had nearly the same correlation on both fronts.
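For anyone who wants to reproduce something similar, here is a rough sketch of that kind of matching. It is not the exact script used for the mapping. It assumes the legacy and new feature frames cover the same rows with the same index, that `era` is a Series on that index, and that no column is constant within an era (a constant column would give NaN correlations). The function names and the greedy one-to-one tie-break are my own choices for illustration.

```python
import numpy as np
import pandas as pd


def mean_era_correlation(legacy, new, era):
    """Correlate every legacy column with every new column within each era,
    then average the correlation matrices across eras.

    Assumes legacy, new and era share the same row index (hypothetical setup).
    """
    total = None
    n_eras = 0
    for _, idx in era.groupby(era).groups.items():
        a = legacy.loc[idx].to_numpy(dtype=float)
        b = new.loc[idx].to_numpy(dtype=float)
        # Standardise each column (population std), so that
        # a.T @ b / n_rows is the Pearson correlation matrix.
        a = (a - a.mean(axis=0)) / a.std(axis=0)
        b = (b - b.mean(axis=0)) / b.std(axis=0)
        corr = a.T @ b / len(idx)
        total = corr if total is None else total + corr
        n_eras += 1
    return pd.DataFrame(total / n_eras, index=legacy.columns, columns=new.columns)


def greedy_one_to_one(corr):
    """Pair legacy and new features, highest mean correlation first.

    If a feature's first choice is already taken, it falls through to its
    next-best free partner. This is one possible tie-break, not necessarily
    the one used for the posted mapping.
    """
    values = corr.to_numpy()
    order = np.argsort(-values, axis=None)  # flat indices, best pair first
    mapping, used_new = {}, set()
    for flat in order:
        i, j = np.unravel_index(flat, values.shape)
        old_name, new_name = corr.index[i], corr.columns[j]
        if np.isnan(values[i, j]) or old_name in mapping or new_name in used_new:
            continue
        mapping[old_name] = new_name
        used_new.add(new_name)
    return mapping
```

Calling `greedy_one_to_one(mean_era_correlation(legacy, new, era))` returns a dict from legacy name to new name. The sketch uses signed correlation and no minimum threshold, so deciding which legacy features should be left blank would still need a cutoff of your own.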

For validation I sampled a few pairs to check they looked right, but you might want to check a pair or two yourself before you go to town with it.


Updated with mapping for v3 to v4.


Sorry, I just corrected some duplicated mappings. Please re-download if you are using it.

