My experiments show that there is huge potential in reducing the number of features used for training. Gains can be as high as +0.5% CORR on the validation set (or higher if you do better than I do).
Marcos Lopez de Prado describes the "Mean Decrease Accuracy" algorithm in his book "Advances in Financial Machine Learning". Here is my code snippet that implements that algorithm.
```python
import numpy as np

def MDA(model, features, testSet):
    testSet['pred'] = model.predict(testSet[features])  # predict with a pre-fitted model on an OOS validation set
    corr, std = num.numerai_score(testSet)               # save base scores
    print("Base corr: ", corr)
    diff = []
    np.random.seed(42)
    for col in features:                                  # iterate through each feature
        X = testSet.copy()
        np.random.shuffle(X[col].values)                  # shuffle the selected feature column, while maintaining the distribution of the feature
        testSet['pred'] = model.predict(X[features])      # run prediction with the same pre-fitted model, with one shuffled feature
        corrX, stdX = num.numerai_score(testSet)          # compare scores...
        print(col, corrX - corr)
        diff.append((col, corrX - corr))
    return diff
```
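Once you have the list of (feature, score delta) pairs, you can rank features and drop the ones whose shuffling barely hurts (or even improves) the score. Here is a minimal sketch of how that might look, assuming `model`, `features`, and a validation DataFrame `val` are defined as above; the 0.0 cutoff is just an illustrative threshold, not a recommendation:

```python
# Run MDA on a held-out validation set (assumes model, features, val exist as above)
diff = MDA(model, features, val)

# Shuffling an important feature should hurt the score, so its delta is negative.
# Features with delta >= 0 carried little signal in this run; they are drop candidates.
# The 0.0 threshold is arbitrary here; tune it against your own validation results.
weak_features = [col for col, delta in diff if delta >= 0.0]
strong_features = [col for col, delta in diff if delta < 0.0]

print(f"Candidates to drop ({len(weak_features)}):", weak_features)
print(f"Features to keep ({len(strong_features)}):", strong_features)
```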
Simple, fast, elegant and it improves your models!