This post compares 0% and 100% feature neutralization (FN) using the analysis and tips Python code, applied to three models across 30 weeks of live data from rounds 209 to 238.
I decided to write this post following the attention that budbot_7 got for being an early adopter of FN, in the hope that the community might find it useful. I am not a formally trained data scientist, but I am an engineer who works in a field where poor validation practices can have deadly consequences. I love what the Numerai team and community represent. There may be opinions expressed in this post that you disagree with; these comments are made with the utmost respect for the team and community and are intended for constructive discussion.
We are going to look at the performance of three models and their 100% neutralized counterparts as per the table below:
| Model | Description |
| --- | --- |
| ENS | An ensemble model with 0.94 correlation with example predictions |
| ENS_FN | As above, 100% neutralized to features |
| EP | Example predictions taken from integration_test_ |
| EP_FN | As above, 100% neutralized to features |
| NN | A simple deep neural network with 0.75 correlation with example predictions |
| NN_FN | You should be able to guess what this one is by now |
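For anyone who hasn't seen the analysis and tips code, the core of FN is just subtracting the linear projection of the predictions onto the features. Below is a minimal sketch, not the exact tournament code; I've also added a constant column (my own addition) so the residual comes out exactly uncorrelated with each feature:

```python
import numpy as np
import pandas as pd

def neutralize(predictions: pd.Series, features: pd.DataFrame,
               proportion: float = 1.0) -> pd.Series:
    """Remove `proportion` of the linear exposure of predictions to the features.

    proportion=1.0 is full feature neutralization; 0.0 just standardizes
    the predictions and leaves them otherwise unchanged.
    """
    # constant column makes the residual mean-zero as well as
    # orthogonal to every feature (not in the original code)
    exposures = np.hstack([features.values, np.ones((len(features), 1))])
    scores = predictions.values.reshape(-1, 1).astype(float)
    # subtract the least-squares projection onto the feature space
    beta = np.linalg.lstsq(exposures, scores, rcond=None)[0]
    scores = (scores - proportion * (exposures @ beta)).ravel()
    return pd.Series(scores / scores.std(), index=predictions.index)
```

With `proportion=1.0` the output has (up to numerical precision) zero correlation with every feature, which is exactly what the FN models in the table are doing to their raw predictions.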
Let's look at a summary of the performance of these models over the 30 weeks of resolved rounds from 209 to 238.
Surprisingly, and contrary to validation results, live corr is slightly higher for the FN models, indicating that this period was good for FN. As expected, MMC for the FN models is significantly higher, given their low correlation with the example predictions; this makes sense if not many other users were running FN models over this period. Remember, though, that with the attention and recent performance of FN models, adoption is likely to increase significantly, which will likely reduce the MMC benefit of FN for individual users. I'm going to discuss this more later.

Corr+2MMC looks very attractive for this period: an average 5.3% weekly return for EP_FN if you had been able to stake on Corr+2MMC. Pretty amazing for copying and pasting the analysis and tips code and applying it to the example predictions file!
With leaderboard bonuses now gone from the tournament the main thing I care about as a user is my models’ long term Sharpe. I want my models to have consistent positive performance and particularly be resilient to burns, even if they “underperform” during the good times. This is what makes FN so attractive to me.
You can see that live Sharpe on corr for the FN models is pretty awesome, sitting significantly above 1 compared to around 0.6 for the non-FN models. Sharpe on MMC alone is lower, which is one of the reasons I don't like MMC. In fairness to the MMC staking mechanism, which pays on corr+MMC, combining the two improves Sharpe even further. Note, though, that weighting MMC too heavily (x2) starts to reduce Sharpe again.
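For clarity, the Sharpe quoted here is simply the mean of the weekly scores divided by their standard deviation, with no annualization. A toy sketch with made-up numbers (not the actual round-by-round results) showing why a consistent FN-style series can out-Sharpe a higher-mean but burn-prone one:

```python
from statistics import mean, stdev

def sharpe(weekly_scores):
    """Mean weekly score divided by its sample standard deviation; no annualization."""
    return mean(weekly_scores) / stdev(weekly_scores)

# illustrative numbers only
steady = [0.020, 0.030, 0.025, 0.028, 0.022]    # FN-like: modest but consistent corr
spiky  = [0.060, -0.030, 0.050, -0.020, 0.070]  # non-FN-like: higher peaks, deeper burns
```

Here `steady` has a slightly lower mean than `spiky` but a far smaller standard deviation, so its Sharpe is several times higher.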
So now let's look at what we all really care about: how much NMR would I have if I had staked 1 NMR on each of these models over this period?
I know you can't just stake on MMC, but I wanted to include it to make the point that if corr is ever removed from staking altogether (i.e. you can only stake on pure MMC) then I'm out. As expected, there is no staking scenario in which a non-FN model beats its FN counterpart. Of particular note are the corr + 2 x MMC returns. Again, look at that return on the neutralized example predictions: 4.59 NMR for 1 NMR down in only 30 weeks! I'm a poorer person for staking on corr the whole time.
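For anyone checking the arithmetic: payouts compound, so the 5.3% average weekly figure mentioned earlier turns 1 NMR into roughly 1.053**30 ≈ 4.7 NMR over 30 weeks, in the same ballpark as the 4.59 observed. A sketch (this ignores payout caps and overlapping rounds, so it only approximates the real mechanics):

```python
def compound_stake(initial_stake, weekly_returns):
    """Apply each resolved round's payout multiplicatively to the stake."""
    stake = initial_stake
    for r in weekly_returns:
        stake *= 1.0 + r
    return stake

# 30 weeks at a constant 5.3% weekly return (illustrative average, not actual scores)
final = compound_stake(1.0, [0.053] * 30)  # ≈ 4.71 NMR
```

The gap between 4.71 and 4.59 reflects that the real weekly returns varied around that 5.3% average rather than being constant.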
So based on this analysis, you would be silly to do anything other than FN and stake on Corr + 2 x MMC. But be careful: this period has been extremely good for FN, and with increased adoption FN may not be as beneficial for MMC in the future.
Even after this analysis I am still going to stake on corr only. Why? I just don't like the concept of single-round MMC scoring as a user who aims for realistic corr and high Sharpe and values robust validation practices. The standard advice for getting good MMC is to do your own thing without trying to focus on MMC, which I find completely unhelpful; it is essentially an admission that as a user you can't apply robust validation practices to MMC, so just pray. We are provided with obfuscated data, have no idea what other users are doing, and are staking with a highly volatile cryptocurrency. Take away my ability to validate performance with something like MMC and I'm essentially gambling. As a user, that is not something I am willing to put money behind.
Judge me on my long-term performance, and add in a bonus for originality if you like, but don't potentially punish me for something I have no mechanism to validate and that is ever changing.
Thanks to the Numerai team for all their great work. Hope this data is useful to others.