You want the performance of your model to be, as much as possible, a stationary process. A model that goes up for nine months in a row and then down for the last three is less desirable than a model whose three down months are interspersed evenly throughout the year. These two models could have the same Sharpe ratio, but the one with three consecutive down months would have a higher drawdown. A sophisticated investor would much prefer to see a model with a stationary track record, because such models tend to be more robust and more likely to keep working in the future.
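A quick numeric sketch of that point (the monthly return values here are made up for illustration): the same twelve returns in a different order give an identical Sharpe ratio but a very different maximum drawdown.

```python
import numpy as np

up, down = 0.02, -0.03
clustered = np.array([up] * 9 + [down] * 3)       # 9 up months, then 3 down in a row
interspersed = np.array(([up] * 3 + [down]) * 3)  # same 12 returns, down months spread out

def sharpe(r):
    return r.mean() / r.std()

def max_drawdown(r):
    cum = np.cumprod(1 + r)
    return (cum / np.maximum.accumulate(cum) - 1).min()

print(sharpe(clustered), sharpe(interspersed))              # identical
print(max_drawdown(clustered), max_drawdown(interspersed))  # clustered is much worse
```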
When I say stationary, I mean that the performance of your model is statistically similar to flipping a biased coin. If your model does well in 80% of eras, then your performance should look like flipping a coin with an 80% bias toward heads: something like HHHHTHHHTHHHTHHHHT, not something like TTTHHHHHHHHHTTTTTHHHHHHHH. In other words, it should lack autocorrelation, be memoryless, and have no long burn periods.
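One way to check the coin-flip analogy, as a sketch: turn per-era scores into a win/loss sequence and measure its lag-1 autocorrelation. For a memoryless biased coin it should be near zero; long streaks of heads or tails show up as positive autocorrelation. The `era_scores` series below is a synthetic stand-in for your model's per-era correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
era_scores = rng.normal(0.02, 0.03, size=200)  # stand-in for per-era correlations

wins = (era_scores > 0).astype(float)          # the H/T sequence
w = wins - wins.mean()
lag1_autocorr = (w[:-1] * w[1:]).mean() / w.var()
print(f"win rate: {wins.mean():.2f}, lag-1 autocorrelation: {lag1_autocorr:.3f}")
```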
The challenge with stock market data is that almost all stock features are not stationary, yet the goal is for the model built with those features to be stationary. Quant features like value or momentum can work well for years and then stop working, or work in the opposite direction for the next few years. Models trained on these non-stationary features will tend not to have stationary performance either, and this is why so many quant models don't generalize well out of sample: they have fit to regimes; they have not found stationary signals.
In a previous post, MMC2 and Feature Neutralization, Michael gave code for neutralizing models to feature exposures. While there's no guarantee that this creates stationary performance out of sample, in tests it tends to help, because feature neutralization reduces to zero any linear bets on the non-stationary factors.
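For readers who haven't seen that post, here is a minimal sketch of what linear feature neutralization can look like (not Michael's exact code): subtract the projection of the predictions onto the feature matrix, typically within each era, so the model carries no linear exposure to any feature.

```python
import numpy as np

def neutralize(predictions: np.ndarray, features: np.ndarray, proportion: float = 1.0) -> np.ndarray:
    """Remove `proportion` of the linear exposure of predictions to features.

    predictions: shape (n,); features: shape (n, k). Apply per era.
    """
    exposure = features @ (np.linalg.pinv(features) @ predictions)
    return predictions - proportion * exposure
```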
I wanted to open up discussion on this topic because it's unusual in most machine learning contexts to care about stationarity or the ordering of your performance. Many Numerai users first cared about getting the highest possible mean correlation score and then began to care about getting the best possible Sharpe. I think the next frontier will be reaching stationarity.
Does anyone explicitly try to learn a model to optimize for stationarity? How?
Does anyone look at ADF tests on their performance, or on their features' performance, during model construction? Or remove features whose correlation with the target has too much autocorrelation from era to era?
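Concretely, here is one hedged sketch of both ideas, using a synthetic stand-in for the training data. The `era`/`target` column layout follows the Numerai training data, but the feature names, the random data, and the 0.2 autocorrelation threshold are my assumptions for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Synthetic stand-in: 50 eras of 100 rows, two features, one target.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "era": np.repeat(np.arange(50), 100),
    "feature_a": rng.random(n),
    "feature_b": rng.random(n),
    "target": rng.random(n),
})

def era_corrs(df: pd.DataFrame, feature: str) -> pd.Series:
    """Per-era correlation of one feature with the target."""
    return df.groupby("era").apply(lambda d: d[feature].corr(d["target"]))

# Drop features whose era-to-era correlation with the target is too autocorrelated.
keep = [f for f in ["feature_a", "feature_b"]
        if abs(era_corrs(df, f).autocorr(lag=1)) < 0.2]

# ADF test on a per-era performance series: a small p-value rejects a unit root,
# i.e. supports stationarity.
era_scores = era_corrs(df, "feature_a")  # stand-in for your model's per-era scores
adf_stat, p_value, *rest = adfuller(era_scores)
print(keep, p_value)
```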
How can you train a model on the Numerai training data to ensure stationarity at least over the training set, i.e. to enforce that there are no especially long periods of strong performance or underperformance across the training eras? Bonus: does a model that is stationary over the training set work better out of sample than one that isn't? Extra bonus: is optimizing for stationarity when training your model better than optimizing for Sharpe?
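One hedged way to start on the first question, framed as model selection rather than a training objective: score candidate models by the Sharpe of their per-era correlations minus a penalty on the autocorrelation (streakiness) of those correlations. The penalty weight `lam` is an arbitrary knob, not anything Numerai prescribes.

```python
import numpy as np

def stationarity_adjusted_sharpe(era_scores: np.ndarray, lam: float = 1.0) -> float:
    """Sharpe of per-era scores, penalized for positive lag-1 autocorrelation."""
    sharpe = era_scores.mean() / era_scores.std()
    centered = era_scores - era_scores.mean()
    autocorr = (centered[:-1] * centered[1:]).mean() / centered.var()
    return sharpe - lam * max(autocorr, 0.0)  # only penalize streakiness, not mean reversion
```

Picking hyperparameters or feature sets that maximize this over the training eras, then comparing out of sample against a plain Sharpe-optimized model, would be one way to answer the bonus questions empirically.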
PS In Marcos' book Advances in Financial Machine Learning, there is a discussion of stationarity in chapter 5.
PPS You can bet AQR wishes value had more stationary performance: https://www.aqr.com/Insights/Perspectives/Its-Time-for-a-Venial-Value-Timing-Sin