Performance Stationarity

You want the performance of your model to be, as much as possible, a stationary process. A model that goes up for 9 months in a row and then down for each of the last 3 months is less preferable than a model whose 3 down months are interspersed evenly throughout the year. These two models could have the same Sharpe ratio, but the one with three down months in a row would have a higher drawdown. A sophisticated investor would much prefer a model with a stationary track record, because such models tend to be more robust and more likely to keep working in the future.
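To make the 9-up / 3-down comparison concrete, here is a minimal sketch. The monthly return values (+2% and -1%) are made up for illustration, returns are summed rather than compounded, and the helper names `sharpe` and `max_drawdown` are my own, not from any library.

```python
import numpy as np

def sharpe(returns):
    # Mean over sample standard deviation (not annualized).
    return np.mean(returns) / np.std(returns, ddof=1)

def max_drawdown(returns):
    # Largest peak-to-trough drop of the cumulative (additive) return curve.
    equity = np.concatenate(([0.0], np.cumsum(returns)))
    running_peak = np.maximum.accumulate(equity)
    return np.max(running_peak - equity)

up, down = 0.02, -0.01  # hypothetical monthly returns

# Nine up months followed by three down months.
clustered = np.array([up] * 9 + [down] * 3)
# The same twelve returns with the down months spread evenly.
interspersed = np.array([up, up, up, down] * 3)

print("Sharpe:", sharpe(clustered), sharpe(interspersed))
print("Max drawdown:", max_drawdown(clustered), max_drawdown(interspersed))
```

Both series contain the same twelve returns, so mean, standard deviation and Sharpe are identical; only the ordering differs, and the clustered series has three times the drawdown.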

When I say stationary, I tend to mean that the performance of your model is statistically similar to flipping a biased coin. Let’s say your model does well in 80% of eras; then your performance should look like flipping a coin with an 80% bias towards heads. It should look something like HHHHTHHHTHHHTHHHHT, not like TTTHHHHHHHHHTTTTTHHHHHHHH, i.e. it should lack autocorrelation / be memoryless / not have any long burn periods.
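One rough way to tell the two sequences above apart numerically is to look at the lag-1 autocorrelation of the win/loss indicator and at the longest losing streak. This is only a sketch: the helper names are hypothetical, and with sequences this short the autocorrelation estimate is noisy.

```python
import numpy as np

def to_indicator(seq):
    # 1.0 for a good era (H), 0.0 for a bad era (T).
    return np.array([1.0 if c == "H" else 0.0 for c in seq])

def lag1_autocorr(x):
    # Correlation between the series and itself shifted by one era.
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def longest_run(seq, symbol="T"):
    # Length of the longest unbroken streak of `symbol`.
    best = current = 0
    for c in seq:
        current = current + 1 if c == symbol else 0
        best = max(best, current)
    return best

coin_like = "HHHHTHHHTHHHTHHHHT"
streaky = "TTTHHHHHHHHHTTTTTHHHHHHHH"

for name, seq in [("coin-like", coin_like), ("streaky", streaky)]:
    print(name,
          "lag-1 autocorr:", round(lag1_autocorr(to_indicator(seq)), 3),
          "longest T run:", longest_run(seq))
```

The streaky sequence has a longest burn period of 5 eras against 1 for the coin-like one, and a clearly positive lag-1 autocorrelation.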

The challenge with stock market data is that almost none of the stock features are stationary, but the goal is for the model built with those features to be stationary. Quant features like value or momentum can work well for years and then stop working, or work in the opposite direction, for the next few years. Models trained on these non-stationary features will tend to have non-stationary performance too, and this is why so many quant models don’t generalize well out of sample: they have fit to regimes, they have not found stationary signals.

In a previous post, Michael gave code for neutralizing models to feature exposures. While there’s no guarantee that this creates stationarity in performance out of sample, in tests it tends to help because feature neutralization will reduce to zero any linear bets on the non-stationary factors. See: MMC2 and Feature Neutralization
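For readers who have not seen that post, the idea of linear neutralization can be sketched as follows. This is not Michael's code: it is a minimal least-squares version on random data, the function name `neutralize` and its `proportion` parameter are my own choices, and in practice you would probably apply it era by era.

```python
import numpy as np

def neutralize(predictions, features, proportion=1.0):
    # Regress predictions on the features (plus a constant column, an
    # assumption of this sketch) and subtract the fitted linear part.
    exposures = np.column_stack([features, np.ones(len(features))])
    coefs, *_ = np.linalg.lstsq(exposures, predictions, rcond=None)
    return predictions - proportion * (exposures @ coefs)

# Synthetic stand-in data: 500 rows, 5 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 5))
predictions = features @ rng.normal(size=5) + rng.normal(size=500)

neutral = neutralize(predictions, features)

# After full neutralization the linear correlation with each feature is ~0.
feature_corrs = [np.corrcoef(neutral, features[:, j])[0, 1] for j in range(5)]
print(np.round(feature_corrs, 10))
```

With `proportion=1.0` the remaining predictions carry no linear bet on any single feature; a smaller proportion would remove only part of the exposure.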

I wanted to open up discussion on this topic as it’s unusual in most machine learning contexts to care about stationarity or the ordering of your performance. I think many Numerai users cared about getting the highest possible mean correlation score and then began to care about getting the best possible Sharpe. I think the next frontier will be reaching stationarity.

Does anyone explicitly try to learn a model to optimize for stationarity? How?

Does anyone look at ADF tests on their performance or on the feature’s performance in their model construction? Or remove features with too much autocorrelation in their correlation with the target from era to era?
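As a starting point for the second question, here is a sketch that computes each feature's per-era correlation with the target and then the lag-1 autocorrelation of that series. Everything here is an assumption for illustration: the data is synthetic, era labels are assumed to sort chronologically, and the 0.5 cutoff is arbitrary. An ADF test (for example `adfuller` from statsmodels) could be run on the same per-era series, but that is not shown.

```python
import numpy as np

def lag1_autocorr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def era_feature_corrs(features, target, eras):
    # Returns an (n_eras, n_features) array of feature/target correlations.
    rows = []
    for era in np.unique(eras):  # np.unique returns sorted era labels
        mask = eras == era
        rows.append([np.corrcoef(features[mask, j], target[mask])[0, 1]
                     for j in range(features.shape[1])])
    return np.array(rows)

# Synthetic data: feature 0 flips sign halfway (a regime), feature 1 is stable.
rng = np.random.default_rng(0)
n_eras, n_rows = 40, 200
eras = np.repeat(np.arange(n_eras), n_rows)
features = rng.normal(size=(n_eras * n_rows, 2))
regime = np.where(eras < n_eras // 2, 1.0, -1.0)
target = (0.3 * regime * features[:, 0] + 0.3 * features[:, 1]
          + rng.normal(size=n_eras * n_rows))

corrs = era_feature_corrs(features, target, eras)
autocorrs = np.array([lag1_autocorr(corrs[:, j]) for j in range(corrs.shape[1])])

threshold = 0.5  # arbitrary cutoff, chosen only for this example
keep = np.abs(autocorrs) < threshold
print("lag-1 autocorr per feature:", np.round(autocorrs, 3), "keep:", keep)
```

The regime-switching feature shows a strongly autocorrelated era-by-era relationship with the target, which is the kind of feature the question suggests dropping.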

How can you train a model on the Numerai training data to ensure stationarity at least over the training set i.e. enforce that you don’t have especially long periods of strong performance or underperformance over the training eras? Bonus: does a model with stationarity over the training set work out of sample better than one without? Extra bonus: if you optimize for stationarity in the training of your model is that better than optimizing for Sharpe?

PS In Marcos’ book Advances In Financial Machine Learning you can see a discussion on stationarity in chapter 5
PPS You can bet AQR wished value had more stationary performance


Great post Richard, much appreciated! These issues have been on my mind recently as I’ve been playing around with fitting models to feature neutral targets. I’ve been testing out the Sortino ratio as an alternative to Sharpe for doing hyperparameter selection, because it makes sense to me to only penalize downside volatility/variance. Interestingly I’m finding that Sortino does favor different and narrower ranges of hyperparameters than Sharpe.

import numpy as np

def sortino_ratio(x, target=.02):
    # Mean excess over target, divided by downside deviation (shortfalls only)
    xt = x - target
    return np.mean(xt) / (np.sum(np.minimum(0, xt)**2)/(len(xt)-1))**.5

After reading your post and doing some internet searching I came across this document which proposes a modification to Sharpe, they call Smart Sharpe, which takes autocorrelation into account. If anyone is interested I threw together a simple implementation to help clarify it to myself. I also created the “Smart” version of Sortino by including the autocorrelation penalty term to perhaps get the best of both worlds.

def ar1(x):
    return np.corrcoef(x[:-1], x[1:])[0,1]

def autocorr_penalty(x):
    n = len(x)
    p = ar1(x)
    return np.sqrt(1 + 2*np.sum([((n - i)/n)*p**i for i in range(1,n)]))

def smart_sharpe(x):
    return np.mean(x)/(np.std(x, ddof=1)*autocorr_penalty(x))

def smart_sortino_ratio(x, target=.02):
    xt = x - target
    return np.mean(xt)/(((np.sum(np.minimum(0, xt)**2)/(len(xt)-1))**.5)*autocorr_penalty(x))
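A quick usage sketch, to show what the penalty does in practice. The functions are repeated from above so the snippet runs on its own; the per-era scores are random numbers I made up, and `sharpe` is just the plain ratio for comparison. Sorting the scores keeps the values (and so the plain Sharpe) the same but makes the series extremely streaky.

```python
import numpy as np

def ar1(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def autocorr_penalty(x):
    n = len(x)
    p = ar1(x)
    return np.sqrt(1 + 2 * np.sum([((n - i) / n) * p**i for i in range(1, n)]))

def sharpe(x):
    return np.mean(x) / np.std(x, ddof=1)

def smart_sharpe(x):
    return np.mean(x) / (np.std(x, ddof=1) * autocorr_penalty(x))

# Hypothetical per-era correlation scores: independent draws, no memory.
rng = np.random.default_rng(0)
scores = rng.normal(0.03, 0.03, size=120)

# Same values, but ordered worst to best: one long losing stretch first.
streaky = np.sort(scores)

print("Sharpe:      ", sharpe(scores), sharpe(streaky))
print("Penalty:     ", autocorr_penalty(scores), autocorr_penalty(streaky))
print("Smart Sharpe:", smart_sharpe(scores), smart_sharpe(streaky))
```

The plain Sharpe cannot tell the two orderings apart, while the Smart Sharpe of the sorted series is divided by a much larger penalty.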

Amazing response! This is why we have a forum! I hadn’t heard of Smart Sharpe but that paper makes a lot of sense. Maybe we should use your code and switch to showing Smart Sharpe over validation when uploading predictions. @master_key


I played around for a few rounds with the suggested smart versions of Sharpe and Sortino.
I didn’t realize at first from the math that the suggested autocorr_penalty(x) favours negative autocorrelation. Negative autocorrelation means the era correlation jumps up and down around the mean from one era to the next. Correct me if I’m wrong, but the desired property of stationarity is to have AR1 close to 0, not to -1.
I’m trying a loss function that uses the inverse of the original autocorr_penalty() value in the case of negative autocorrelation:

# In R style
# Assumes autocorr_penalty() has been ported to R from the Python version above.
autocorr_penalty2 <- function(x) {
  ap <- autocorr_penalty(x)
  if (ap < 1) {
    return(1 / ap)  # ap approaches 0 as AR1(x) approaches -1
  } else {
    return(ap)
  }
}

Use the absolute value of the autocorrelation as the penalty… positive and negative autocorrelation both represent certainty, which is what the measure is trying to penalize.


Yeah, I had wondered about that too, and after thinking about it more I think you’re right. I’m guessing the paper didn’t address this because negative AR1 coefficients just don’t happen in the long time-series data they are analyzing. To prevent wonkiness when using it as a penalty, I agree with @of_s that you should just modify the function to:

def autocorr_penalty(x):
    n = len(x)
    p = np.abs(ar1(x))
    return np.sqrt(1 + 2*np.sum([((n - i)/n)*p**i for i in range(1,n)]))
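To see the difference between the two versions, here is a small sketch on a made-up series that zigzags around its mean (strongly negative AR1). The names `signed_penalty` and `abs_penalty` are mine, used only to keep both variants side by side; the data is synthetic.

```python
import numpy as np

def ar1(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def signed_penalty(x):
    # Original version: uses the signed AR1 coefficient.
    n = len(x)
    p = ar1(x)
    return np.sqrt(1 + 2 * np.sum([((n - i) / n) * p**i for i in range(1, n)]))

def abs_penalty(x):
    # Modified version: uses the absolute value of AR1.
    n = len(x)
    p = np.abs(ar1(x))
    return np.sqrt(1 + 2 * np.sum([((n - i) / n) * p**i for i in range(1, n)]))

# Hypothetical scores that alternate high/low each era, plus some noise.
rng = np.random.default_rng(0)
n = 60
zigzag = 0.02 + 0.03 * (-1.0) ** np.arange(n) + rng.normal(0, 0.015, size=n)

print("AR1:", ar1(zigzag))
print("signed penalty:", signed_penalty(zigzag))  # below 1: inflates the ratio
print("abs penalty:   ", abs_penalty(zigzag))     # above 1: penalizes the ratio
```

With the signed coefficient the penalty drops below 1 and rewards the zigzag; with the absolute value it is treated like any other predictable pattern.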

If interested, I had written about this years ago.


are you a Finance PhD?

Not technically, but I have several areas of research I could successfully defend for one! :)