Conserved Currents in Finance
Numerati (and others) are convinced that it's possible to estimate the future of the stock market. The proof of that statement is that hundreds of data scientists are staking their cash in the Numer.ai tournament. That is, of course, only a proof about the sentiment of data scientists in the present epoch of humankind's evolution, not about the nature of reality.
Predictability is only possible if something that happens now, something that we know, continues to persist for some amount of time into the future. Physicists describe this situation by saying that there is a conserved current. This current is physical. By the word physical we mean everything that occurs in nature. For example, people's opinions are physical in the sense that they are real and they persist for some time into the future.
In the following I am going to write down an elementary proof about the nature of that current. This is really about the way we must calculate it, assuming that it exists. But I have argued that everybody agrees that it does.
I will have to use a little bit of math for this proof. But it’s so little that by now I think everybody can follow along.
Set Up
Normally we train a model H to try to estimate y. One of the reasons that we use black box estimators is that we know that human biases are generally inferior to unemotional machine learning estimators. But instead of training an arbitrary model, let’s start with the best minimum-bias estimator, a linear model (OLS), and, in the boosting sense, go from there:
H = A \cdot X + \ldots \tag{1}
We also know that we can reduce the variance of our estimate by introducing some bias into our estimate of y. There are many ways to express this situation, but here we want to express it in a way that makes our linear model and its associated risk explicit. Doing so will lead to some interesting properties of the equations. Therefore, we say that we start with a minimum-bias model and add to it another model, say a neural network M(\cdot), that compensates for that risk. In more data sciency terms, the neural network applies the biases that reduce the variance of the linear model:
H = A \cdot X+ M(\cdot) \tag{2}
Here the center dot in the argument of M explicitly indicates that we don’t know what that argument consists of in a world where the choices are the features X and their first derivatives \dot{X}.
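To make Eq. 2 concrete, here is a minimal sketch of the two choices for the argument of M. Everything in it is an assumption for illustration: the toy data, the helper names (ols_slope, first_difference, predict), the backward difference standing in for \dot{X}, and the simple lambdas standing in for a neural network.

```python
from typing import Callable, List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """No-intercept least-squares coefficient A minimizing sum((y - A*x)**2)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0.0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def first_difference(x: Sequence[float], dt: float = 1.0) -> List[float]:
    """Backward difference as a stand-in for X-dot; the first entry is padded with 0.0."""
    return [0.0] + [(x[i] - x[i - 1]) / dt for i in range(1, len(x))]


def predict(
    a: float,
    x: Sequence[float],
    m: Callable[[float], float],
    m_args: Sequence[float],
) -> List[float]:
    """Evaluate H = A*X + M(.) elementwise, with M fed whatever m_args holds."""
    return [a * xi + m(arg) for xi, arg in zip(x, m_args)]


# Toy data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi for xi in x]

a = ols_slope(x, y)            # linear part A of Eq. 1
x_dot = first_difference(x)    # hypothetical estimate of the derivatives

# Choice 1: M takes the features X (placeholder that contributes nothing).
h_features = predict(a, x, lambda v: 0.0, x)
# Choice 2: M takes the derivatives X-dot (hypothetical linear stand-in).
h_derivative = predict(a, x, lambda v: 0.5 * v, x_dot)
```

The only point of the sketch is that the same predictor H accepts either X or \dot{X} as the argument of M; which of the two is admissible is what the rest of the argument is about.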
Definitions
We call H the Hamiltonian of a real physical dynamical financial system. Our model of it is the predictor we are after.
The linear predictor A\cdot X with features X and coefficients A is associated with the potential for gain or loss in such a real physical dynamical system. In classical mechanical terms we would call it the potential energy.
The Lagrangian of this dynamical system is:
\mathcal{L} = M(\cdot) - A\cdot X \tag{3}
The equation that describes the persistence of conserved currents into the future is called the Euler-Lagrange equation:
0 = \frac{d}{dt} \frac{\partial \mathcal{L} }{\partial \dot{X} } - \frac{\partial \mathcal{L} }{ \partial X} \tag{4}
The universe of possible variables consists of any kind of features X and their duals, the first derivatives of the features, \dot{X}.
Theorem
In summary, in real physical systems with conserved currents, a biasing neural network cannot be a function of the features alone.
If A\cdot X is the minimum-bias estimator of a target y, associated with a dynamical potential of the above Hamiltonian that conserves a Noetherian current into the future, then no matter what the variables in X are, a compensating neural network cannot be a function of X alone.
Proof
As usual, the proof begins by assuming the opposite, namely that the biasing neural network is only a function of X. Then the Lagrangian is obtained by replacing the dot in Eq. 3 with X:
\mathcal{L} = M(X) - A \cdot X \tag{5}
Plugging this into the Euler-Lagrange Equation (Eq. 4) and rearranging, we trivially obtain that the derivative of the biasing neural network with respect to the features X is identically the coefficients of the linear model:
A = \frac{dM}{dX} \tag{6}
In other words, the neural network tries to predict the linear model: integrating Eq. 6 gives M(X) = A \cdot X up to a constant. Clearly the linear model, and therefore the neural network, fails to bias the estimator H.
\square
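Spelled out, the step from Eq. 5 to Eq. 6 runs as follows. This is only a restatement of the proof's own assumption, nothing new. Because M(X) carries no dependence on \dot{X}, the first term of Eq. 4 vanishes:
\frac{\partial \mathcal{L}}{\partial \dot{X}} = 0 \quad \Rightarrow \quad \frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{X}} = 0
The second term is the derivative of Eq. 5 with respect to X:
\frac{\partial \mathcal{L}}{\partial X} = \frac{dM}{dX} - A
Eq. 4 therefore reduces to
0 = 0 - \left( \frac{dM}{dX} - A \right) \quad \Rightarrow \quad A = \frac{dM}{dX}
which is Eq. 6. Note that no time derivative survives: what remains is an algebraic condition on M, not an equation of motion.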
Corollary
A neural network that correctly persists a compensating bias for a linear model must be a function of the derivatives of the features.
Proof
The proof follows from considering the math that led to the Theorem: only expressions that are functions of the first derivative \dot{X} lead to a non-zero first term in the Euler-Lagrange equation (Eq. 4). Any function that does so yields something other than the linear model. Since any model that is not the linear model is more biased than OLS, the biasing can then persist according to real causal physical laws.
\square
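The contrast between the Theorem and the Corollary can be checked numerically with a small sketch. The quadratic forms below are assumptions chosen only because they are easy to verify by hand: M(\dot{X}) = MASS \dot{X}^2 / 2 for the derivative case and M(X) = C X^2 / 2 for the features-only case, with made-up coefficients A, MASS and C. The helper euler_lagrange_residual is hypothetical as well; it estimates the right-hand side of Eq. 4 by central differences along a given path.

```python
from typing import Callable

Lagrangian = Callable[[float, float], float]  # L(x, x_dot)


def euler_lagrange_residual(
    lagrangian: Lagrangian,
    x_of_t: Callable[[float], float],
    v_of_t: Callable[[float], float],
    t: float,
    eps: float = 1e-4,
    h: float = 1e-3,
) -> float:
    """Central-difference estimate of d/dt(dL/dX_dot) - dL/dX (Eq. 4) at time t."""

    def dL_dv(s: float) -> float:
        # Partial derivative of L with respect to X_dot, evaluated on the path at time s.
        x, v = x_of_t(s), v_of_t(s)
        return (lagrangian(x, v + eps) - lagrangian(x, v - eps)) / (2 * eps)

    x, v = x_of_t(t), v_of_t(t)
    dL_dx = (lagrangian(x + eps, v) - lagrangian(x - eps, v)) / (2 * eps)
    ddt_dL_dv = (dL_dv(t + h) - dL_dv(t - h)) / (2 * h)
    return ddt_dL_dv - dL_dx


# Hypothetical coefficients, chosen only for illustration.
A, MASS, C = 3.0, 2.0, 1.0


def lagrangian_velocity(x: float, v: float) -> float:
    """L = M(X_dot) - A*X with the assumed form M(X_dot) = MASS * X_dot**2 / 2."""
    return 0.5 * MASS * v * v - A * x


def lagrangian_features_only(x: float, v: float) -> float:
    """L = M(X) - A*X with the assumed form M(X) = C * X**2 / 2."""
    return 0.5 * C * x * x - A * x


def path_x(t: float) -> float:
    """A path solving MASS * X_ddot = -A, the equation of motion of the velocity case."""
    return 1.0 + 0.5 * t - 0.5 * (A / MASS) * t * t


def path_v(t: float) -> float:
    """Time derivative of path_x."""
    return 0.5 - (A / MASS) * t


residual_velocity = euler_lagrange_residual(lagrangian_velocity, path_x, path_v, 1.0)
residual_features_only = euler_lagrange_residual(
    lagrangian_features_only, path_x, path_v, 1.0
)
```

Under these assumptions residual_velocity is zero up to finite-difference rounding, so the moving path satisfies Eq. 4 when M depends on \dot{X}. residual_features_only is not zero on the same path: with M(X) only, Eq. 4 collapses to the algebraic condition C X = A, which a path that actually evolves in time cannot satisfy.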
Notes
The above theorem does not state that we cannot bias our estimate of y with only X available. For better or worse, we can obviously always do that.
The statement about conserved currents relates directly to the question of generalizability in data science: do my biases help when previously unseen data appear? The assertion is that if there exists some real dynamical quantity that persists when really new conditions arise (something real out in the world), then a compensating neural network that is a function of X only cannot model that persisting dynamical quantity, and therefore cannot correctly bias the linear model into the future.
The Theorem does not say we cannot get improved estimates of y by adding all kinds of new features to the linear predictor. It says only that a risk-minimizing bias cannot be correctly propagated into the future using a bias function of the features in X alone, even if those features are themselves derivatives of stationary features!
Stop pulling your hair out. When in doubt identify the derivative.