With the advent of TC, many users may wonder how to optimize for metrics beyond correlation and mean-squared error. Here we will show how to directly optimize for metrics like TB200 and FNC. This is intended as a proof of concept and a source of inspiration, not a set of instructions.
A previous forum post demonstrated how to optimize for Spearman correlation directly. That work extends fairly simply to optimizing a top/bottom correlation (e.g. TB200), where only the most extreme predictions are used in the correlation function.
import torch
import pandas as pd
import numpy as np
import torchsort
from torch.distributions import Normal
import torch.nn.functional as F
import torch.optim as optim
from torch import nn
normal = Normal(0, 1)
def numerai_r_tb(pred, target, tb=None, gaussianize=False, regularization_strength=.0001):
    # Computes and returns a differentiable Numerai correlation score, with the option
    # to use only the top and bottom tb values. Set gaussianize=True to apply a
    # Gauss-rank transform to the predictions instead of just a rank transform.
    pred = pred.reshape(1, -1)
    target = target.reshape(1, -1)
    # compute differentiable (soft) ranks of the predictions
    rr = torchsort.soft_rank(pred, regularization_strength=regularization_strength)
    # map the ranks to a uniform distribution on (0, 1)
    pred = (rr - .5) / rr.shape[1]
    # convert uniform to gaussian distribution
    if gaussianize:
        pred = normal.icdf(pred)
    # select the top and bottom tb ranked predictions (the two masks are disjoint, so xor acts as or)
    if tb is not None:
        tbidx = torch.bitwise_xor(rr <= tb, rr > (rr.shape[1] - tb))
        pred = pred[tbidx]
        target = target[tbidx]
    # Pearson correlation
    pred = pred - pred.mean()
    pred = pred / pred.norm()
    target = target - target.mean()
    target = target / target.norm()
    return (pred * target).sum()
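As a quick sanity check, the scoring function can be called on arbitrary tensors. The sketch below uses random data (the sizes and the tb value are purely illustrative) and maximizes the TB200 score by minimizing its negative:
pred = torch.rand(5000, requires_grad=True)
target = torch.rand(5000)
tb200_score = numerai_r_tb(pred, target, tb=200)
(-tb200_score).backward()  # gradients flow back through the soft rank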
If we want to control the feature exposure of the top/bottom part of the signal, it is helpful to have the correlation function return this exposure as well, so it can be incorporated into the overall cost function. Here is a modified version of the above that also returns the total feature exposure:
def numerai_r_tb_exposure(pred, target, features, tb=None, gaussianize=False, regularization_strength=.0001):
    # Computes and returns a differentiable Numerai correlation score along with the
    # total squared feature exposure of the (optionally top/bottom-restricted) prediction.
    pred = pred.reshape(1, -1)
    target = target.reshape(1, -1)
    # compute differentiable (soft) ranks of the predictions
    rr = torchsort.soft_rank(pred, regularization_strength=regularization_strength)
    # map the ranks to a uniform distribution on (0, 1)
    pred = (rr - .5) / rr.shape[1]
    # convert uniform to gaussian distribution
    if gaussianize:
        pred = normal.icdf(pred)
    # select the top and bottom tb ranked predictions and the matching feature rows
    if tb is not None:
        tbidx = torch.bitwise_xor(rr <= tb, rr > (rr.shape[1] - tb))
        pred = pred[tbidx]
        target = target[tbidx]
        features = features[tbidx[0]]
    # Pearson correlation
    pred = pred - pred.mean()
    pred = pred / pred.norm()
    target = target - target.mean()
    target = target / target.norm()
    return (pred * target).sum(), ((pred @ features)**2).sum()
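The exposure term can then be folded straight into the loss alongside the (negated) correlation. Below is a minimal sketch on random data; the sizes and the 1e3 weighting on the exposure penalty are illustrative choices, not recommendations:
features = torch.rand(5000, 1050) - 0.5
target = torch.rand(5000)
pred = torch.rand(5000, requires_grad=True)
score, exposure = numerai_r_tb_exposure(pred, target, features, tb=500)
loss = -score + exposure / 1e3  # exposure weight is an arbitrary example value
loss.backward()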
We can use the above cost functions to compute CORR and TB scores as well as feature penalty terms. The inclusion of a differentiable pseudoinverse in PyTorch means we can feature-neutralize a model's predictions and directly optimize for FNC as well. Now we will show how to train a simple neural network on a cost function that optimizes for FNC, TB500 FNC, and CORR, while penalizing feature exposure in the raw prediction and in the top/bottom 500 of the neutralized prediction. (We've found TB500 a bit more stable to optimize, as TB200 tends to overfit easily.) We initialize a simple neural network like:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin1 = nn.Linear(1050, 100)
        self.lin2 = nn.Linear(100, 30)
        self.lin3 = nn.Linear(30, 1)
        self.bn = nn.BatchNorm1d(1)
        self.do1 = nn.Dropout(0.5)
        self.do2 = nn.Dropout(0.5)
    def forward(self, x):
        x = self.lin1(x)
        x = self.do1(F.mish(x))
        x = self.lin2(x)
        x = self.do2(F.mish(x))
        output = self.bn(self.lin3(x))
        return output
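The training loop below assumes a few objects have already been created. Here is a minimal setup sketch; the optimizer choice, learning rate, epoch count, and the layout of training_data (a DataFrame with an era column, feature columns, and a target column) are all assumptions:
model = Net().double()  # cast to double to match the float64 tensors built from the DataFrame (an assumption about the data's dtype)
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # illustrative optimizer and learning rate
era_list = list(training_data.era.unique())
epochs = 10  # illustrative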
We can then set up a training loop as follows to optimize for this multi-part cost function.
for epoch in range(epochs):
    np.random.shuffle(era_list)
    for ii, era in enumerate(era_list):
        # get features and target from data and put in tensors
        features = torch.tensor(training_data[training_data.era == era].filter(like='feature').values) - .5
        target = torch.tensor(training_data[training_data.era == era]['target'].values)
        # zero gradient buffer and get model output
        optimizer.zero_grad()
        model.train()
        output = model(features)
        # neutralize model output against the features via the pseudoinverse
        b = features.pinverse(rcond=1e-6) @ output
        linear_pred = features @ b
        neutralized_output = output - linear_pred
        neut_tb_loss, neut_tb_exp = numerai_r_tb_exposure(neutralized_output, target, features, tb=500)
        neut_loss = numerai_r_tb(neutralized_output, target)
        orig_loss, orig_exp = numerai_r_tb_exposure(output, target, features)
        # loss = -(TB500 corr of neutralized output) - (corr of neutralized output) - (corr of raw output)
        #        + (TB500 exposure penalty) + (raw exposure penalty)
        loss = -neut_tb_loss - neut_loss - orig_loss \
            + neut_tb_exp / 1e3 + orig_exp / 1e4
        loss.backward()
        optimizer.step()
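Once trained, predictions can be generated era by era with gradients disabled. A sketch of that step, assuming a validation_data DataFrame laid out like training_data (the name and layout are assumptions):
model.eval()  # switch dropout and batch norm to inference mode
with torch.no_grad():
    preds = []
    for era in validation_data.era.unique():
        feats = torch.tensor(validation_data[validation_data.era == era].filter(like='feature').values) - .5
        preds.append(model(feats).squeeze(1).numpy())
predictions = np.concatenate(preds)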
We've trained a model using this code and have submitted it here. The validation statistics for this model are here. Again, this is far from optimized and is meant only to show what is possible, but it seems fairly decent already. Cheers and good luck!