I’m not sure we are on the same page, but I meant using sets of 10 features as individual feature discovery units, which I think is where I got mixed up. You actually meant that you didn’t include 210 of the features in anything, since you deemed them too correlated with other features, etc…
This is not a classical competition! We are paid in NMR, which is nothing more than monkey money if the hedge fund fails.
Sharing ideas is vital to help others and improve the fund’s performance. This exchange of ideas gives NMR value in the long run…
@minou stated it correctly. It’s very unlikely that you would end up with a similar model given the information I shared. But this idea can certainly add to and improve your models’ performance.
Here’s a minimalistic Keras implementation of the model described in the article. With a bit of tuning, it works better than XGBoost but is still far from the 0.03 corr. I guess the feature selection plays an important role here.
import tensorflow as tf
import numpy as np


class Regressor(tf.keras.layers.Layer):
    """MLP with ReLU hidden layers and a single sigmoid output."""

    def __init__(self, dims=[32, 8]):
        super(Regressor, self).__init__()
        self.dims = dims
        # Hidden layers dense_0 .. dense_{n-1}, then a 1-unit output layer dense_n
        for i, d in enumerate(self.dims):
            setattr(self, f'dense_{i}', tf.keras.layers.Dense(d))
        setattr(self, f'dense_{i+1}', tf.keras.layers.Dense(1))

    def call(self, inputs):
        x = inputs
        for i, _ in enumerate(self.dims):
            x = getattr(self, f'dense_{i}')(x)
            x = tf.nn.relu(x)
        x = getattr(self, f'dense_{i+1}')(x)
        x = tf.nn.sigmoid(x)
        return x


class FeatureRegressor(Regressor):
    """Predicts one feature from the others and exposes a hidden layer as latent."""

    def __init__(self, dims=[32, 8], latent_idx=1):
        super(FeatureRegressor, self).__init__(dims)
        self.latent_idx = latent_idx

    def call(self, inputs):
        x = inputs
        for i, _ in enumerate(self.dims):
            x = getattr(self, f'dense_{i}')(x)
            if i == self.latent_idx:
                # Latent is taken before the ReLU of the chosen hidden layer
                latent = x
            x = tf.nn.relu(x)
        # No sigmoid here: the feature prediction is the raw output
        return latent, getattr(self, f'dense_{i+1}')(x)


class Model(tf.keras.Model):
    def __init__(self,
                 input_dims=10,
                 feature_regressor_dims=[32, 8],
                 feature_latent_idx=1,
                 target_regressor_dims=[32, 8]):
        super(Model, self).__init__()
        self.input_dims = input_dims
        self.feature_regressor_dims = feature_regressor_dims
        self.target_regressor_dims = target_regressor_dims
        # One feature regressor per input feature
        for i in range(input_dims):
            setattr(self, f'feature_regressor_{i}', FeatureRegressor(feature_regressor_dims, feature_latent_idx))
        self.target_regressor = Regressor(target_regressor_dims)

    def call(self, inputs):
        # Perform feature regressor inference
        features_latents = []
        features_preds = []
        for f in range(self.input_dims):
            # Prepare input without target feature
            mask = np.array([d != f for d in range(self.input_dims)])
            input_feature = tf.boolean_mask(inputs, mask, axis=1)
            # Regress target feature
            feature_latent, feature_pred = getattr(self, f'feature_regressor_{f}')(input_feature)
            features_latents.append(feature_latent)
            features_preds.append(feature_pred)
        # Perform target regressor inference
        features_latents = tf.concat(features_latents, axis=-1)
        input_target = tf.concat([inputs, features_latents], axis=-1)
        target_pred = self.target_regressor(input_target)
        # Concat predictions: feature predictions first, target prediction last
        output = tf.concat(features_preds + [target_pred], axis=-1)
        return output
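The per-feature masking in `Model.call` can be illustrated outside TensorFlow. The following is a minimal NumPy sketch on a made-up toy batch; the helper name `leave_one_out` is hypothetical and not part of the code above.

```python
import numpy as np


def leave_one_out(inputs, f):
    """Return `inputs` without column `f`, plus the kept column indices.

    Hypothetical helper mirroring the boolean mask built in Model.call.
    """
    idx = np.array([d for d in range(inputs.shape[1]) if d != f])
    return inputs[:, idx], idx


# Toy batch: 4 rows, 3 features (values are arbitrary)
X = np.arange(12, dtype=np.float32).reshape(4, 3)

# Input seen by the regressor that has to predict feature 1
X_wo_1, idx = leave_one_out(X, 1)
print(idx)
print(X_wo_1.shape)
```

The same integer indices could be passed to `tf.gather(inputs, idx, axis=1)` in place of `tf.boolean_mask`, which should keep the feature dimension statically known. Whether that removes the need for `run_eagerly=True` in this model is an untested assumption.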
Hi, I haven’t tried your code, but I noticed that you left out all the BatchNorm and Dropout layers from the original! You can reach 0.03 with this model if you follow what’s in the article.
Hi @nyuton, when you tuned your model, did you use k-fold cross-validation, or did you just train on the whole training dataset and use validation for early stopping?
Thanks @nyuton! Sure, this just exemplifies how to easily implement the simultaneous training of the feature regressors and the target regressor, which I think is the key part. But of course, designing and training neural nets is a subtle art.
I just trained with the training set. Normally I do cross-validation, but this model takes too long to train…
Thanks for sharing. It’s a neat way of summarizing the extensive concept code.
However, if I am not mistaken, the tf.concat will lead to a single output, whereas the paper tries to optimise multiple outputs. This way it also does not seem possible to weight the individual outputs. The paper aimed to weight the loss of the main output (target_pred in your case) by 50%. Achieving this would probably lead to higher correlation.
Thanks @juhuu ! What you noted can be easily handled by the loss. For instance:
def loss(beta):
    def f(y_true, y_pred):
        # Last column is the target; all other columns are the feature predictions
        target_loss = tf.keras.losses.MSE(y_true[:,-1:], y_pred[:,-1:])
        feat_loss = tf.keras.losses.MSE(y_true[:,:-1], y_pred[:,:-1])
        return beta * target_loss + (1-beta) * feat_loss
    return f
In fact, in my opinion, this way is better since you can use different losses for the targets and for the features. Here I’m using the same though.
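To make the weighting concrete, here is a NumPy re-implementation of the same arithmetic on a one-row toy batch. The function name `weighted_loss` and the array values are made up for illustration; the sketch assumes, as in the loss above, that the last column is the target and that MSE is averaged over the last axis.

```python
import numpy as np


def weighted_loss(y_true, y_pred, beta):
    """Per-sample beta-weighted sum of target MSE and feature MSE."""
    target_loss = np.mean((y_true[:, -1:] - y_pred[:, -1:]) ** 2, axis=-1)
    feat_loss = np.mean((y_true[:, :-1] - y_pred[:, :-1]) ** 2, axis=-1)
    return beta * target_loss + (1 - beta) * feat_loss


# Two feature columns followed by one target column
y_true = np.array([[0.0, 1.0, 0.5]])
y_pred = np.array([[0.0, 0.0, 1.0]])

# Target MSE is 0.25 and feature MSE is 0.5 for this row
print(weighted_loss(y_true, y_pred, beta=0.5))  # 0.5 * 0.25 + 0.5 * 0.5 = 0.375
```

With `beta=1.0` only the target term remains, and with `beta=0.0` only the feature reconstruction term does, so `beta` plays the role of the 50% weight mentioned above.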
Exactly @olivepossum, for instance:
model = Model(...)
model.compile(loss=loss(beta=0.5), optimizer='adam', ..., run_eagerly=True)
model.fit(...)
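Because the model output concatenates the feature predictions with the target prediction, the `y` passed to `fit` presumably has to follow the same column layout: the input features first, the target last. A minimal sketch with random placeholder data (the shapes and variable names are assumptions, not taken from the posts above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 8, 10  # placeholder sizes; n_features matches input_dims=10

# Placeholder features and target, both in [0, 1)
X = rng.random((n_rows, n_features)).astype(np.float32)
target = rng.random(n_rows).astype(np.float32)

# Labels for the combined loss: features to reconstruct, then the target
y = np.concatenate([X, target[:, None]], axis=1)
print(y.shape)
```

Under this layout, `y[:, :-1]` lines up with the feature regressor outputs and `y[:, -1:]` with `target_pred`, which is what the slicing in the loss expects.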
Hi!
If you liked this post and would like to buy models that actually perform well, you can do it now at NumerBay.ai!
Two of my models are available here: https://numerbay.ai/c/numerai-predictions
Nyuton
Mint, what was your highest Val CORR before you used this architecture?