NN architecture for >0.03 CORR on validation set

bensch · May 24, 2021, 12:06pm

I’m not sure we are on the same page. I meant using sets of 10 features for the individual feature discovery units, which is where I think I got mixed up. You actually meant that you didn’t include 210 of the features in anything, since you deemed them too correlated with other features.
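One way such a correlation-based filter could look is sketched below. This is only an illustration, not the procedure used in the article: the function name `drop_correlated_features`, the greedy keep-first rule, and the 0.9 threshold are all assumptions.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.9):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # (n_features, n_features)
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
# Column 3 is a near-copy of column 0, so it should be filtered out.
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=500)])
kept = drop_correlated_features(X, threshold=0.9)
print(kept)
```

The result depends on column order, since the first feature of a correlated pair is the one that survives.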

nyuton · May 24, 2021, 2:40pm

This is not a classical competition! We are paid in NMR, which is nothing more than monkey money if the hedge fund fails.

Sharing ideas is vital: it helps others and improves the fund’s performance. This exchange of ideas gives NMR value in the long run…

@minou stated it correctly. It’s very unlikely that you would end up with a similar model given the information I shared. But this idea can certainly add to your models and improve their performance.

edu · May 30, 2021, 12:06pm

Here’s a minimalistic Keras implementation of the model described in the article. With a bit of tuning, it works better than XGBoost, but it is still far from 0.03 CORR. I guess the feature selection plays an important role here.

import tensorflow as tf
import numpy as np

class Regressor(tf.keras.layers.Layer):

    def __init__(self, dims=[32, 8]):
        super(Regressor, self).__init__()

        self.dims = dims
        for i, d in enumerate(self.dims):
            setattr(self, f'dense_{i}', tf.keras.layers.Dense(d))
        setattr(self, f'dense_{i+1}', tf.keras.layers.Dense(1))

    def call(self, inputs):

        x = inputs
        for i, _ in enumerate(self.dims):
            x = getattr(self, f'dense_{i}')(x)
            x = tf.nn.relu(x)
        x = getattr(self, f'dense_{i+1}')(x)
        x = tf.nn.sigmoid(x)

        return x


class FeatureRegressor(Regressor):

    def __init__(self, dims=[32, 8], latent_idx=1):
        super(FeatureRegressor, self).__init__(dims)
        self.latent_idx = latent_idx

    def call(self, inputs):

        x = inputs
        for i, _ in enumerate(self.dims):
            x = getattr(self, f'dense_{i}')(x)
            if i == self.latent_idx:
                latent = x
            x = tf.nn.relu(x)

        return latent, getattr(self, f'dense_{i+1}')(x)


class Model(tf.keras.Model):

    def __init__(self,
        input_dims=10,
        feature_regressor_dims=[32, 8],
        feature_latent_idx=1,
        target_regressor_dims=[32, 8]):
        super(Model, self).__init__()

        self.input_dims = input_dims
        self.feature_regressor_dims = feature_regressor_dims
        self.target_regressor_dims = target_regressor_dims

        for i in range(input_dims):
            setattr(self, f'feature_regressor_{i}', FeatureRegressor(feature_regressor_dims, feature_latent_idx))

        self.target_regressor = Regressor(target_regressor_dims)

    def call(self, inputs):

        # Perform feature regressor inference
        features_latents = []
        features_preds = []
        for f in range(self.input_dims):
            # Prepare input without target feature
            mask = np.array([d != f for d in range(self.input_dims)])
            input_feature = tf.boolean_mask(inputs, mask, axis=1)
            # Regress target feature
            feature_latent, feature_pred = getattr(self, f'feature_regressor_{f}')(input_feature)
            features_latents.append(feature_latent)
            features_preds.append(feature_pred)

        # Perform target regressor inference
        features_latents = tf.concat(features_latents, axis=-1)
        input_target = tf.concat([inputs, features_latents], axis=-1)
        target_pred = self.target_regressor(input_target)

        # Concat predictions
        output = tf.concat(features_preds + [target_pred], axis=-1)

        return output
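Because the model returns one predicted value per input feature followed by the target prediction, the labels passed to training presumably need the same column layout. A small NumPy sketch of that layout and of the leave-one-out masking follows; the helper names `leave_one_out` and `build_labels` are hypothetical and not part of the post.

```python
import numpy as np

def leave_one_out(X, f):
    """Return X without column f (the input seen by feature regressor f)."""
    mask = np.arange(X.shape[1]) != f
    return X[:, mask]

def build_labels(X, y):
    """Stack features and target so labels line up with the model output columns."""
    return np.concatenate([X, y.reshape(-1, 1)], axis=1)

X = np.arange(12, dtype=np.float32).reshape(3, 4)  # 3 rows, 4 features
y = np.array([0.0, 0.5, 1.0], dtype=np.float32)

labels = build_labels(X, y)          # shape (3, 5): 4 feature columns + 1 target
reduced = leave_one_out(X, f=1)      # shape (3, 3): feature 1 removed
```

With this layout, the last label column is the target and the preceding columns are the features each regressor tries to reconstruct.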

nyuton · May 31, 2021, 5:09pm

Hi, I haven’t tried your code, but I noticed that you left out all the BatchNorm and Dropout layers from the original! You can reach 0.03 with this model if you follow what’s in the article.
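For readers unsure what those two layers do during training, here is a NumPy sketch of one block. The ordering (Dense, BatchNorm, ReLU, Dropout), the dropout rate, and the omission of the learned scale and shift are assumptions, not details taken from the article. In Keras the corresponding layers are `tf.keras.layers.BatchNormalization` and `tf.keras.layers.Dropout`.

```python
import numpy as np

def dense_bn_dropout(x, W, b, rate, rng, eps=1e-3):
    """Training-mode forward pass: Dense -> BatchNorm -> ReLU -> Dropout."""
    h = x @ W + b
    # Batch normalization: standardize each unit over the batch
    # (the learned scale and shift are omitted, i.e. gamma=1, beta=0).
    h = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)
    h = np.maximum(h, 0.0)
    # Inverted dropout: zero a fraction `rate` of units, rescale the rest.
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 9))          # 9 inputs: 10 features minus the held-out one
W = rng.normal(size=(9, 32)) * 0.1
b = np.zeros(32)
out = dense_bn_dropout(x, W, b, rate=0.2, rng=rng)
```

At inference time dropout is disabled and batch normalization uses running statistics instead of batch statistics, which this sketch does not model.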

olivepossum · May 31, 2021, 5:35pm

Hi @nyuton, when you tuned your model, did you use k-fold cross-validation, or did you just train on the whole training dataset and use the validation set for early stopping?

edu · May 31, 2021, 7:39pm

Thanks @nyuton! Sure, this just exemplifies how to easily implement the simultaneous training of the feature regressors and the target regressor, which I think is the key part. But of course, designing and training neural nets is a subtle art.

nyuton · June 1, 2021, 7:46am

I just trained on the training set. Normally I do cross-validation, but this model takes too long to train…
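To make the trade-off concrete, a plain k-fold split can be sketched as below. The helper `kfold_indices` and its defaults are illustrative assumptions, not what anyone in this thread used. Each fold requires a full training run, so five folds cost roughly five times as much as a single fit.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=10, n_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```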

juhuu · June 1, 2021, 11:25am

Thanks for sharing. It’s a neat way of summarizing the extensive concept code.

However, if I am not mistaken, the tf.concat will lead to a single output, whereas the paper optimises multiple outputs. This way it also seems impossible to weight the individual outputs. The paper aimed to give the loss of the main output (target_pred in your case) a weight of 50%. Achieving this would probably lead to a higher correlation.

edu · June 1, 2021, 12:10pm

Thanks @juhuu ! What you noted can be easily handled by the loss. For instance:

def loss(beta):
    def f(y_true, y_pred):
        target_loss = tf.keras.losses.MSE(y_true[:,-1:], y_pred[:,-1:])
        feat_loss = tf.keras.losses.MSE(y_true[:,:-1], y_pred[:,:-1])
        return beta * target_loss + (1-beta) * feat_loss
    return f

In fact, in my opinion, this way is better since you can use different losses for the targets and for the features. Here I’m using the same though.
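A NumPy version of the same weighting makes the effect of beta easy to check by hand. This is a sketch mirroring the slicing in the loss above, assuming the target sits in the last column; the name `combined_loss` is made up for the example.

```python
import numpy as np

def combined_loss(y_true, y_pred, beta=0.5):
    """beta * MSE(target column) + (1 - beta) * MSE(feature columns), averaged over the batch."""
    target_loss = np.mean((y_true[:, -1:] - y_pred[:, -1:]) ** 2)
    feat_loss = np.mean((y_true[:, :-1] - y_pred[:, :-1]) ** 2)
    return beta * target_loss + (1.0 - beta) * feat_loss

# Two rows, two feature columns, and the target in the last column.
y_true = np.array([[0.0, 1.0, 0.5],
                   [1.0, 0.0, 0.5]])
y_pred = np.array([[0.0, 1.0, 1.0],   # features exact, target off by 0.5
                   [1.0, 0.0, 0.0]])

print(combined_loss(y_true, y_pred, beta=0.5))  # 0.125
print(combined_loss(y_true, y_pred, beta=1.0))  # 0.25
```

Without the split, a plain MSE over the concatenated output with the default input_dims=10 would give the target only 1 of 11 columns, so beta=0.5 raises its share of the loss considerably.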

olivepossum · June 1, 2021, 2:36pm

Hi @edu, is this loss(beta) function the one you would pass to the Model(tf.keras.Model) model?

edu · June 1, 2021, 6:38pm

Exactly @olivepossum, for instance:

model = Model(...)
model.compile(loss=loss(beta=0.5), optimizer='adam', ..., run_eagerly=True)
model.fit(...)

nyuton · August 20, 2021, 9:56am

Hi!

If you liked this post and would like to buy models that actually perform well, you can do it now at NumerBay.ai!
Two of my models are available here: https://numerbay.ai/c/numerai-predictions

Nyuton

johnnywhippet · August 26, 2021, 5:22pm

Mint, what was your highest Val CORR before you used this architecture?