Feature reversing input noise

With such a high val mean and low feature exposure, I would expect MMC to be a lot higher; that’s surprising.

Hi, it’s me! The craziest Brazilian newbie ever.

I’m from the ghetto, and ghetto boys don’t have enough compute power to run Michael Oliver’s super cool NNs. But I have imagination and have prepared some adjustments for folks like me who have limited computing power but still wanna get rich… I mean, improve the metamodel, of course.

What I did was take Michael Oliver’s idea and treat it as a general regularization method that can be used perfectly well with a boosted trees algorithm (for example). I also noticed that the conceptual structure fits well with the boosted eras algorithm.

So these are my adaptations:

  • Forget NNs and special custom loss functions; ghetto boys use XGBoost and classical Feature Neutralization

  • Before you start with the random reversing thing, you can train your XGBoost for some iterations (50, 100, or 200 is enough), as a kind of bootstrap

  • And when doing the random reversing training part, you can iterate more than once before reversing random features again. That’s why I said it “fits well with the boosted eras algorithm” (see the sketch right after this list)

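Here is a minimal Python sketch of that scheme (hypothetical parameter values, leaning on xgboost’s xgb_model argument to continue boosting from an existing booster; my actual reversing function, in R, is further below):

import numpy as np
import xgboost as xgb

def reverse_random_features(X, slice_percent, rng):
    # reflect a random subset of columns around 0.5 (features live in [0, 1])
    X = X.copy()
    cols = rng.choice(X.shape[1], size=round(X.shape[1] * slice_percent), replace=False)
    X[:, cols] = 1.0 - X[:, cols]
    return X

def train_with_reversal(X, y, params, bootstrap_rounds=100, rounds_per_cycle=10,
                        n_cycles=20, slice_percent=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # bootstrap phase: ordinary boosting on the untouched features
    booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=bootstrap_rounds)
    # reversal phase: each cycle flips a fresh random subset of features,
    # then continues boosting from the existing model for a few more rounds
    for _ in range(n_cycles):
        X_rev = reverse_random_features(X, slice_percent, rng)
        booster = xgb.train(params, xgb.DMatrix(X_rev, label=y),
                            num_boost_round=rounds_per_cycle, xgb_model=booster)
    return booster
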
OK, so with those adjustments I was able to produce this little guy here:

I’ve made a function for doing the reversing thing (in R):

random_slicer <- function(features, slice_percent){
  # columns 4:313 hold the 310 feature columns of the training data
  train_slice <- features[, 4:313]
  # pick a random subset of the features to reverse
  feat_list <- sort(sample(1:310, round(310 * slice_percent)))
  # reflect each selected feature around 0.5 (features live in [0, 1])
  for (i in seq_along(feat_list)) {
    train_slice[, feat_list[i]] <- (-1) * (train_slice[, feat_list[i]] - 0.5) + 0.5
  }
  train_slice
}
13 Likes

Very nice! I was hoping someone would try this with XGBoost as well :grinning: Looks like you got it working pretty nicely and have a great Feature Neutral Mean score, congrats!

1 Like

Thank you, Michael. I hope to contribute more to the community over the coming months and years; my quant skills have become much more professional since I joined the tournament as an active community member. So I still have a lot to give back!

I’ll work on something new related to this method, and I’ll share it if I succeed.

For now, here is the Diagnostics A/B test for the results in my previous reply. On the left is the model without the technique, and on the right the model with it. Both use the same xgb parameters, total iterations, and 100% FN.

Regards
Eric Reis

2 Likes

Hi MDO,

Thanks for the interesting post. Regarding the feature and target centering when using an NN: shouldn’t this step be unnecessary if the NN layers have biases?

Thanks

It generally helps convergence since you don’t have to move the biases as much, and in this case you really want the features centered if they’re being multiplied by -1.

1 Like

This last statement almost sounds like a commercial from Arby’s :cut_of_meat: :cowboy_hat_face:

3 Likes
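
A tiny numeric illustration of the centering point above (assuming features originally scaled to [0, 1], as in the tournament data):

x = 0.75
print(-x)             # -0.75: negating an uncentered feature leaves its original range entirely
x_centered = x - 0.5
print(-x_centered)    # -0.25: exactly the reversed feature 1 - x = 0.25 after the same centering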

Is this still running in NMRO? Round 245’s resolution didn’t look too good.

Judging anything based on one round is a bad idea. The recent rounds have also been especially weird.

4 Likes

These are great insights!

I’ve been working with XGBoost and sklearn a bit on numerai data, but I’m new to PyTorch.

Any pointers on how I would go about creating the mini-batches from the eras as you suggest?

Thank you for the tips!
What is the intuition behind eras as mini-batches? Wouldn’t we want each step to be in the direction of lower loss values across more eras, as opposed to one era? I’ve been using quite large batches on shuffled data, and it seemed to result in better risk-metric performance than non-shuffled data (which is different from eras as mini-batches, but related).

3 Likes

Can anyone point to some code in PyTorch where we use the eras as mini-batches? I’ve been cracking my head over this for a week now.

eras = df.era.unique()
np.random.shuffle(eras)
for era in eras:
   dfs = df[df.era == era]
   x = torch.from_numpy(dfs[features].values).float()
   y = torch.from_numpy(dfs.target.values).float()
7 Likes
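
A minimal sketch of plugging the era loop above into a training step (hypothetical model, optimizer, criterion, and n_epochs; assuming a model that maps the feature matrix to one prediction per row):

import numpy as np
import torch

eras = df.era.unique()
for epoch in range(n_epochs):
    np.random.shuffle(eras)              # new era order every epoch
    for era in eras:
        dfs = df[df.era == era]          # one era == one mini-batch
        x = torch.from_numpy(dfs[features].values).float()
        y = torch.from_numpy(dfs.target.values).float()

        optimizer.zero_grad()
        pred = model(x).squeeze(-1)      # shape: (rows_in_era,)
        loss = criterion(pred, y)
        loss.backward()
        optimizer.step()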

Can a kind soul share the code that generates the mini batches by iterating through eras for a Keras model?

import numpy as np
import tensorflow as tf


class DataSequence(tf.keras.utils.Sequence):

    def __init__(self, df, features, target='target', erasPerBatch=1, shuffle=True):
        self.df = df
        self.features = features
        self.target = target
        self.shuffle = shuffle
        self.eras = df.era.unique()

        if self.shuffle:
            np.random.shuffle(self.eras)

        self.erasPerBatch = erasPerBatch

        # duplicate the target so the model can also be fed an auxiliary output
        self.df['target_aux'] = self.df[self.target]

    def __len__(self):
        # number of batches per epoch
        return len(self.eras) // self.erasPerBatch

    def on_epoch_end(self):
        # reshuffle the rows and the era order between epochs
        if self.shuffle:
            self.df = self.df.sample(frac=1).reset_index(drop=True)
            np.random.shuffle(self.eras)

    def __getitem__(self, idx):
        # each batch consists of erasPerBatch whole eras
        myEras = [self.eras[idx * self.erasPerBatch + i]
                  for i in range(self.erasPerBatch)]

        X = self.df.loc[self.df.era.isin(myEras), self.features].values
        y = self.df.loc[self.df.era.isin(myEras),
                        self.features + ['target_aux', self.target]].values

        # one array per column, for a multi-input / multi-output model
        X = np.split(X, X.shape[1], axis=1)
        y = np.split(y, y.shape[1], axis=1)

        return X, y
4 Likes
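
A hypothetical usage sketch (assuming a compiled Keras model whose inputs and outputs match the per-column split returned by __getitem__ above):

seq = DataSequence(df, features, target='target', erasPerBatch=1, shuffle=True)
model.fit(seq, epochs=20)   # each gradient step sees one whole era

Keras calls on_epoch_end between epochs, so both the row order and the era order get reshuffled each time.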

You say you use residual layers, which I would suppose need some temporal dimension. Do you use eras as temporal information? If so, what is your reasoning, given that there is no information about the era for the live data? Is there another way to implement RNNs without a temporal dimension (which would seem weird to me), or is there a heuristic for inferring the live era?

Hi @mdo, when you mention nets with residual connections, do you mean skip connections? Would this forward function represent that, or are you referring to more complex nets with Blocks and Bottlenecks?

  def forward(self, x):
      x = self.linear0(x)
      x1 = x                    # keep the first layer's output for the skip connection
      x = F.relu(x)
      x = self.dropout(x)
      x = F.relu(self.linear1(x))
      x = self.dropout(x)
      x = F.relu(self.linear2(x))
      x = self.dropout(x)
      x = F.relu(self.linear3(x))
      x = self.dropout(x)
      x = torch.add(x, x1)      # skip connection back to the first layer
      x = F.relu(self.linear4(x))
      x = self.sigmoid(x)
      return x

Thanks!

@olivepossum I would look at using torch.cat or max pooling instead of torch.add. Cat is more versatile across a bunch of output dimensions.

@olivepossum @greenprophet add and cat do different things; add is what residual networks typically use. I usually just write it as x = x + x1.
Also, you have two nonlinearities back-to-back at the end there, a ReLU followed by a sigmoid, which I’m guessing is probably not what you want, unless for some reason you want your outputs to be between 0.5 and 1.

1 Like
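
For reference, a sketch of the same forward pass with both of those suggestions applied (same hypothetical layer names as in the snippet above, not necessarily what @mdo actually runs):

  def forward(self, x):
      x = self.linear0(x)
      x1 = x                              # keep the first layer's output for the residual
      x = F.relu(x)
      x = self.dropout(x)
      x = F.relu(self.linear1(x))
      x = self.dropout(x)
      x = F.relu(self.linear2(x))
      x = self.dropout(x)
      x = F.relu(self.linear3(x))
      x = self.dropout(x)
      x = x + x1                          # residual connection, written as suggested
      x = self.sigmoid(self.linear4(x))   # single output nonlinearity, outputs in (0, 1)
      return x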

@mdo thanks for the clarification and for pointing out the two back-to-back nonlinearities; as you mentioned, it’s not what I wanted.

Thanks!