Liz Experiment Review Q1 2021 : Generating Features and Applying Feature Neutralization

Introduction

Greetings, all!

This post marks the end of my first experiment period on numer.ai. As the title suggests, I'm aiming to do something like this approximately quarterly. This time was more exploration than experiment. I've talked on Daily Scores and Chill about my approach to modeling and thought it might be helpful to put that in writing, given that I have had some success with MMC. Following that, I will discuss the experiment itself. Spoiler alert: I made some mistakes, so interpret at your own peril! Feel free to reach out to me on rocketchat, where my username is 'aelizzybeth'.

Generating features

I believe that taking creative approaches to modeling will help generate that sweet, sweet MMC. That certainly isn’t all that is required, but I think there is a lot of MMC opportunity there.

When I started competing on numerai about 8 months ago, after surveying some posts and rocketchat, I felt that not much attention was being paid to feature engineering by modelers in the classic tournament. Of course, many standard models do some feature engineering on their own. Still, I felt there was ample room to find signal here, so I tapped into my generative art roots and designed a generative approach. I tested random forests, GBMs, and different kinds of transformations, as well as including or excluding the original variables. Unfortunately I didn't take good notes at the time. The resulting process is summarized below (I implemented this in R, fwiw).

  1. set a seed for reproducing the portions of the process involving randomization
  2. transform the features of the dataset so there are no zeroes (didn't bother with the target)
  3. generate 1000 unique pairs of indices 1->310 (to correspond to features)
  4. for each generated pair (A, B), derive a new column = logarithm base B of A, i.e. log(A)/log(B)
  5. train a gbm on the new dataset (original features + 1000 engineered features)

Then, when predicting, apply the same transformations to the new data and predict. That's it. A rough sketch of the pipeline is below.
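In R, the process looks roughly like this (the file and column names, the zero-removing shift, and the gbm parameters here are placeholders rather than my exact settings):

```r
library(gbm)

set.seed(42)                                    # step 1: reproducible randomness

train <- read.csv("numerai_training_data.csv")  # placeholder file name
feature_cols <- grep("^feature", names(train), value = TRUE)

# step 2: shift the features so there are no zeroes (and no ones, which would
# be an invalid logarithm base); the exact transform here is a placeholder
X <- train[, feature_cols] + 2

# step 3: 1000 unique ordered pairs of feature indices (order matters, since
# log base B of A != log base A of B)
n_feat <- length(feature_cols)                  # 310 in the classic dataset
all_pairs <- expand.grid(A = seq_len(n_feat), B = seq_len(n_feat))
all_pairs <- all_pairs[all_pairs$A != all_pairs$B, ]
pairs <- all_pairs[sample(nrow(all_pairs), 1000), ]

# step 4: for each pair (A, B), add a column = log base B of A = log(A)/log(B)
engineered <- sapply(seq_len(nrow(pairs)), function(i)
  log(X[[pairs$A[i]]]) / log(X[[pairs$B[i]]]))
colnames(engineered) <- paste0("gen_", pairs$A, "_", pairs$B)

# step 5: train a gbm on the original + engineered features
model_data <- cbind(X, engineered, target = train$target)
fit <- gbm(target ~ ., data = model_data, distribution = "gaussian",
           n.trees = 500, interaction.depth = 5, shrinkage = 0.01)

# at prediction time: apply the same shift and the same `pairs` to the new
# data, then call predict(fit, newdata, n.trees = 500)
```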

liz, up to and including round 243, shows early iterations of this development, converging to the final version in the last few rounds. urza shows results from round 244 until now (R256 just finished today).

Feature Neutralization Experiment

From R244 to R256 I carried out a 13-round experiment. I wanted to see how various degrees of feature neutralization, applied to the model described above, would affect performance. I used a method adapted from a post by wigglemuse.

(Text about misapplying FN deleted. Upon another code review, I realized I did it right; sorry for the confusion.) For reference, I stake Corr + 2x MMC on all models. A rough sketch of the neutralization step follows the model list below.

Urza: 0% feature neutralization
Liz: 25% feature neutralization
Yawgmoth: 50% feature neutralization
Emrakul: 75% feature neutralization
Ulamog: 100% feature neutralization
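That neutralization step looks roughly like the sketch below (my reconstruction in R; the `neutralize` helper and its arguments are placeholders for the general idea, and in practice it is usually applied era by era):

```r
# Proportional feature neutralization: subtract `proportion` of the component
# of the predictions that is linearly explained by the feature exposures.
neutralize <- function(preds, exposures, proportion = 1.0) {
  exposure_component <- lm.fit(as.matrix(exposures), preds)$fitted.values
  neutralized <- preds - proportion * exposure_component
  neutralized / sd(neutralized)   # rescale so magnitudes stay comparable
}

# The five models above then correspond, roughly, to:
#   urza:     neutralize(preds, exposures, 0.00)   (no neutralization)
#   liz:      neutralize(preds, exposures, 0.25)
#   yawgmoth: neutralize(preds, exposures, 0.50)
#   emrakul:  neutralize(preds, exposures, 0.75)
#   ulamog:   neutralize(preds, exposures, 1.00)
```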

See some box-and-whisker plots of Corr, MMC, and FNC.

Please note, the y-axis differs for each plot.

[Box-and-whisker plots: Corr, MMC, and FNC by model]

Corr and MMC generally suffered as FN increased during this 13-round period. I found the FNC results most interesting. Though I don't present them here, it was interesting to see that in the rounds where Urza wasn't the top performer, the order of Corr performance tended to be flipped. I'll paste the data into a comment if anyone is interested (though it can also be found via the links provided). I would have uploaded the Excel spreadsheet I used to write this, but now I am tired and not sure how. I'm glad to take questions other than "what are your model parameters?".

Thoughts for next experiment period

I might get this going right away or it might take a few weeks, but I want to try a few things.

  1. less overfitting
  2. xgboost (gonna skip this on next experiment cycle, slots are mapped out already)
  3. more generated features
  4. different sets of generated features
  5. automated feature selection (forgot to add at first)
  6. FN relative to original data only (forgot to add at first) [EDIT: I actually did it right the first time, but I am now interested, so I will experiment with full-FN vs. original-FN as part of my next suite of experimental models]

Here is the data I promised:

Corr
ROUND Urza Liz Yawgmoth Emrakul Ulamog
256 0.0831 0.0796 0.0715 0.054 0.0243
255 0.0928 0.0882 0.078 0.0554 0.0153
254 0.1211 0.1138 0.0991 0.0662 0.0101
253 0.0475 0.0471 0.0447 0.0361 0.0172
252 0.0372 0.0339 0.0281 0.0167 -0.0029
251 0.0448 0.0411 0.0352 0.025 0.007
250 0.027 0.0245 0.0198 0.0102 -0.0083
249 0.0086 0.0092 0.0091 0.0059 -0.0006
248 0.01 0.0113 0.0138 0.016 0.0124
247 0.0447 0.0452 0.0449 0.0398 0.023
246 0.0495 0.0466 0.0415 0.0316 0.0139
245 0.0543 0.049 0.0407 0.0255 0.0016
244 0.057 0.0515 0.0425 0.0262 -0.0015
MMC
ROUND Urza Liz Yawgmoth Emrakul Ulamog
256 0.0098 0.0083 0.0056 0.0011 -0.0047
255 0.0115 0.0092 0.0051 -0.0026 -0.015
254 0.0314 0.0268 0.0189 0.0031 -0.0212
253 0.0112 0.0114 0.011 0.0084 0.0019
252 0.0184 0.0159 0.012 0.0047 -0.0072
251 0.021 0.0183 0.0144 0.0086 -0.001
250 0.0153 0.0134 0.01 0.0035 -0.0091
249 -0.0019 -0.0013 -0.001 -0.0022 -0.0046
248 -0.0055 -0.0043 -0.0018 0.0018 0.003
247 0.0168 0.0172 0.0176 0.016 0.0081
246 0.0194 0.0171 0.0136 0.008 -0.0004
245 0.0285 0.0241 0.0179 0.0073 -0.0076
244 0.0257 0.0213 0.0147 0.004 -0.0124
FNC
ROUND Urza Liz Yawgmoth Emrakul Ulamog
256 0.0184 0.0195 0.0208 0.0208 0.0187
255 0.0199 0.0205 0.0203 0.0169 0.0104
254 0.0136 0.0155 0.0171 0.0141 0.0066
253 0.0118 0.0131 0.014 0.013 0.0116
252 -0.0085 -0.007 -0.0047 -0.0047 -0.0065
251 0.0065 0.0056 0.0052 0.0048 0.0055
250 -0.0091 -0.0088 -0.008 -0.0072 -0.0079
249 -0.0022 -0.0027 -0.0025 -0.0017 -0.0012
248 0.0029 0.0048 0.0078 0.0109 0.0115
247 0.0201 0.0227 0.0246 0.0241 0.0216
246 0.0108 0.011 0.011 0.0106 0.0114
245 -0.0024 -0.0005 0.0008 0.0002 0.0009
244 -0.0069 -0.0049 -0.0026 -0.0021 -0.0028

Thanks for sharing, @liz! I have two questions about your process:

Regarding step 3, how did you select these feature pairs? Just at random or based on some selection process?

And is the logarithmic approach in step 4 based on theory or some common method?


I select the pairs at random, and the logarithmic approach is just something I came up with that I thought might randomly expose more signal. I think that’s loosely based on principles of feature engineering but I’ve never read about it being done this way.


Perhaps it's an idea to pay attention to the feature groups? Either combine within the groups, or across the groups?


Could be! Currently, developing a selection scheme for candidate features to transform is on my mental list of things I want to do, but I won't be doing it right away.


Edited title from “Liz Experiment Review Q1 2021 : Generating Features and Misapplying Feature Neutralization” to “Liz Experiment Review Q1 2021 : Generating Features and Applying Feature Neutralization” and some text in the post because I realized I didn’t make an error in the first place.

In any case, I’ll be testing full-FN vs. original-FN as part of my ongoing tests. Sorry for the confusion!


I guess a lot of the success comes from the expansion of the feature space alone (~1300 features). What's your compute environment? I am already struggling with ~600 features (I use ~25 GB RAM), especially when it comes to inference on the tournament data. I am looking for ways to get around it, but haven't found good solutions yet.


I'm running a 16 GB RAM environment and I'm kinda using something similar to Liz's approach. My solution to incorporate more features was "sample and ensemble": basically, you run several models on sampled subsets of the features and afterwards ensemble all of them.
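Something along these lines, sketched in R to match the rest of the thread (the base learner, subset size, and simple averaging are placeholder choices, not necessarily my exact setup):

```r
# "Sample and ensemble": train several models on random feature subsets,
# then average their predictions.
library(gbm)

sample_and_ensemble <- function(train, feature_cols, n_models = 5,
                                features_per_model = 150) {
  lapply(seq_len(n_models), function(i) {
    cols <- sample(feature_cols, features_per_model)   # random feature subset
    fit <- gbm(reformulate(cols, response = "target"), data = train,
               distribution = "gaussian", n.trees = 300,
               interaction.depth = 5, shrinkage = 0.01)
    list(model = fit, cols = cols)
  })
}

predict_ensemble <- function(models, newdata) {
  preds <- sapply(models, function(m)
    predict(m$model, newdata, n.trees = m$model$n.trees))
  rowMeans(preds)                                      # simple average
}
```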


Yes, I was also experimenting with something like that. If you don't mind me asking, what kind of model do you use for ensembling, and how many models do you ensemble? In my experience, ensembling just a very few models with a very strict ensemble model (e.g. linear regression) does not improve scores very much. But I guess this is the way to go if the feature space is getting too big.


I'm using an r5.8xlarge EC2 instance on AWS, which I think has 256 GB RAM and 32 CPU cores. I also have a 300 GB SSD and instruct the program to use the SSD when it runs out of RAM. I previously ran these training jobs on a 16 GB machine at home and they would take 24 hours and fail 25% of the time. On the r5.8xlarge it takes maybe 10 hours and costs me ~25 USD. I don't train the models every week; it's much faster when just predicting.


Sus FE technique. Interesting FN experiment though!

re: sus… care to expand?

Suspicious…

Yes, obviously sus = suspicious. I was trying to invite you to contribute something constructive :slight_smile:


Ohh haha :blush:
Well, I just don't see how your log-features could bring something new (uncorrelated) to the table. Generally, I think it's very hard to do FE on these kinds of datasets; maybe some features on the era level could work!

Feature expansion just by itself can sometimes work wonders. This is, after all, the entire basis of algorithms like SVMs, which use the so-called "kernel trick" to find easier ways to separate the data in higher-dimensional space than is possible using only the original features. The downside is that it doesn't scale well…
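As a made-up toy illustration: a plain linear model can't see a pure interaction between two features, but adding their product as an engineered feature exposes it immediately.

```r
# Toy example: expanding the feature space lets a simple model capture
# structure it otherwise misses entirely. Purely illustrative.
set.seed(1)
x1 <- runif(1000)
x2 <- runif(1000)
y  <- (x1 - 0.5) * (x2 - 0.5) + rnorm(1000, sd = 0.02)  # pure interaction + noise

summary(lm(y ~ x1 + x2))$r.squared               # ~0: original features see nothing
summary(lm(y ~ x1 + x2 + I(x1 * x2)))$r.squared  # ~0.9: one engineered feature fixes it
```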


Yeah, so far it's working nicely for me; more time will strengthen or weaken my faith in this approach. I'd most like to see how 'urza' performs through a regime shift.


But these features can easily be learned by a model. My question is: why would you force your model to learn from a specific set of randomly generated features? Why not, e.g., arctan2(A, B)? I think one should let the model decide, especially when you don't know what the features represent.
Or look at it this way: there are ~50k unique feature pair combinations. You randomly select 1k of them. What about the rest? This is a clear path to overfitting imho.

The decision to use log base B of A was somewhat arbitrary. There are many other possible mappings, and exponentially more if you get into 3-dimensional and higher combinations!

"What about the rest?" is indeed a question on my mind. There are too many to explore them all, especially when you also consider all non-trivial mappings. Phase 2 of my experiment, as I described in the post, includes testing similar models with different generated features. As noted, all these models contain the original 310 features as well. I am trying to build toward a more sensible exploration and selection of transformed features in this problem space, and chose this as my starting place. This is an ongoing, iterative experiment, and certainly not representative of a full exploration of the space I've described. Of course, all of that is apparent in my original post.

Re: overfitting, literally everyone who competes in this tournament on a repeated basis is overfitting in some way. I'd like to avoid it somewhat, but avoiding it entirely is not possible.
