Why are feature values binned?

Hello. I am a newbie at Numerai.
I started yesterday.
I have a question about the training data.
Why are feature values binned like 0, 0.25, 0.5, 0.75, and 1.0?
What do these values mean?
Are these data already feature engineered?

The main reason the features are the way they are is that numer.ai isn't allowed to give out the financial datasets they subscribe to.

By hiding the names of the stocks and anonymizing the features, they can still share the data with us.

The plus side is that you have little (if any!) data cleaning to do. All features are roughly normally distributed and centered at 0.5, so maybe you want to subtract 0.5 from them before feeding them to a neural network, but that is about it.
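For instance, here is a minimal sketch of that centering step, assuming the features sit in a pandas DataFrame with `feature_`-prefixed columns; the column names and values below are made up, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the training data; real feature columns
# hold the binned values 0, 0.25, 0.5, 0.75, 1.0.
df = pd.DataFrame({
    "feature_a": [0.0, 0.25, 0.5, 0.75, 1.0],
    "feature_b": [0.5, 0.5, 0.25, 1.0, 0.0],
})

feature_cols = [c for c in df.columns if c.startswith("feature_")]

# Shift the bins from [0, 1] to [-0.5, 0.5] so they are centered at 0
# before they go into a neural network.
X = df[feature_cols].to_numpy(dtype=np.float32) - 0.5
```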

That's the main reason they are obfuscated/normalized, but the binning itself I think they did simply because they thought it worked better, or something like that. (I forget exactly what they said about this when asked, but it was more like "we considered that, but didn't do it for whatever reason", not "because secrecy".) Binning does potentially lower the computing resources needed to handle the data, since it can be cast to integers, and it likely doesn't affect results (in most cases) as much as one might think. Still, I wouldn't be surprised to see them move to true real-valued features at some point.
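A rough sketch of that integer trick, again using a made-up stand-in for the real feature matrix:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the binned feature matrix.
df = pd.DataFrame({
    "feature_a": [0.0, 0.25, 0.5, 0.75, 1.0],
    "feature_b": [0.5, 0.5, 0.25, 1.0, 0.0],
})

# The bins {0, 0.25, 0.5, 0.75, 1.0} map exactly onto the integers
# {0, 1, 2, 3, 4}, which fit in int8: an 8x memory saving over float64.
X_int = (df * 4).astype(np.int8)
```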
