My machine learning journey started with Kaggle data, but it just wasn't giving me the right grounding for my long-term aspiration of building AI that would take over the Universe and destroy the Borg.
Then I heard of this magical tournament of computer-wielding smart arses, where models battle amongst the silicon-burning RTX 3090s. A retro gaming machine may never compete against these powerhouses, but fear not, because you can use Colab.
Going through the Numer.ai tutorial, you are given a basic XGBRegressor model that works surprisingly well. I set up an automated system that lets me download the data and run the model, but I needed time to go through the code fully and understand the whole process. Over time the model did OK, but knowing everyone has mostly the same model, I knew I'd have to go back.
So my latest project was to demystify some of this, mainly for myself.
So, without any further credibility:
Here is my early work on GitHub: gnellany/numerai (Stuff I'm working on)
If you are using the model that numer.ai supplies and running it in Python, it looks like this:

from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=10000,
    learning_rate=0.01,
    subsample=0.3,
    colsample_bytree=0.1,
    max_depth=5,
)
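For reference, here is my own plain-language gloss of what each of those knobs does (my reading of them, not official documentation), collected into a dict so the settings can be reused:

```python
# Gloss of the starter model's hyperparameters (my understanding, not Numerai's docs).
params = {
    "n_estimators": 10000,    # number of boosting rounds (trees) to fit
    "learning_rate": 0.01,    # shrinkage per tree; a small rate plus many trees = slow, steady fit
    "subsample": 0.3,         # fraction of rows sampled for each tree (adds randomness)
    "colsample_bytree": 0.1,  # fraction of features sampled per tree (few features per tree)
    "max_depth": 5,           # depth cap per tree, limiting interaction complexity
}

# With xgboost installed, these can be passed as XGBRegressor(**params).
print(params["n_estimators"])  # → 10000
```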
To you new users: remember, if you are using the starter code, to delete example_model.xgb so your own model actually gets trained.
After building a few different models with this, I decided to automate a little more, to find the best variables to use with this code. If you have kept reading this far, the code you are looking for is called Long_train.
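The idea is simple: expand a grid of candidate hyperparameters and train one model per combination. A minimal sketch of that sweep (the real Long_train script differs; train_and_score here is a hypothetical stand-in for "fit an XGBRegressor with these parameters and return its validation score"):

```python
# Sketch of a hyperparameter sweep: expand a grid of settings into
# one parameter dict per training run.
from itertools import product

grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "max_depth": [3, 5, 7],
    "colsample_bytree": [0.1, 0.3],
}

def all_combos(grid):
    """Expand a dict of lists into a list of per-run parameter dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

combos = all_combos(grid)
print(len(combos))  # → 18 (3 * 3 * 2 runs)

# In the real loop, each combo would be trained and scored, e.g.:
# best = max(combos, key=train_and_score)  # train_and_score is hypothetical
```

The sweep grows multiplicatively with each parameter added, which is exactly why it needs automating rather than editing numbers by hand.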
The current results of the model trainings can be found:
The next issue I face: what is MMC, and what should my target be for it?