Example Models For Current Dataset


Hello everyone! Just joined and started looking into Numer.ai. Had a quick question. Are there sample models that are available that work with the current dataset? Looking to wrap my mind around this. Thanks!


Hi, @mrjaekin

Have you had a look in the zip alongside the dataset?
We’ve added two simple example models in Python and R.

Also have a look at some of the guides and posts users have made:

Also feel free to ask questions in our Slack: https://slack.numer.ai


I’ll have a look. Thanks @jonathan


Here is a simple XGBoost model built on the code supplied in the zip. There is nothing fancy going on here, but it will give you an idea of the structure.

Edit #1: The original sample threw an error. I corrected that and did a test run.
Edit #2: This was written on Python 3.x; Python 2.7 may require adjustments.

#!/usr/bin/env python

# Example model on Numerai data using an XGBoost regressor with a linear regression objective.
# To get started, install the required packages: pip install pandas numpy scikit-learn
# XGBoost may require manual installation depending on your environment.

# Original, elegant code by Xander Dunn
# Butchery by ObjectScience

# This will get you to 0.69254 with Originality, Concordance
# and 75% Consistency as is.

import pandas as pd
import numpy as np
from sklearn import metrics, preprocessing
from xgboost import XGBRegressor

def main():
    # Set seed for reproducibility
    np.random.seed(0)

    print("Loading data...")
    # Load the data from the CSV files
    training_data = pd.read_csv('numerai_training_data.csv', header=0)
    prediction_data = pd.read_csv('numerai_tournament_data.csv', header=0)
    # Select the feature columns, the target, and the ids from the loaded data
    features = [f for f in list(training_data) if "feature" in f]
    X = training_data[features]
    Y = training_data["target"]
    x_prediction = prediction_data[features]
    ids = prediction_data["id"]

    # This is your model that will learn to predict
    # Note: newer XGBoost releases name this objective 'reg:squarederror'
    model = XGBRegressor(
        n_estimators=100,
        max_depth=4,
        learning_rate=0.02,
        subsample=0.9,
        colsample_bytree=0.85,
        objective='reg:linear',
    )

    # Your model is trained on the training_data
    model.fit(X, Y)

    # Your trained model is now used to make predictions on the numerai_tournament_data
    results = model.predict(x_prediction)
    results_df = pd.DataFrame(data={'probability':results})
    joined = pd.DataFrame(ids).join(results_df)

    print("Writing predictions to predictions.csv")
    # Save the predictions out to a CSV file
    joined.to_csv('predictions.csv', index=False)

    # Now you can upload these predictions on numer.ai

if __name__ == '__main__':
    main()
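
One thing to watch with a regressor like this: nothing forces its outputs into [0, 1], so it can help to clip them before scoring or uploading. Below is a minimal sketch, assuming the 0.69254 figure above is a log loss. `clip_probabilities` and `binary_log_loss` are hypothetical helper names, and the toy values are made up for illustration.

```python
import numpy as np


def clip_probabilities(preds, eps=1e-6):
    """Clip raw regressor outputs into the open interval (0, 1)."""
    return np.clip(np.asarray(preds, dtype=float), eps, 1.0 - eps)


def binary_log_loss(y_true, probs):
    """Mean negative log-likelihood of binary labels under the predictions."""
    y = np.asarray(y_true, dtype=float)
    p = clip_probabilities(probs)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


if __name__ == "__main__":
    # Toy values only (assumption): in practice y_true would be a held-out
    # slice of the training targets and raw the model's predictions on it.
    y_true = [0, 1, 1, 0]
    raw = [0.5, 0.5, 0.5, 0.5]
    # A constant 0.5 prediction scores ln(2) under this metric.
    print(binary_log_loss(y_true, raw))
```

A constant 0.5 prediction scores ln 2 ≈ 0.6931 under this metric, which gives a baseline for reading scores like the one quoted above.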

Bumping for edit. The code actually runs now and the results will get you in the ballpark.


Thanks @ObjecScience. I have one question. I installed all the dependencies (or at least I thought I did), but I’m getting the following error on Linux.

Traceback (most recent call last):
  File "Numerai.py", line 18, in <module>
    from xgboost import XGBRegressor
ImportError: cannot import name 'XGBRegressor'

xgboost is already installed using pip. Any suggestions? Thanks!


Are you able to use xgb elsewhere? (I know there have been some difficulties getting it up and running.) What version of Linux are you on, and what version of Python are you running? I’d like to eliminate environment stuff first.
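
If it helps narrow things down, here is a small sketch that prints which interpreter is running and where a package would be imported from. It assumes Python 3.4 or later (for `importlib.util.find_spec`), and `describe_module` is a hypothetical helper name, not part of xgboost.

```python
import importlib.util
import platform
import sys


def describe_module(name):
    """Report whether a top-level module is importable and where it loads from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False, "origin": None}
    return {"name": name, "found": True, "origin": spec.origin}


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Python version:", platform.python_version())
    print("Platform:", platform.platform())
    # If the origin points somewhere unexpected (for example a local file
    # named xgboost.py), that file is what the import statement picks up.
    print(describe_module("xgboost"))
```

Running it with the same interpreter that runs the script shows whether pip installed the package for that interpreter or for a different one.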

On Windows I had to follow this to get it to work, as pip install didn’t work for me…


Being nothing more than a hobbyist in all of this I haven’t moved to Linux as I’m doing this on my “gaming” computer. I could go to a dual-boot or virtual environment but I’m horrendously lazy and lack motivation of any sort, unless it has something to do with drinking coffee… :slight_smile:

If it does end up being a Linux thing there should be tons of info out there to get a proper install. This link has an alternative approach if “pip” fails, try that and let us know. We’ll get it figured out one way or another.



Just got it working… was using Python3 instead of Python2. Thanks for the tips!


Good deal. Edited the original post for clarity on versions… Thanks for posting up.


I’m by no means an Ubuntu expert… I probably should have started with Windows… HAHA. Thanks for your help @ObjecScience. This will get me going for now. I’ll do some more research on it and see where I can get. Thanks!


You’re welcome. I’m going to be expanding on this in the next day or two so keep an eye out. It’s a simple framework that will allow you to run and ensemble a few models together. It’s bits and pieces of stuff made by people a lot smarter than me. It will probably need some tuning as far as the coding goes, but it should generate ideas.
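
For a rough idea of what ensembling can look like, here is a minimal sketch that averages the predictions of several models row by row. This is not the framework mentioned above; `average_predictions` is a hypothetical helper name and the numbers are made up.

```python
import numpy as np


def average_predictions(prediction_sets, weights=None):
    """Blend several models' predictions with a (weighted) mean, row by row."""
    stacked = np.vstack([np.asarray(p, dtype=float) for p in prediction_sets])
    return np.average(stacked, axis=0, weights=weights)


if __name__ == "__main__":
    # Made-up outputs from two hypothetical models on the same three rows.
    model_a = [0.48, 0.52, 0.51]
    model_b = [0.50, 0.54, 0.49]
    print(average_predictions([model_a, model_b]))
    # Give the first model three times the weight of the second.
    print(average_predictions([model_a, model_b], weights=[3, 1]))
```

Each model's predictions need to be in the same row order (for example, aligned on the `id` column) before they are averaged.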