Rnumerai - R Interface to the Numerai Machine Learning Tournament API
Hi everyone! This post introduces Rnumerai, an R interface to the Numerai Machine Learning Tournament API. After you've spent your time squeezing every ounce of performance out of your model, this package steps in to help automate the submission process. Let's walk through some of the functionality from the README.
- For the latest stable release:
- For the latest development release:
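The install commands were omitted above; a minimal sketch, assuming the stable release is on CRAN and the development version lives in the Omni-Analytics-Group/Rnumerai GitHub repository (verify the repository path before running):

```r
# Latest stable release from CRAN
install.packages("Rnumerai")

# Latest development version from GitHub
# (assumes the Omni-Analytics-Group/Rnumerai repository; requires devtools)
devtools::install_github("Omni-Analytics-Group/Rnumerai")
```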
Automatic submission using this package
1. Load the package.
2. Set the working directory where data will be downloaded and submission files will be kept.
Use current working directory
data_dir <- getwd()
Or use temporary directory
data_dir <- tempdir()
Or use a user-specific directory
data_dir <- "~/OAG/numerai"
3. Set the Public Key and Secret API Key variables.
Get your public key and API key by going to numer.ai, then to the Custom API Keys section under your Account tab. Select the appropriate scopes to generate the key, or select all scopes to use the full functionality of this package.
Optional: if you choose not to set up the credentials here, the terminal will interactively prompt you for the values when you make an API call.
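A sketch of storing the credentials up front, assuming the package exposes `set_public_id()` and `set_api_key()` helpers (check the package documentation if the names differ):

```r
# Store credentials for the session; if omitted, the package
# prompts for them interactively on the first API call.
set_public_id("YOUR_PUBLIC_ID")
set_api_key("YOUR_SECRET_API_KEY")
```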
4. Download the data set for the current round and split it into training data and tournament data
data <- download_data(data_dir)
data_train <- data$data_train
data_tournament <- data$data_tournament
5. Generate predictions
You can plug in your own custom model code to generate the predictions here. For demonstration purposes, we will generate random predictions.
submission <- data.frame(
  id = data_tournament$id,
  prediction_kazutsugi = sample(seq(0.35, 0.65, by = 0.1), nrow(data_tournament), replace = TRUE)
)
6. Submit predictions for the tournament and get the submission id
The submission object should have two columns (id & prediction_kazutsugi) only.
submission_id <- submit_predictions(submission,data_dir,tournament="Kazutsugi")
7. Check the status of the submission (wait a few seconds for the submission to be evaluated)
Sys.sleep(10) ## 10 Seconds wait period
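The status call itself is not shown above; a hedged sketch, where `submission_status` is a hypothetical helper name used for illustration only (consult the package README for the exact function and its arguments):

```r
# Hypothetical status check -- the exact helper name may differ;
# see the package documentation.
status <- submission_status(submission_id)
status
```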
8. Stake NMR on the submission and get the transaction hash for it.
stake_tx_hash <- stake_nmr(value = 1)
9. Release the stake and get the transaction hash for it.
release_tx_hash <- release_nmr(value = 1)
In our newest release, we've added functionality to graphically assess the performance of a single model or a collection of models over time. The current performance metrics supported are:
These metrics are used within three main functions:
performance_distribution(username, metric, merge = FALSE, round_aggregate = TRUE)
performance_over_time(username, metric, merge = FALSE, outlier_cutoff = if (round_aggregate) 0 else 0.0125, round_aggregate = TRUE)
summary_statistics(username, dates = NULL, round_aggregate = TRUE)
Here are some samples:
1. Display performance distributions
performance_distribution(c("objectscience", "uuazed","arbitrage"), "MMC")
2. Display performance over time
performance_over_time(c("objectscience", "uuazed", "arbitrage"), "Correlation_With_MM")
3. Display performance summary statistics
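Following the `summary_statistics` signature shown above, a call for the same sample accounts might look like this (output omitted; `dates` and `round_aggregate` are left at their defaults):

```r
summary_statistics(c("objectscience", "uuazed", "arbitrage"))
```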
4. Support for many models and accounts
performance_distribution(c("objectscience", "uuazed","arbitrage","cryptoquant","phorex","wander","madmin","wilfred","beepboopbeep","no_formal_agreement"), "Round_Correlation")
As the performance metrics and functionality change, stay tuned for the latest updates! We've already included the recent changes to support multiple models under the same account. If you have any questions or suggestions, please feel free to open a GitHub issue or reply here in the forum.