Help us improve Numerai Compute!

I am working on automating it further and moving it into the cloud, especially since holidays and other trips are coming up :slight_smile: Anyway, using the Oracle free tier I managed to predict and upload 20 v3/v4 models based on the smaller feature set within 10 minutes (with only 1 GB of memory), so that's working nicely.

I also got it working with the medium feature set (I forgot how many features that was, somewhere between 300 and 500?). Loading the training set for neutralization purposes of course takes longer, but I am guessing 20 models would still run within the hour. Not bad for free compute that you can keep on 24/7.

I will write a forum post soon on how I did this with some example code/instructions, maybe others can benefit from it too.
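In the meantime, here is a rough sketch of the kind of per-model driver loop described above: load each pickled model, predict, and rank-normalize before upload. The helper names (`rank_to_unit_interval`, `predict_one`) and the pickled-model setup are illustrative assumptions, not a real Numerai API:

```python
# Sketch of a low-memory driver loop: predict one model slot at a time
# so 20 models fit on a small free-tier VM. Names are illustrative.
import pickle

def rank_to_unit_interval(values):
    """Rank-transform raw scores into (0, 1), the scale submissions use."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = (rank + 0.5) / n
    return ranks

def predict_one(model_path, features):
    """Load a pickled model, predict, and return rank-normalized scores."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    raw = model.predict(features)
    return rank_to_unit_interval(list(raw))
```

Processing slots sequentially (rather than keeping all 20 models in memory) is what keeps the peak footprint under 1 GB.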

Are your submissions automated now?

Semi automated (manually triggered)

Do you submit from a home machine, compute/cloud or website?

Local machine

Do you want to train your model locally or in the cloud?

Depending on resources needed. Prefer local.

How often do you retrain your models?

Manually, no set schedule.

What cloud platform are you most comfortable with?

AWS (but am open to options)

What are the biggest pain points with the current Compute setup?

Lack of information. More explanation would be helpful, to see if it works for me or not, and whether the cost is worth it. Not even sure what the costs would be.

Some proposals we have:  Which of these is most appealing to you? What changes would you make?

None of them look appealing, as I'm not so interested in notebooks. I would probably need to store an image of my environment so I can run all models on a single instance, triggered by you. I would only use a notebook service if it was secure and cheap.

Are your submissions automated now?

Partially, manually triggered each week

Do you submit from a home machine, compute/cloud or website?

Home machine

Do you want to train your model locally or in the cloud?

Locally because of extensive GPU usage and retrainings and model iterations

How often do you retrain your models?

Each week for Signals, never for Classic

What cloud platform are you most comfortable with?

Google Cloud

Do you use version control for your model code?

Yes

What are the biggest pain points with the current Compute setup?

I need to swap models each week to adjust stake weightings; that's my main blocker for complete automation.

How do you typically deploy a model to production?

Jupyter notebooks or model dump

  • Are your submissions automated now?
    Yes
  • Do you submit from a home machine, compute/cloud or website?
    Google Cloud
  • Do you want to train your model locally or in the cloud?
    Locally
  • How often do you retrain your models?
    Tournament: Never, Signals: Weekly
  • What cloud platform are you most comfortable with?
    Google Cloud
  • Do you use version control for your model code?
    Yes, git + yaml-files.
  • What are the biggest pain points with the current Compute setup?
    Not enough Memory
  • How do you typically deploy a model to production?
    Usually dockerized but currently I have a bash-script deployed on a VM, which spins up weekly and runs that script via a startup-script. The bash script pulls from github, loads the model from google buckets and runs a script.
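The weekly startup flow described in that last answer (pull code, fetch the model from a bucket, run the script) can be rendered as a small step runner; here is a hedged Python sketch where the commands, bucket path, and script name are placeholders, not the poster's actual setup:

```python
# Run the weekly pipeline steps in order; stop at the first failure.
# Command strings and paths below are illustrative placeholders.
import subprocess

PIPELINE = [
    ["git", "pull"],                                    # refresh code from GitHub
    ["gsutil", "cp", "gs://my-bucket/model.pkl", "."],  # fetch model from a bucket
    ["python3", "predict_and_submit.py"],               # run the weekly script
]

def run_pipeline(steps, runner=subprocess.run):
    """Execute each step; return (ok, failed_step)."""
    for step in steps:
        result = runner(step)
        if result.returncode != 0:
            return False, step
    return True, None
```

Injecting the `runner` makes the loop testable without actually shelling out, and the VM's startup-script only needs to invoke this once per boot.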
  • Are your submissions automated now? No
  • Do you submit from a home machine, compute/cloud or website? Google colab
  • Do you want to train your model locally or in the cloud? both
  • How often do you retrain your models? Every time I have a new idea
  • What cloud platform are you most comfortable with? AWS
  • Do you use version control for your model code? Yes
  • What are the biggest pain points with the current Compute setup? Too many configuration steps just to deploy and set up the webhook
  • How do you typically deploy a model to production? I just import the model in my google colab notebook

My suggestions:

The main objective is to have the whole environment set up in a Vagrantfile that can be run in one click after installing VirtualBox, VMware, or Hyper-V along with Vagrant.
If someone wants to share their work, they could use Vagrant Share.

I will be happy to work on something like this if the idea interests the community.

Another option would be https://cml.dev/, which is a git-friendly approach that uses GitHub Actions to trigger cloud computing.

  • Are your submissions automated now?

Sort of. I need to start them manually but I could do that from a phone while travelling.

  • Do you submit from a home machine, compute/cloud or website?

Home machine, unless I'm travelling.

  • Do you want to train your model locally or in the cloud?

Locally.

  • How often do you retrain your models?

I don't. However, I'm moving to v4 now and I really need to come to terms with TC.

  • What cloud platform are you most comfortable with?

Google Cloud.

  • Do you use version control for your model code?

Yes, just CVS.

  • What are the biggest pain points with the current Compute setup?

I don't find the time to look at it. :wink:

  • How do you typically deploy a model to production?

Manually.

  1. Are your submissions automated now?
  • Semi automated
  2. Do you submit from a home machine, compute/cloud or website?
  • Home machine, don't wish to use anything else.
  3. Do you want to train your model locally or in the cloud?
  • Locally.
  4. How often do you retrain your models?
  • Once a month.
  5. What cloud platform are you most comfortable with?
  • None.
  6. Do you use version control for your model code?
  • Yes.
  7. What are the biggest pain points with the current Compute setup?
  • I would like to try it sometime, but I am finding myself too lazy to do that.
  8. How do you typically deploy a model to production?
  • By changing a model-to-id mapping.

It's quite interesting to see that there are still a lot of people who don't even consider the cloud as an option (at least for weekly predictions). It makes me think that a general solution should support both cloud and local compute out of the box.

  • Are your submissions automated now?
    Semi-automated
  • Do you submit from a home machine, compute/cloud or website?
    Google Colab
  • Do you want to train your model locally or in the cloud?
    No preference.
  • How often do you retrain your models?
    Every week.
  • What cloud platform are you most comfortable with?
    No preference.
  • Do you use version control for your model code?
    No.
  • What are the biggest pain points with the current Compute setup?
    Havenā€™t tried Compute
  • How do you typically deploy a model to production?
    Manually.

Whatever can easily/cheaply store a model and trigger it is what I'm for. If Compute already does that, then cool.

  • Are your submissions automated now?
    Yes

  • Do you submit from a home machine, compute/cloud or website?
    Cloud

  • Do you want to train your model locally or in the cloud?
    Cloud

  • How often do you retrain your models?
    Classic: Whenever new data is released for 99% of the models. Experimenting with weekly re-training
    Signals: Weekly for most, monthly or quarterly for some. Some are not models but just features and calculated weekly

  • What cloud platform are you most comfortable with?
    Google Cloud Platform

  • Do you use version control for your model code?
    Not really

  • What are the biggest pain points with the current Compute setup?
    Never tried it

  • How do you typically deploy a model to production?
    Test locally then upload it to google cloud storage. The most recent version will get downloaded each week and used

Similar to @jrai, I use GCP and just run everything on a VM. The VM is on a cron job: it downloads the latest Python files from Cloud Storage, runs them, submits the predictions, and then shuts down automatically. For batch predictions like we're doing, this works very well.
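Two pieces of that cron-driven VM flow are easy to sketch: picking the most recently uploaded script from a storage listing, and making sure the VM powers off even when the job fails. This is a minimal illustration with an assumed `(name, updated)` listing format, not the poster's actual code:

```python
# Sketch: "most recent version wins" selection plus a shutdown guard.
from datetime import datetime

def newest(listing):
    """Return the name of the entry with the latest 'updated' timestamp."""
    return max(listing, key=lambda item: item[1])[0]

def run_and_shutdown(job, shutdown):
    """Run the weekly job; always power the VM off afterwards."""
    try:
        job()
    finally:
        shutdown()

# Example listing as (object_name, updated_time) pairs:
listing = [
    ("predict.py", datetime(2022, 5, 1)),
    ("predict.py.bak", datetime(2022, 4, 1)),
]
```

The `try/finally` guard matters on a billed VM: without it, a failed prediction run would leave the instance running until the next manual check.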

  • Are your submissions automated now?
    Yes
  • Do you submit from a home machine, compute/cloud or website?
    Classic: numerai-compute on AWS. Signals: Dedicated server
  • Do you want to train your model locally or in the cloud?
    Classic: Cloud. Signals: Dedicated server
  • How often do you retrain your models?
    Classic: Never. Signals: Every submission
  • What cloud platform are you most comfortable with?
    GCP
  • Do you use version control for your model code?
    Yes
  • What are the biggest pain points with the current Compute setup?
    Signals: Terraform is too opaque for me to understand how to use it to kick off my complex Signals pipeline, so I just roll my own with schedule. I'd be open to setting up a webhook you could call if timing is variable
  • How do you typically deploy a model to production?
    Signals: Docker-compose up on dedicated server

Bonus questions:

  • How long does your pipeline take?
    Classic: 2-5 min (inference only on legacy data). Signals: >24 hours (data collection of the past week + retraining + inference)
  • How easy would it be to move to daily submissions?
    Classic: Pretty easy. Signals: Pretty hard, would require significant rearchitecting of data pipeline
  • Are your submissions automated now?
    Yes

  • Do you submit from a home machine, compute/cloud or website?
    numerai-compute on AWS

  • Do you want to train your model locally or in the cloud?
    locally

  • How often do you retrain your models?
    Classic: ~5% of slots weekly, some trained years ago still running. Signals: Every 3-6 months

  • What cloud platform are you most comfortable with?
    AWS

  • Do you use version control for your model code?
    Yes

  • What are the biggest pain points with the current Compute setup?
    Initial setup, diagnosing problems, shutting down/restart via AWS if needed
    Lack of AWS creds accessible from the environment like the Numerai creds. (AWS creds are used for S3 bucket reads/saves/archiving/ensembling, etc.)

  • How do you typically deploy a model to production?
    Numerai node deploy

Bonus questions:

  • How long does your pipeline take?
    Classic: ~20 min (FE then inference only for 50 slots, single slot trigger). Signals: >1 hour (data retrieval + feature engineering + inference)
  • How easy would it be to move to daily submissions?
    Classic: Pretty easy. Signals: Dicey depending upon time allowed after trigger because of data pipeline
  • Are your submissions automated now?
    Kind of. I have a single script that submits all of my models, which I run manually.
  • Do you submit from a home machine, compute/cloud or website?
    Home.
  • Do you want to train your model locally or in the cloud?
    Locally.
  • How often do you retrain your models?
    Never.
  • What cloud platform are you most comfortable with?
    AWS.
  • Do you use version control for your model code?
    Yes.
  • What are the biggest pain points with the current Compute setup?
    Not enough RAM; a pain to set up 20+ times for each model; don't want to have to use Docker or other similar dependencies.
  • How do you typically deploy a model to production?
    Create a predict.py script for each model and run these every weekend.

Hello Numeratis,
A few months ago I made a proposal to set up a dedicated cloud workspace: [Proposal] Numerai.cloud - Open source cloud workspace for the community. Despite some skeptics, I decided to make this project a reality, so I have set up a subscription page for those wishing to access the app in beta and support the project at this link: https://numerai-cloud.ghost.io/.
I am convinced that this would make life easier for all of us, reduce friction, and allow us to quickly onboard new Numeratis and explore more TC-friendly ideas.

I had a look at the Compute Lite Beta Testing Document.

napi.deploy(model_id, model, napi.feature_sets('small'), 'requirements.txt')

What does this code do? What is happening behind the scenes?

Will this work with my model?

This works with any model or pipeline that matches the sklearn interface. As long as your model has a predict function it will work.

This is such a big limitation and assumption. Many user models will not work.
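That said, if the only hard requirement really is a `predict` method (as the quoted document states), some non-sklearn pipelines can be adapted with a thin wrapper. A sketch; `PredictWrapper` is not part of any Numerai package, just an illustration:

```python
# Adapt an arbitrary callable pipeline to the sklearn-style interface
# Compute Lite is said to expect (anything with a .predict method).
class PredictWrapper:
    def __init__(self, pipeline):
        self._pipeline = pipeline

    def predict(self, X):
        return self._pipeline(X)

# e.g. wrap a hand-rolled ensemble function:
def my_ensemble(rows):
    return [sum(row) / len(row) for row in rows]

model = PredictWrapper(my_ensemble)
```

This only helps pipelines that can be expressed as a single picklable callable; anything that needs external data fetches or multi-stage orchestration at inference time would still not fit the stated interface.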

What are the limitations of Compute Lite?

Compute Lite uses Lambda to run your deployed model, so there are run time and memory constraints. Lambda has a maximum run-time of 15 minutes and maximum memory allocation of 3GB. If your model inference exceeds these limits, it will not work until we add support for AWS Batch

Same as above

That is actually well explained in the document.

Are your submissions automated now?

Yes, I have a cronjob on a local Raspberry Pi that wakes up my main computer to run the scripts.
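The Raspberry Pi side of a setup like that typically sends a Wake-on-LAN "magic packet" before the submission scripts run. A minimal sketch, assuming WOL is enabled on the main machine's NIC; the MAC address is a placeholder:

```python
# Send a Wake-on-LAN magic packet: 6 x 0xFF followed by the target
# MAC address repeated 16 times, broadcast over UDP.
import socket

def magic_packet(mac):
    """Build the 102-byte WOL magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

A crontab entry on the Pi calling `wake(...)` a few minutes before the submission scripts are due gives the main machine time to boot.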

Do you submit from a home machine, compute/cloud or website?

Home machine

Do you want to train your model locally or in the cloud?

Locally

How often do you retrain your models?

Depends on the model; those that I do retrain, I retrain once per month.

What cloud platform are you most comfortable with?

None of them

Do you use version control for your model code?

Yes

What are the biggest pain points with the current Compute setup?

Lack of control. I don't like keeping private code somewhere other than my local machines. I also don't like the "let me handle everything for you" approach and would rather have "here is a ready-to-use solution, but you can also modify it or do everything on your own". Also, some of my models require some compute power, and I already have a local machine capable of providing that. I don't want to spend extra money on expensive cloud services. If a webhook trigger mechanism becomes mandatory, I would really like to be able to set custom webhook URLs in my Numerai account and do my own thing.

How do you typically deploy a model to production?

For most models I create a custom model file that can be added to a folder of deployed models after I have trained it.

To support this future, we are exploring the idea of daily rounds with much shorter submission windows. This change will effectively make model automation mandatory.

Just define a daily time window in which models are supposed to upload their predictions, e.g. every Mon-Fri from 6:00 UTC to 10:00 UTC. A simple cronjob would work just fine.
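A fixed window like that is trivial to check in code, so a cronjob fired a bit too early or too late can simply exit. A sketch using the example bounds from above (Mon-Fri, 6:00-10:00 UTC):

```python
# Guard for a cron-triggered submission script: only proceed inside
# the daily window (example bounds: Mon-Fri, 06:00-10:00 UTC).
from datetime import datetime, timezone

def in_submission_window(now=None):
    """True if 'now' (UTC) falls on a weekday between 06:00 and 10:00."""
    now = now or datetime.now(timezone.utc)
    is_weekday = now.weekday() < 5  # Mon=0 .. Fri=4
    return is_weekday and 6 <= now.hour < 10
```

The script would call `in_submission_window()` at startup and exit early when it returns False, which makes a slightly misconfigured cron schedule harmless.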

The key message I want to convey is that I am fine with everything unless it becomes impossible to upload predictions other than by using a Numerai Compute node running in a cloud service.

  • Are your submissions automated now?

Semi automated. All models can be submitted by manually triggering a single script.

  • Do you submit from a home machine, compute/cloud or website?

home machine

  • Do you want to train your model locally or in the cloud?

Locally

  • How often do you retrain your models?

Not too often. Mostly they are trained once and get included into the pipeline without retraining

  • What cloud platform are you most comfortable with?

AWS and GCP

  • Do you use version control for your model code?

No

  • What are the biggest pain points with the current Compute setup?

I often add new models and remove bad ones, which means changing the script and what goes into the ensemble.

Model files are big. Building a container and uploading the whole package to the cloud takes a long time.


While I understand the need and the advantage for many users, I am worried that the new Numerai Compute will take away the clean, straightforward, and above all flexible approach of the Numerai tournament (download data → do whatever you want → upload the predictions).

With Numerai Compute, user models are run on demand by Numerai. Numerai decides when to call what, which is a big shift from the current standard, where it is the user who decides what to do and when.

I can understand Numerai's need for this paradigm shift, but I do not accept the decrease in flexibility in how I can run my model or what I can do (a limit imposed both by the current form of Numerai Compute and by the fact that we have to use AWS).

If this paradigm shift becomes mandatory, please, please, please add the possibility to skip Numerai Compute and allow users to register a webhook on their account instead. The webhook would work as a simple trigger that starts the user's models. That would give us back the flexibility to do anything our models need.
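On the user's side, such a webhook could be as simple as verifying a signed trigger before kicking off the local pipeline. This is a design sketch only; the shared secret, the signature scheme, and the whole account-level webhook feature are hypothetical, not something Numerai offers:

```python
# Hypothetical user-side webhook guard: accept a trigger only when its
# HMAC-SHA256 signature (computed with a shared secret) matches.
import hashlib
import hmac

SECRET = b"my-shared-webhook-secret"  # hypothetical, set once in account settings

def verify_trigger(body: bytes, signature_hex: str, secret: bytes = SECRET) -> bool:
    """Return True only if the request body carries a valid signature."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

With a check like this in front of the pipeline, the webhook endpoint can safely sit on a home machine: an unsigned or tampered request is ignored, and a valid one just starts whatever scripts the user already runs manually today.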
