I am working on automating it a bit further and moving it into the cloud, especially since holidays and other trips are coming up. Anyway, using the Oracle free tier I managed to predict and upload 20 v3/v4 models based on the smaller feature set within 10 minutes (with only 1 GB of memory), so that's working nicely.
I also got it working with the medium feature set (I forgot how many features that was, somewhere between 300 and 500?). Loading the training set for neutralization purposes of course takes longer, but I am guessing 20 models would still go within the hour. Not bad for free compute which you can keep on 24/7.
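In case anyone wonders how 20 models can fit on a 1 GB instance: the trick is simply to hold only one model in memory at a time. A minimal sketch of that loop (ToyModel and the file names are made up for illustration; a real setup would load your pickled tournament models and push the predictions with numerapi):

```python
import gc
import os
import pickle

class ToyModel:
    """Stand-in for a real trained model; assumes the model exposes
    a scikit-learn-style predict method."""
    def __init__(self, weight):
        self.weight = weight
    def predict(self, features):
        return [self.weight * x for x in features]

def run_models_one_at_a_time(model_paths, live_features):
    """Load, predict with, and discard each pickled model in turn, so
    peak memory stays near the size of a single model plus the data."""
    predictions = {}
    for path in model_paths:
        with open(path, "rb") as f:
            model = pickle.load(f)
        predictions[os.path.basename(path)] = model.predict(live_features)
        del model        # drop the reference...
        gc.collect()     # ...and reclaim the memory before the next load
    return predictions
```

After the loop you would write each entry of `predictions` to a CSV and upload it per model slot.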
I will write a forum post soon on how I did this with some example code/instructions, maybe others can benefit from it too.
Do you submit from a home machine, compute/cloud or website?
Local machine
Do you want to train your model locally or in the cloud?
Depending on resources needed. Prefer local.
How often do you retrain your models?
Manually, no set schedule.
What cloud platform are you most comfortable with?
AWS (but am open to options)
What are the biggest pain points with the current Compute setup?
Lack of information. More explanation would be helpful, to see if it works for me or not, and whether the cost is worth it. Not even sure what the costs would be.
Some proposals we have: Which of these is most appealing to you? What changes would you make?
None of them look appealing as I'm not so interested in notebooks. You would probably need to store an image of the environment to run all models in a single instance, and run it on your trigger. I would only use a notebook service if it was secure and cheap.
Do you submit from a home machine, compute/cloud or website?
Google Cloud
Do you want to train your model locally or in the cloud?
Locally
How often do you retrain your models?
Tournament: Never, Signals: Weekly
What cloud platform are you most comfortable with?
Google Cloud
Do you use version control for your model code?
Yes, git + yaml-files.
What are the biggest pain points with the current Compute setup?
Not enough Memory
How do you typically deploy a model to production?
Usually dockerized, but currently I have a bash script deployed on a VM, which spins up weekly and runs via a startup script. The bash script pulls from GitHub, loads the model from Google Cloud Storage buckets, and runs the prediction script.
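A sketch of what such a startup script could look like — the repo, bucket, and file names below are all placeholders, not the actual setup:

```shell
#!/usr/bin/env bash
# Hypothetical VM startup script: fetch code, fetch model, predict, power off.
set -euo pipefail

# fetch the latest pipeline code
git clone --depth 1 https://github.com/yourname/numerai-pipeline.git /opt/pipeline

# pull the trained model artifact from a GCS bucket
gsutil cp gs://your-model-bucket/model.pkl /opt/pipeline/model.pkl

# run inference + submission, then let the VM shut itself down
python3 /opt/pipeline/predict.py
shutdown -h now
```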
Embed numerai-cli and its dependencies (Docker, Terraform) into Vagrant (https://www.vagrantup.com/). That would provide a common environment for all Numeratis and help them focus on adding value to the Numerai meta-model.
Build a Docker image containing all the common ML frameworks for building models (TensorFlow, PyTorch, LightGBM, XGBoost, CatBoost), plus VS Code with the notebook and git extensions.
The main objective is to have the whole environment defined in a Vagrantfile that can be run in one click after installing VirtualBox, VMware, or Hyper-V along with Vagrant.
If someone wants to share their work, they could use Vagrant Share.
I would be happy to work on something like this if the idea interests the community.
It's quite interesting to see there are still a lot of people who don't even consider cloud as an option (at least for weekly predictions). It makes me think that a general solution should be capable of supporting both cloud and local compute out of the box.
Do you submit from a home machine, compute/cloud or website?
Cloud
Do you want to train your model locally or in the cloud?
Cloud
How often do you retrain your models?
Classic: Whenever new data is released, for 99% of the models. Experimenting with weekly re-training.
Signals: Weekly for most, monthly or quarterly for some. Some are not models but just features, calculated weekly.
What cloud platform are you most comfortable with?
Google Cloud Platform
Do you use version control for your model code?
Not really
What are the biggest pain points with the current Compute setup?
Never tried it
How do you typically deploy a model to production?
Test locally then upload it to google cloud storage. The most recent version will get downloaded each week and used
Similar to @jrai, I use GCP and just run everything on a VM. The VM is on a cron job: it downloads the latest Python files from Cloud Storage, runs them, submits the predictions, and then shuts down automatically. For batch predictions like we're doing, this works very well.
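For anyone wanting to copy this pattern, a rough sketch — the schedule, paths, and bucket names are placeholders, and the key idea is that the machine only bills while the job runs:

```shell
# In-VM crontab entry: run the pipeline every Saturday at 14:00 UTC,
# then power the machine off so you only pay for actual runtime:
#   0 14 * * 6  /home/me/run_weekly.sh && sudo shutdown -h now
#
# run_weekly.sh pulls the latest code from Cloud Storage and runs it:
gsutil cp gs://your-code-bucket/predict.py /home/me/predict.py
python3 /home/me/predict.py   # downloads data, predicts, submits
```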
Do you submit from a home machine, compute/cloud or website?
Classic: numerai-compute on AWS. Signals: Dedicated server
Do you want to train your model locally or in the cloud?
Classic: Cloud. Signals: Dedicated server
How often do you retrain your models?
Classic: Never. Signals: Every submission
What cloud platform are you most comfortable with?
GCP
Do you use version control for your model code?
Yes
What are the biggest pain points with the current Compute setup?
Signals: Terraform is too opaque for me to understand how to use it to kick off my complex Signals pipeline, so I just roll my own with schedule. I'd be open to setting up a webhook you could call if timing is variable
How do you typically deploy a model to production?
Signals: Docker-compose up on dedicated server
Bonus questions:
How long does your pipeline take?
Classic: 2-5 min (inference only on legacy data). Signals: >24 hours (data collection of the past week + retraining + inference)
How easy would it be to move to daily submissions?
Classic: Pretty easy. Signals: Pretty hard, would require significant rearchitecting of data pipeline
Do you submit from a home machine, compute/cloud or website?
numerai-compute on AWS
Do you want to train your model locally or in the cloud?
locally
How often do you retrain your models?
Classic: ~5% of slots weekly; some models trained years ago are still running. Signals: Every 3-6 months
What cloud platform are you most comfortable with?
AWS
Do you use version control for your model code?
Yes
What are the biggest pain points with the current Compute setup?
Initial setup, diagnosing problems, shutting down/restart via AWS if needed
Lack of AWS creds accessible from the environment the way the Numerai creds are. (The AWS creds are used for S3 bucket reads/saves/archiving/ensembling, etc.)
How do you typically deploy a model to production?
Numerai node deploy
Bonus questions:
How long does your pipeline take?
Classic: ~20 min (feature engineering, then inference only, for 50 slots with a single-slot trigger). Signals: >1 hour (data retrieval + feature engineering + inference)
How easy would it be to move to daily submissions?
Classic: Pretty easy. Signals: Dicey depending upon time allowed after trigger because of data pipeline
Hello Numeratis,
A few months ago, I made you a proposal to set up a dedicated cloud workspace: [Proposal] Numerai.cloud - Open source cloud workspace for the community. Despite some skepticism, I decided to make this project a reality, so I set up a subscription page for those wishing to access the app in beta and support the project at this link: https://numerai-cloud.ghost.io/.
I am convinced that this would make life easier for all of us, reduce friction, and allow us to quickly onboard new Numeratis and explore more TC-friendly ideas.
What does this code do? What is happening behind the scenes?
Will this work with my model?
This works with any model or pipeline that matches the sklearn interface. As long as your model has a predict function, it will work.
This is such a big limitation and assumption. Many user models will not work.
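That said, pipelines that don't natively match the interface can often be adapted with a tiny wrapper. A minimal sketch — the class name and the inference function below are hypothetical, not part of Compute Lite:

```python
class PredictWrapper:
    """Adapter that gives an arbitrary inference callable the minimal
    scikit-learn-style surface (a predict method) described above."""
    def __init__(self, inference_fn):
        self.inference_fn = inference_fn

    def predict(self, X):
        return self.inference_fn(X)

# Example: wrap a custom inference routine that is not an sklearn model.
def my_custom_inference(rows):
    return [sum(row) for row in rows]

model = PredictWrapper(my_custom_inference)
```

Anything that can be expressed as "take features in, return predictions" fits this shape; truly stateful or multi-stage pipelines are where the assumption really breaks down.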
What are the limitations of Compute Lite?
Compute Lite uses Lambda to run your deployed model, so there are runtime and memory constraints. Lambda has a maximum runtime of 15 minutes and a maximum memory allocation of 3GB. If your model inference exceeds these limits, it will not work until we add support for AWS Batch.
Yes, I have a cron job on a local Raspberry Pi waking up my main computer to run the scripts.
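For anyone curious, a sketch of how the wake-up half of such a setup could look — the MAC address, schedule, and timing are placeholders, and it assumes the `wakeonlan` tool is installed on the Pi and Wake-on-LAN is enabled in the main machine's BIOS/NIC settings:

```shell
# Pi crontab: send the magic packet every Saturday at 13:55 UTC,
# a few minutes before the main machine's own cron runs the pipeline:
#   55 13 * * 6  /usr/bin/wakeonlan AA:BB:CC:DD:EE:FF
wakeonlan AA:BB:CC:DD:EE:FF
```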
Do you submit from a home machine, compute/cloud or website?
Home machine
Do you want to train your model locally or in the cloud?
Locally
How often do you retrain your models?
Depends on the model; those that I do retrain, I retrain once per month.
What cloud platform are you most comfortable with?
None of them
Do you use version control for your model code?
Yes
What are the biggest pain points with the current Compute setup?
Lack of control. I don't like private code somewhere other than my local machines. I also don't like the approach "Let me handle everything for you" and would rather have "Here is a ready to use solution, but you can also modify it or do everything on your own". Also, some of my models require some compute power, and I already have a local machine capable of doing that. I don't want to spend extra money for expensive cloud services. If a webhook trigger mechanism becomes mandatory, I would really like to be able to set custom webhook URLs in my Numerai account, and let me do my own thing.
How do you typically deploy a model to production?
For most models I create a custom model file that can be added to a folder of deployed models after I have trained it.
To support this future, we are exploring the idea of daily rounds with much shorter submission windows. This change will effectively make model automation mandatory.
Just define a daily time window during which models are supposed to upload their predictions, e.g. every Mon-Fri from 6:00 UTC to 10:00 UTC. A simple cron job would work just fine.
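Concretely, the cron approach could look like this — the script path is a placeholder, and the 06:30 UTC slot simply lands inside the hypothetical Mon-Fri 6:00-10:00 UTC window:

```shell
# crontab -e on the submission machine:
# minute hour day-of-month month day-of-week  command
30 6 * * 1-5  /home/me/numerai/submit_daily.sh >> /home/me/numerai/cron.log 2>&1
```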
The key message I want to convey is that I am fine with everything unless it becomes impossible to upload predictions other than by using a Numerai Compute node running in a cloud service.
While I understand the need and advantage for many users, I am worried that the new Numerai Compute will take away the clean, straightforward and above all flexible approach of the Numerai tournament (download data → do whatever you want → upload the predictions).
With Numerai Compute, the user models are run on demand by Numerai. Numerai decides when to call what, which is a big shift from the current standard, where it is the user who decides what to do and when.
I can understand Numerai's need for this paradigm shift, but I do not accept the decrease in flexibility in how I can run my model or what I can do (a limit imposed both by the current form of Numerai Compute and by the fact that we have to use AWS).
If this paradigm shift becomes mandatory, please, please, please add the possibility to skip Numerai Compute and allow users to register a webhook on their account instead. The webhook would work as a simple trigger that starts the user's models. That would give us back the flexibility to do anything our models need.
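To make the idea concrete, here is a sketch of what such a self-hosted webhook receiver could look like on the user's side — the /trigger path, the port, and the run_pipeline body are all made up, and a real endpoint should also verify some shared secret before acting on a request:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_pipeline():
    """Placeholder: replace with whatever starts your own models."""
    print("pipeline triggered")

class TriggerHandler(BaseHTTPRequestHandler):
    """Minimal self-hosted endpoint: Numerai would POST here,
    and the user decides what actually runs."""
    def do_POST(self):
        if self.path == "/trigger":
            run_pipeline()
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet

def serve(port=8080):
    # Call serve() to start listening; this blocks forever.
    HTTPServer(("", port), TriggerHandler).serve_forever()
```

The point is exactly the flexibility argued for above: the trigger is standardized, but everything behind run_pipeline stays entirely under the user's control, on whatever hardware they choose.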