Where do you train your models (if not Google Colab)

nathanganser · March 31, 2024, 7:22pm

As I start working with the larger datasets, my Google Colab runs are taking hours, and Colab disconnects me partway through whenever I leave my laptop.

Is there another tool out there that can run a Jupyter notebook in the background? What are you using?

Thanks!

svendaj · April 5, 2024, 2:56pm

I use and recommend Kaggle. Not only can you run Jupyter notebooks for free with 4 CPUs and 30GB RAM (max 12 hours per run, 5 notebooks in parallel), but thanks to the Kaggle API you can fully automate your pipeline. On top of that, there is a great community and the chance to learn more from competitions.
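As a rough illustration of what automating with the Kaggle API can look like, here is a minimal sketch that wraps the official `kaggle` command-line tool from Python. It assumes the `kaggle` package is installed and an API token is configured, and that the notebook folder contains a `kernel-metadata.json`. The helper names `kaggle_command` and `run` and the notebook slug in the comments are hypothetical, not taken from the notebooks in this thread.

```python
import subprocess


def kaggle_command(action, target, output_dir="."):
    """Build the argument list for one `kaggle kernels` call.

    action: "push" (target is a local notebook folder), or
            "status" / "output" (target is an "owner/notebook-slug" string).
    """
    if action == "push":
        # Uploads the notebook in the folder and starts a new run.
        return ["kaggle", "kernels", "push", "-p", target]
    if action == "status":
        # Reports whether the latest run is queued, running, complete or failed.
        return ["kaggle", "kernels", "status", target]
    if action == "output":
        # Downloads the files written by the latest run into output_dir.
        return ["kaggle", "kernels", "output", target, "-p", output_dir]
    raise ValueError(f"unknown action: {action!r}")


def run(command):
    """Run a command built above and return whatever it printed."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical usage (requires the kaggle CLI and credentials):
#   run(kaggle_command("push", "my-notebook-folder"))
#   print(run(kaggle_command("status", "my-user/my-notebook")))
#   run(kaggle_command("output", "my-user/my-notebook", "predictions"))
```

Scheduling these three calls (for example from cron or a CI job) is one way to get a push, wait, and download loop; the Kaggle Python client can be used instead of the CLI.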

I have created a few public notebooks and datasets to simplify first steps for Numerai Tournament participants:

Weekly updated datasets with the latest data, so that you do not need to download them each time:
- V4.3 Midnight - latest data, notebook producing the dataset: numerai data v4.3 Midnight (kaggle.com)
- V4.2 Rain, notebook producing the dataset: numerai data v4.2 Rain (kaggle.com)
- Older V4 and V4.1, notebook producing the dataset: numerai data (kaggle.com)

Example models - typically forks of Numerai example models with improvements for better results:
- Hello Numerai automated (kaggle.com), with upload via NumerAPI; the model is also staked, so you can check its performance on the leaderboard (JOS_KAGGLE_HELLO - Numerai and the improved version JOS_KAGGLE_SHATTER - Numerai)
- numerai Feature Neutralization (kaggle.com) - example model with feature neutralization, JOS_KAGGLE_MEDIUM_FN - Numerai
- numerai Target Ensemble (kaggle.com) - example model of target ensembling, JOS_KAGGLE_MEDIUM_TE - Numerai
- Numerai Example Model Sunshine (kaggle.com) - older, more complex example model, JOS_KAGGLE_SUNSHINE - Numerai

Kaggle automation tips:

nathanganser · April 5, 2024, 5:24pm

Thank you! I’ll review this!!!

dfrank · May 3, 2024, 7:10am

Thank you! Also a new user exploring ways to train models (turns out my ~10 year old gaming rig can’t handle much past the “small” feature set)

ambrul11 · May 7, 2024, 1:18pm

Also using kaggle.com. Highly convenient option

smilence666 · May 10, 2024, 4:44pm

I train on my local computer and use a batch script to submit daily.

datahunter · August 26, 2024, 6:51pm

This is very useful, thank you so much

nathanganser · August 26, 2024, 7:04pm

Update on my end:

I’ve tested many options (including Kaggle, but 30GB of RAM is limiting).
I now exclusively use https://brev.dev/, which has super cheap GPUs, and with screen you can easily leave your process running in the background for days and get the model once the process is done.

Highly recommend that!

svendaj · August 27, 2024, 7:10pm

Some guys are using Vast.ai as the most cost-effective GPU option.

gammarat · August 28, 2024, 1:59pm

I found it useful to have my programs periodically save relevant parameters to files. For example, with my basic Numerai program I save parameters every 250 iterations to files named by the iteration number (like “XXXXX.mat”, where XXXXX is the iteration number, and “mat” because I use MATLAB). In my case 250 iterations represents about 1/2 hour of processing, so there’s not much lost if processing is interrupted for one reason or another.

That also allows you to branch off downstream programs from pretty much wherever you like.
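The periodic-save idea above could be sketched in Python roughly like this. It is only an illustration, not the MATLAB code described in the post: the pickle format, the `.pkl` extension, and the names `save_checkpoint`, `latest_checkpoint`, `train` and `step` are all assumptions made for the example.

```python
import pickle
from pathlib import Path

CHECKPOINT_EVERY = 250  # iterations between saves, as in the post above


def save_checkpoint(directory, iteration, params):
    """Write params to a file named after the iteration, e.g. 00250.pkl."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{iteration:05d}.pkl"
    tmp = target.with_suffix(".tmp")
    with open(tmp, "wb") as f:
        pickle.dump(params, f)
    # Rename only after the write finished, so an interruption
    # never leaves a half-written checkpoint under the final name.
    tmp.replace(target)
    return target


def latest_checkpoint(directory):
    """Return (iteration, params) of the newest checkpoint, or (0, None)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0, None
    files = sorted(directory.glob("*.pkl"), key=lambda p: int(p.stem))
    if not files:
        return 0, None
    with open(files[-1], "rb") as f:
        return int(files[-1].stem), pickle.load(f)


def train(directory, total_iterations, step, initial_params):
    """Run step(params, iteration) repeatedly, resuming from the last save."""
    start, params = latest_checkpoint(directory)
    if params is None:
        params = initial_params
    for iteration in range(start + 1, total_iterations + 1):
        params = step(params, iteration)
        if iteration % CHECKPOINT_EVERY == 0:
            save_checkpoint(directory, iteration, params)
    return params
```

Calling `train` again after an interruption picks up from the newest saved iteration, and loading any older file by its iteration number gives a starting point for a downstream experiment.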

FWIW I train my models at home, and the few times (for other projects) I’ve used Colab I used Google Drive for data storage.

gregbowers · September 17, 2024, 8:29am

This is very useful, thank you so much

shi_luo · September 19, 2024, 8:21pm

I’m experimenting with Paperspace Gradient. It’s like Colab but with ‘unlimited’ GPUs on a monthly subscription. I’m currently trying to build the correct virtual environments on Gradient so the pickled model is consistent with Numerai Evaluation. If it does work well, I’m guessing it would be a solid place for high-RAM GPU training!
