As I start working with the larger datasets, my Google Colab jobs are taking hours to run, and Colab disconnects me partway through when I leave my laptop.
Is there another tool out there that can run a Jupyter notebook in the background? What are you using?
I’ve tested many options (including Kaggle, but its 30 GB of RAM is limiting).
I now exclusively use https://brev.dev/, which has super cheap GPUs. Using screen, you can easily leave your process running in the background for days and grab the model once the process is done.
I found it useful to have my programs periodically save relevant parameters to files. For example, my basic Numerai program saves its parameters every 250 iterations to files named by iteration number (like “XXXXX.mat”, where XXXXX is the iteration number, and “mat” because I use MATLAB). In my case 250 iterations represents about half an hour of processing, so there’s not much lost if processing gets interrupted for one reason or another.
That also lets you branch off downstream programs from pretty much wherever you like.
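I use MATLAB, but the same checkpointing pattern is only a few lines in Python. Here’s a minimal sketch, where `training_step`, the parameter dict, and the 250-iteration interval are all stand-ins for whatever your own program actually updates:

```python
import pickle

CHECKPOINT_EVERY = 250  # ~half an hour of processing in my case

def training_step(params):
    # Placeholder for one iteration of your real update rule.
    return {k: v * 0.999 for k, v in params.items()}

def train(params, n_iters):
    for i in range(1, n_iters + 1):
        params = training_step(params)
        if i % CHECKPOINT_EVERY == 0:
            # Name the file by iteration number so a downstream
            # program can branch off from any saved point.
            with open(f"{i:05d}.pkl", "wb") as f:
                pickle.dump(params, f)
    return params

final = train({"w": 1.0}, 1000)
```

Since each file is named by its iteration number, resuming after a disconnect (or branching a downstream experiment) is just a matter of unpickling the checkpoint you want.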
FWIW, I train my models at home, and the few times I’ve used Colab (for other projects) I used Google Drive for data storage.
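For anyone who hasn’t tried it, mounting Drive from inside a Colab notebook is a one-liner; the data directory below is just an illustrative path:

```python
# Runs only inside a Colab notebook: mounts your Google Drive so a
# long-running job can read data from, and write checkpoints to,
# storage that survives the runtime disconnecting.
from google.colab import drive

drive.mount('/content/drive')

# Illustrative path; point it at wherever your data lives on Drive.
DATA_DIR = '/content/drive/MyDrive/numerai'
```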
I’m experimenting with Paperspace Gradient. It’s like Colab, but with ‘unlimited’ GPUs on a monthly subscription. I’m currently trying to build the right virtual environment on Gradient so that the pickled model is consistent with Numerai’s evaluation environment. If it works well, I’m guessing it would be a solid place for high-RAM GPU training!
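Pickled models are sensitive to library versions, so one way to sanity-check an environment before a long run is to compare installed versions against the ones you expect at unpickling time. A minimal sketch; the pins below are placeholders, not Numerai’s actual requirements:

```python
# Sanity-check that installed package versions match the ones the
# pickle will later be loaded under. The pins below are placeholders;
# substitute your target environment's actual versions.
from importlib.metadata import version

EXPECTED = {
    "numpy": "1.24.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
}

for pkg, want in EXPECTED.items():
    have = version(pkg)
    status = "OK" if have == want else "MISMATCH"
    print(f"{pkg}: installed {have}, expected {want} -> {status}")
```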