After the first shock caused by the size of the new dataset, I started looking for solutions.
My most successful models are Random Forest based models, which were trained on a 6-core CPU. The new dataset makes this approach impossible.
Luckily I found cuML, an ML library that implements algorithms with GPU support.
Now I can train on GPU.
6-core CPU vs RTX 3090: roughly a 100x speed improvement. I haven’t measured it, but it’s in that ballpark.
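For anyone who wants to try the same switch and actually measure the speedup, here is a minimal sketch. It assumes cuML's `cuml.ensemble.RandomForestClassifier` for the GPU path and scikit-learn's classifier as the CPU baseline. The helper names (`make_random_forest`, `time_call`), the synthetic data and the hyperparameters are made up for illustration and are not my actual setup.

```python
import time


def make_random_forest(use_gpu=True, **params):
    """Return a Random Forest classifier (hypothetical helper).

    Prefers cuML's GPU implementation when requested and installed,
    otherwise falls back to scikit-learn on the CPU.
    """
    if use_gpu:
        try:
            from cuml.ensemble import RandomForestClassifier  # GPU version
            return RandomForestClassifier(**params)
        except ImportError:
            pass  # cuML is not installed, so use the CPU version below
    from sklearn.ensemble import RandomForestClassifier  # CPU version
    return RandomForestClassifier(**params)


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    import numpy as np

    # Synthetic stand-in data; cuML works best with float32 features
    # and int32 labels, so the arrays are cast accordingly.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20_000, 20)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(np.int32)

    # Only parameters that exist in both libraries are passed here.
    for use_gpu in (False, True):
        model = make_random_forest(use_gpu=use_gpu, n_estimators=100, max_depth=16)
        _, seconds = time_call(model.fit, X, y)
        print(f"{type(model).__module__}: fit took {seconds:.2f} s")
```

The two libraries do not share every constructor argument (scikit-learn's `n_jobs`, for example, is CPU-specific), so the sketch sticks to `n_estimators` and `max_depth`. A single timed run is only a rough indication; the first GPU call also includes some one-time startup overhead.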
Thanks for sharing! How did you find the installation process? I remember having a look, but I think it strictly required using their provided Docker image, which put me off. Maybe I should have another look.