After the initial shock at the size of the new dataset, I started looking for solutions.
My most successful models are Random Forest based and were trained on a 6-core CPU. With the new dataset, that approach is no longer feasible.
Luckily, I found cuML, an ML library that implements algorithms with GPU support.
Now I can train on GPU.
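
Here is a minimal sketch of what the switch can look like. It is not my actual training code: the synthetic data, the hyperparameters, and the `train_and_score` helper are made up for illustration. It assumes cuML's `RandomForestClassifier` is available and falls back to scikit-learn's when it is not, so the same script also runs on a CPU-only machine.

```python
import numpy as np

# Prefer the GPU implementation; fall back to scikit-learn on CPU-only machines.
try:
    from cuml.ensemble import RandomForestClassifier
    BACKEND = "cuml"
except ImportError:
    from sklearn.ensemble import RandomForestClassifier
    BACKEND = "sklearn"


def train_and_score(n_samples=20_000, n_features=20, seed=0):
    """Train a Random Forest on synthetic data and return hold-out accuracy.

    The data is a stand-in (hypothetical): the label depends only on the
    first two features.
    """
    rng = np.random.default_rng(seed)
    # cuML expects float32 features and int32 labels for classification.
    X = rng.normal(size=(n_samples, n_features)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(np.int32)

    split = int(0.8 * n_samples)  # simple 80/20 train/test split
    model = RandomForestClassifier(n_estimators=100, max_depth=16, random_state=seed)
    model.fit(X[:split], y[:split])

    pred = np.asarray(model.predict(X[split:]))
    return float(np.mean(pred == y[split:]))


if __name__ == "__main__":
    print(f"backend={BACKEND} accuracy={train_and_score():.3f}")
```

Only constructor arguments that both libraries accept are used here, so the import is the only line that differs between the CPU and GPU versions.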
Going from a 6-core CPU to an RTX 3090 gives roughly a 100x speedup. I haven’t measured it, but it’s in that ballpark.
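
If I ever want an actual number instead of a ballpark, a small timing harness like the one below would do. It is a sketch using only the standard library; `best_time` and `speedup` are hypothetical helper names, and the two `time.sleep` calls are placeholders for the real CPU and GPU training functions.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_fn, candidate_fn, repeats=3):
    """Return how many times faster candidate_fn is than baseline_fn."""
    return best_time(baseline_fn, repeats) / best_time(candidate_fn, repeats)


if __name__ == "__main__":
    # Placeholders: swap in the real CPU and GPU training calls here.
    slow = lambda: time.sleep(0.1)
    fast = lambda: time.sleep(0.01)
    print(f"speedup: {speedup(slow, fast):.1f}x")
```

Taking the best of several runs matters on the GPU side, because the first call usually includes one-time setup cost that would otherwise skew the comparison.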