Speed up training Random Forests with a GPU


After the initial shock at the size of the new dataset, I started looking for solutions.
My most successful models are Random Forest based, and they were trained on a 6-core CPU. The new dataset makes this approach impossible.

Luckily I found cuML, an ML library that implements common algorithms with GPU support.
Now I can train on the GPU.

6-core CPU vs. an RTX 3090: roughly a 100x speed improvement. I haven't measured it precisely, but it's in that ballpark.



Thanks for sharing! How did you find the installation process? I remember having a look, but I think it strictly required using their provided Docker image, which put me off. Maybe I should have another look.

A very good alternative would be an XGBoost GPU-powered Random Forest:


You can install it with Conda:
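The exact command depends on your CUDA and Python versions; the release selector on rapids.ai generates the right one. It looks roughly like this (the channel names are RAPIDS' own, the version pins below are illustrative):

```shell
# Install cuML from the RAPIDS conda channels.
# Version pins are illustrative - use the selector on rapids.ai for your setup.
conda create -n rapids -c rapidsai -c conda-forge -c nvidia \
    cuml python=3.11 'cuda-version>=12.0'
conda activate rapids
```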

It also works fine on Windows with WSL.


Nice, will check it out - thanks!

By the way, rapids.ai offers a lot more GPU-accelerated algorithms beyond Random Forests.


Yeah, I know - a lot of their DS team are Kaggler friends. They have been reworking the whole sklearn suite.