Speed up training Random Forests with GPU

nyuton · September 9, 2021, 9:39am

Hi,

After the first shock caused by the size of the new dataset, I started looking for solutions.
My most successful models are Random Forest based models, which were trained on a 6-core CPU. The new dataset makes this approach impossible.

Luckily I found cuML, an ML library that implements algorithms with GPU support.
Now I can train on GPU.
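
For anyone who wants to try this, here is a minimal sketch of what the switch can look like. It is not my actual training script: the helper names (`pick_backend`, `make_forest`), the synthetic data and the hyperparameter values are made up for illustration. It assumes cuML's `RandomForestRegressor` from `cuml.ensemble`, which follows the scikit-learn estimator interface, and falls back to scikit-learn on a machine without cuML.

```python
import importlib.util


def pick_backend():
    """Return "cuml" if RAPIDS cuML is installed, else "sklearn", else None."""
    for name in ("cuml", "sklearn"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def make_forest(backend, n_estimators=100, max_depth=16):
    """Build a random forest regressor for the chosen backend.

    The heavy libraries are imported lazily so this file still loads on a
    machine that has only one of them (or neither).
    """
    if backend == "cuml":
        from cuml.ensemble import RandomForestRegressor  # GPU implementation
    elif backend == "sklearn":
        from sklearn.ensemble import RandomForestRegressor  # CPU fallback
    else:
        raise RuntimeError("neither cuML nor scikit-learn is installed")
    return RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth)


if __name__ == "__main__":
    import numpy as np

    # Synthetic stand-in for the real dataset (illustrative only).
    # float32 inputs are used because cuML's random forest works in float32.
    rng = np.random.default_rng(0)
    X = rng.random((10_000, 20), dtype=np.float32)
    y = X[:, 0] + 0.1 * rng.random(10_000, dtype=np.float32)

    backend = pick_backend()
    model = make_forest(backend)
    model.fit(X, y)
    preds = model.predict(X[:5])
    print(backend, preds)
```

Because both estimators share the same `fit`/`predict` interface, the rest of a pipeline should not need to change when moving between CPU and GPU.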

6-core CPU vs RTX 3090: roughly a 100x speedup. I haven’t measured it, but it’s in that ballpark.

Enjoy!

yxbot · September 9, 2021, 2:52pm

Thanks for sharing! How did you find the installation process? I remember having a look, but I think it strictly required using their provided Docker image, which put me off. Maybe I should have another look.

A very good alternative would be a GPU-powered RF in XGBoost:
https://xgboost.readthedocs.io/en/latest/tutorials/rf.html
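
A rough sketch of the idea from that tutorial, in case it helps: a random forest in XGBoost is a single boosting round that grows many trees in parallel, with row and column subsampling and no shrinkage. The helper names (`rf_params`, `train_rf`) and the specific values are illustrative assumptions, not tuned settings. `tree_method="gpu_hist"` is the GPU switch in 2021-era releases; newer XGBoost releases use `device="cuda"` with `tree_method="hist"` instead, so adjust for your version.

```python
def rf_params(n_trees=100, use_gpu=True):
    """Parameter dict that makes xgboost.train build a random forest."""
    return {
        "booster": "gbtree",
        "objective": "reg:squarederror",
        "learning_rate": 1.0,          # no shrinkage: trees are averaged, not boosted
        "num_parallel_tree": n_trees,  # forest size, all grown in one round
        "subsample": 0.8,              # row sampling per tree (must be < 1)
        "colsample_bynode": 0.8,       # feature sampling per split (must be < 1)
        "max_depth": 6,
        # Assumption: 2021-era GPU switch; see the note above for newer versions.
        "tree_method": "gpu_hist" if use_gpu else "hist",
    }


def train_rf(X, y, n_trees=100, use_gpu=True):
    """Train the forest; xgboost is imported lazily so the helper above
    can be inspected without it installed."""
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    # num_boost_round=1 is what keeps this a forest rather than boosting.
    return xgb.train(rf_params(n_trees, use_gpu), dtrain, num_boost_round=1)
```

The scikit-learn style wrappers `XGBRFRegressor` and `XGBRFClassifier` set these forest defaults for you, if you prefer that interface.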

nyuton · September 9, 2021, 3:29pm

You can install it with Conda:

Also works fine on Windows with WSL

yxbot · September 9, 2021, 3:30pm

nice, will check - thanks

nyuton · September 9, 2021, 3:30pm

By the way rapids.ai can do a lot more algorithms on GPU.

yxbot · September 9, 2021, 3:31pm

yeah, I know, lots of their DS team are Kaggler friends - they have been reworking the whole sklearn suite