Speed up Random Forest training with a GPU

nyuton · September 9, 2021, 9:39am

Hi,

After the initial shock at the size of the new dataset, I started looking for solutions.
My most successful models are Random Forest-based and were trained on a 6-core CPU. With the new dataset, that approach is no longer feasible.

Luckily, I found cuML, an ML library that implements algorithms with GPU support.
Now I can train on the GPU.
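
A minimal sketch of what GPU training with cuML can look like. It assumes cuML's `cuml.ensemble.RandomForestRegressor`, which mirrors scikit-learn's estimator API; the parameter names used (`n_estimators`, `max_depth`, `random_state`) follow recent cuML releases, so check the docs for your version. The data is synthetic and the shapes are placeholders, not the tournament data. If cuML can't be imported, the code falls back to scikit-learn on the CPU so it still runs.

```python
import numpy as np

# Prefer cuML's GPU Random Forest; fall back to scikit-learn on the CPU so the
# sketch still runs on machines without RAPIDS installed.
try:
    from cuml.ensemble import RandomForestRegressor
    BACKEND = "cuml"
except Exception:  # ImportError, or CUDA runtime problems at import time
    from sklearn.ensemble import RandomForestRegressor
    BACKEND = "sklearn"


def train_rf(X, y, n_estimators=100, max_depth=8, seed=0):
    """Fit a Random Forest regressor on whichever backend was imported above."""
    # float32 is the usual choice for GPU training and halves memory on either backend.
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    model = RandomForestRegressor(
        n_estimators=n_estimators, max_depth=max_depth, random_state=seed
    )
    model.fit(X, y)
    return model


# Synthetic stand-in data (hypothetical shapes, not the real dataset).
rng = np.random.default_rng(0)
X = rng.random((2_000, 50), dtype=np.float32)
y = (X[:, 0] + 0.1 * rng.standard_normal(2_000)).astype(np.float32)

model = train_rf(X, y)
preds = np.asarray(model.predict(X))
print(BACKEND, preds.shape)
```

Casting to float32 up front is a deliberate choice: on a dataset this large, halving the memory footprint can decide whether the data fits on the GPU at all.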

6-core CPU vs. RTX 3090: roughly a 100x speedup. I haven't measured it, but it's in that ballpark.
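
Since the 100x figure is an estimate, here's a rough sketch for timing it yourself. It times scikit-learn's `RandomForestRegressor` with `n_jobs=6` (to mirror a 6-core CPU) against cuML's `RandomForestRegressor` on the same synthetic data, doing one warm-up fit on the GPU first so CUDA initialization isn't counted. The two implementations aren't identical (cuML bins features into histograms by default), so treat the ratio as a ballpark rather than an apples-to-apples benchmark. The sizes and hyperparameters are hypothetical.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor as SkRF


def time_fit(model, X, y):
    """Return wall-clock seconds spent in model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def benchmark(n_rows=5_000, n_cols=50, n_estimators=100, max_depth=8, n_jobs=6):
    """Time CPU (scikit-learn) vs. GPU (cuML, if available) Random Forest fits."""
    rng = np.random.default_rng(0)
    X = rng.random((n_rows, n_cols), dtype=np.float32)
    y = rng.random(n_rows, dtype=np.float32)

    results = {}
    # CPU baseline: scikit-learn using n_jobs worker processes/threads.
    cpu = SkRF(n_estimators=n_estimators, max_depth=max_depth,
               n_jobs=n_jobs, random_state=0)
    results["cpu_s"] = time_fit(cpu, X, y)

    try:
        from cuml.ensemble import RandomForestRegressor as CuRF
    except Exception:
        results["gpu_s"] = None  # cuML not installed or no usable GPU
        return results

    gpu = CuRF(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
    time_fit(gpu, X, y)  # warm-up: first fit includes CUDA context/kernel setup
    results["gpu_s"] = time_fit(gpu, X, y)
    results["speedup"] = results["cpu_s"] / results["gpu_s"]
    return results


if __name__ == "__main__":
    print(benchmark())
```

For a meaningful number, pass shapes and tree settings close to what you actually train with; small inputs tend to understate the GPU's advantage because fixed overheads dominate.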

Enjoy!

yxbot · September 9, 2021, 2:52pm

Thanks for sharing! How did you find the installation process? I remember looking at it, but I think it strictly required their provided Docker image, which put me off. Maybe I should take another look.

A very good alternative would be XGB-GPU powered RF:
https://xgboost.readthedocs.io/en/latest/tutorials/rf.html
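
For reference, the XGBoost RF tutorial linked above builds a random forest by growing many parallel trees (`num_parallel_tree`) in a single boosting round, with `learning_rate` set to 1 and row/column subsampling. A sketch of that setup for a regression target follows; the objective and default values are assumptions, and the GPU switch is written for XGBoost 2.x (`device="cuda"`), whereas 1.x releases used `tree_method="gpu_hist"` instead.

```python
def xgb_rf_params(n_trees=100, max_depth=8, subsample=0.8, colsample_bynode=0.8,
                  use_gpu=True):
    """Booster parameters that make XGBoost grow a random forest instead of boosting."""
    params = {
        "objective": "reg:squarederror",      # regression target (assumed here)
        "learning_rate": 1.0,                 # the RF tutorial sets this to 1 (no shrinkage)
        "num_parallel_tree": n_trees,         # trees grown together in the single round
        "subsample": subsample,               # fraction of rows sampled per tree
        "colsample_bynode": colsample_bynode, # random feature subset at each split
        "max_depth": max_depth,
        "tree_method": "hist",
    }
    if use_gpu:
        # XGBoost >= 2.0 selects the GPU with device="cuda";
        # 1.x releases used tree_method="gpu_hist" instead.
        params["device"] = "cuda"
    return params


def train_xgb_rf(X, y, **kwargs):
    """Train the forest; xgboost is imported lazily so xgb_rf_params has no dependency."""
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    # num_boost_round=1: the whole forest is built in one round of num_parallel_tree trees.
    return xgb.train(xgb_rf_params(**kwargs), dtrain, num_boost_round=1)
```

Usage would be something like `booster = train_xgb_rf(X, y, n_trees=300)` followed by `booster.predict(xgb.DMatrix(X_new))`. XGBoost also ships a scikit-learn-style `XGBRFRegressor` wrapper if you prefer that interface.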

nyuton · September 9, 2021, 3:29pm

You can install it with Conda:

It also works fine on Windows with WSL.
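
The exact conda command isn't reproduced here, since it depends on your CUDA and RAPIDS versions (see the RAPIDS install guide). Once it's installed, natively or inside WSL, a quick sanity check like the sketch below can confirm that cuML imports and that a GPU is visible. The helper name `rapids_status` is hypothetical; it relies only on `cuml.__version__` and CuPy's `cupy.cuda.runtime.getDeviceCount()`.

```python
import importlib


def rapids_status():
    """Report whether cuML imports and how many CUDA devices CuPy can see.

    Returns {"cuml_version": str or None, "gpu_count": int}.
    """
    status = {"cuml_version": None, "gpu_count": 0}
    try:
        cuml = importlib.import_module("cuml")
        status["cuml_version"] = cuml.__version__
    except Exception:  # not installed, or CUDA libraries missing
        return status
    try:
        cupy = importlib.import_module("cupy")  # CuPy is a cuML dependency
        status["gpu_count"] = int(cupy.cuda.runtime.getDeviceCount())
    except Exception:
        pass  # leave gpu_count at 0 if the runtime cannot enumerate devices
    return status


print(rapids_status())
```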

yxbot · September 9, 2021, 3:30pm

Nice, will check it out. Thanks!

nyuton · September 9, 2021, 3:30pm

By the way, RAPIDS (rapids.ai) can run many more algorithms on the GPU.

yxbot · September 9, 2021, 3:31pm

Yeah, I know. Lots of their DS team are Kaggler friends; they have been reworking the whole sklearn suite.
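
Because cuML follows the scikit-learn estimator API, switching between CPU and GPU is often just a matter of which module you import. A small sketch, assuming `cuml.linear_model.Ridge` and `cuml.cluster.KMeans` as GPU counterparts of the scikit-learn classes, with a CPU fallback; the data is synthetic.

```python
import numpy as np

# cuML mirrors scikit-learn's fit / predict / fit_predict interface, so the same
# code can target the GPU or the CPU depending on which import succeeds.
try:
    from cuml.cluster import KMeans
    from cuml.linear_model import Ridge
    BACKEND = "cuml"
except Exception:
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge
    BACKEND = "sklearn"

rng = np.random.default_rng(1)
X = rng.random((1_000, 20), dtype=np.float32)
y = (X @ np.arange(20, dtype=np.float32)).astype(np.float32)  # linear target

# Linear model: fit returns the estimator, as in scikit-learn.
ridge = Ridge(alpha=1.0).fit(X, y)
y_hat = np.asarray(ridge.predict(X))

# Clustering: same constructor arguments on both backends.
labels = np.asarray(KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X))
print(BACKEND, y_hat.shape, np.unique(labels).size)
```

Not every scikit-learn estimator or parameter has a cuML equivalent, so check the cuML API reference before swapping imports in a larger pipeline.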
