Development Environments - RapidsAI, cuML, Docker

I’ve been asked a few times in RocketChat to go over how to leverage RapidsAI and Docker for both local and Compute development.

The best place to start is with Docker.
From Docker themselves:

Docker takes away repetitive, mundane configuration tasks and is used throughout the development lifecycle for fast, easy and portable application development - desktop and cloud. Docker’s comprehensive end to end platform includes UIs, CLIs, APIs and security that are engineered to work together across the entire application delivery lifecycle.

What is RapidsAI?

The tl;dr is that RapidsAI is a pre-built set of GPU-ready Docker containers and libraries (including everyone’s favorite XGBoost).
You can run the Rapids environment locally via conda as well, but when in Rome…

What cool stuff do I get?
There are a bunch of cool GPU-accelerated packages that come ready to rip; the one that first got me interested was cuML: https://docs.rapids.ai/api/cuml/stable/
cuML is insanely fast for most of the big computations I’ve come across.
It does take some casting between NumPy, pandas, cuDF, and CuPy, but we are using Python, so casting is part of life, right?
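
Here’s a minimal sketch of that casting dance (my own toy example, not from the RAPIDS docs): pandas goes in, cuML does the heavy lifting on the GPU, and NumPy comes back out.

import numpy as np
import pandas as pd
import cudf
from cuml.cluster import KMeans  # GPU-accelerated, scikit-learn-style API

# Build a toy dataset on the host, then move it into GPU memory.
pdf = pd.DataFrame(np.random.rand(100_000, 8).astype(np.float32))
gdf = cudf.from_pandas(pdf)

# Fit on the GPU; labels come back as a cuDF Series.
labels = KMeans(n_clusters=8).fit_predict(gdf)

# Cast back to NumPy for anything downstream that expects host arrays.
labels_np = labels.to_numpy()
print(labels_np[:10])
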
https://docs.rapids.ai/overview/latest.pdf
This is a really good overview of all the GPU-accelerated features that the RAPIDS stack opens up.
Something else that is super snazzy is the pre-configured Dask scheduler: even on my single-GPU setup I can run a lot of stuff in parallel.
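
As a hedged sketch (assuming the dask-cuda and dask_cudf packages bundled in the RAPIDS image), spinning up the local cluster looks roughly like this:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster  # ships in the RAPIDS image
import cudf
import dask_cudf

# One Dask worker per visible GPU (just the one, in my case).
cluster = LocalCUDACluster()
client = Client(cluster)

# Partition a cuDF frame and run a groupby across the workers.
gdf = cudf.datasets.timeseries()  # built-in sample time-series data
ddf = dask_cudf.from_cudf(gdf, npartitions=4)
print(ddf.groupby("name").x.mean().compute())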

So what if I want to try it out and feel the speed for myself?
Read below:

Pre-Reqs:

  • Linux
  • NVIDIA GPU

First, install Docker

Then install the NVIDIA CUDA drivers

Go through:

  • Pre-Installation
  • Package Manager Installation
  • Driver Installation

Next, download and run the basic RapidsAI Docker container:

docker pull rapidsai/rapidsai:0.18-cuda11.0-runtime-ubuntu20.04-py3.8
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:0.18-cuda11.0-runtime-ubuntu20.04-py3.8
  • make sure to select the right CUDA version and OS version for your setup
  • I use Python 3.8 here for the walrus operator := (quick demo below) :slight_smile:
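
In case the walrus is new to you, here’s a tiny toy demo of why it’s handy:

import random

# Assign and test in one expression: no separate "priming" read needed.
while (roll := random.random()) < 0.9:
    print(f"rolled {roll:.3f}, rolling again")
print(f"rolled {roll:.3f}, done!")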

Once the container boots, it will display a URL to open for the JupyterLab instance

If you like the environment and want to make it a bit more reusable, here are a few steps to consider:

  1. Make a shell script that runs the docker command
  2. Get rid of the --rm flag so the container isn’t deleted every time it exits
  3. Add some volumes for your personal files/notebooks
  4. Persist the Jupyter settings
  5. More?

This is an older example of my start-rapids.sh script, which shows off a bunch of the optional flags:
#!/bin/sh

# Run the RAPIDS container interactively with all GPUs attached.
# Ports: 8888 = JupyterLab, 8787 = Dask dashboard, 8786 = Dask scheduler.
# The EXTRA_*_PACKAGES variables are installed by the image's entrypoint
# at startup, and the current directory is mounted into the container so
# notebooks and data survive restarts.
sudo docker run \
  --gpus all \
  -it \
  -p 8888:8888 \
  -p 8787:8787 \
  -p 8786:8786 \
  -e EXTRA_APT_PACKAGES="build-essential" \
  -e EXTRA_PIP_PACKAGES="numerapi numpy" \
  -e EXTRA_CONDA_PACKAGES="joblib scikit-learn torch" \
  -v "$(pwd)":/rapids/workspace \
  rapidsai/rapidsai:cuda11.0-runtime-ubuntu20.04-py3.8

Happy to talk more about local / cloud development environments with people. I’ve been working with AWS for many years professionally, and love helping myself/others optimize their workflows through tooling :slight_smile:
