On Rocket.Chat, I have seen people struggling to get models like XGBoost running on GPU, so I decided to share instructions for setting up Kaggle’s Python environment on your own computer (a cloud or local instance). The image comes with almost all the libraries you need for data science tasks (for example, it gives you XGBoost and CatBoost on both GPU and CPU by default), and it is maintained regularly.
- I’m assuming that you’re using Ubuntu as your OS (not Windows or something else) with an NVIDIA GPU card (but of course it would be nice if someone shares a tutorial for Windows). You have
- Docker is installed on your computer
Running the container:
- clone (or download)
- change directory to the folder
- build a docker image with the latest updates
note: this step takes a while to finish, and you need enough free disk space for Docker to download 50GB+ of images
- run the image using the following script:
docker run -d --name=kagglecontainer --restart=always -v $(pwd):/home/ml/Kain --env LD_LIBRARY_PATH=/usr/local/cuda/lib64 --runtime=nvidia -p 9999:8888 -it kaggle/python-gpu-build jupyter notebook --no-browser --ip="0.0.0.0" --NotebookApp.token='' --NotebookApp.password='' --allow-root
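note: here you can replace 9999 with the port you want to use on your machine, and replace /home/ml/Kain with the path where your working directory for data science tasks should appear inside the container (the command mounts the current directory, $(pwd), at that path)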
- go to localhost:9999 (or whatever port you replaced 9999 with) in your browser to access the environment
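If you change the port or the mount path often, a small wrapper can save some typing. This is only a rough sketch: the function name kaggle_run_cmd and its two arguments are made up for illustration, and it simply prints the same docker run command as above with your values filled in, so you can review it before running anything.

```shell
#!/bin/sh
# Sketch only: kaggle_run_cmd and its arguments are illustrative names,
# not something provided by the Kaggle image or by Docker.
kaggle_run_cmd() {
  host_port="${1:-9999}"         # port on the host; 8888 stays fixed inside the container
  workdir="${2:-/home/ml/Kain}"  # path inside the container where $(pwd) gets mounted
  # Print the command piece by piece; \$(pwd) is kept literal so it is
  # expanded only when the printed command is actually run.
  printf '%s ' \
    "docker run -d --name=kagglecontainer --restart=always" \
    "-v \$(pwd):${workdir}" \
    "--env LD_LIBRARY_PATH=/usr/local/cuda/lib64 --runtime=nvidia" \
    "-p ${host_port}:8888 -it kaggle/python-gpu-build" \
    "jupyter notebook --no-browser --ip=\"0.0.0.0\"" \
    "--NotebookApp.token='' --NotebookApp.password='' --allow-root"
  printf '\n'
}

# Print the command for review; nothing is started here.
kaggle_run_cmd 8080 /home/ml/projects
```

Once the printed command looks right, you can paste it into your terminal from the folder you want to mount.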
After this, the only thing we need to do is rebuild the Docker image from time to time and run it as a container. With this Docker container, we have the same functionality available in Kaggle environments on our local machines (without runtime or hard disk restrictions).
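Since every rebuild can pull a lot of data (the 50GB+ mentioned in the build step), it may be worth checking free disk space first. Here is a minimal sketch, assuming Docker keeps its images under /var/lib/docker, which is a common default on Linux but may differ on your setup. The helper name has_free_gb is made up for this example.

```shell
#!/bin/sh
# Sketch only: has_free_gb is an illustrative helper, not a Docker command.
# Usage: has_free_gb <required_gb> [path]
has_free_gb() {
  need_gb="$1"
  path="${2:-.}"
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $need_gb * 1024 * 1024 )) ]
}

# Assumption: images live under /var/lib/docker; fall back to / otherwise.
docker_dir=/var/lib/docker
[ -d "$docker_dir" ] || docker_dir=/

if has_free_gb 50 "$docker_dir"; then
  echo "enough free space for a rebuild"
else
  echo "less than 50 GB free on $docker_dir; clean up before rebuilding"
fi
```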