Kaggle environment on your local machine

kainsama · April 22, 2020, 7:34am

On Rocket.Chat, I have seen people say they are struggling to get models like XGBoost running on GPU, so I decided to share instructions for setting up Kaggle’s Python environment on your own computer (cloud or local instance). It comes with almost all the libraries you need for data science tasks (for example, this image gives you XGBoost and CatBoost on both GPU and CPU by default), and it is maintained regularly.

Requirements:

I’m assuming that you’re using Ubuntu as your OS (not Windows or something else) with an Nvidia GPU card (but of course it would be nice if someone shared a tutorial for Windows).
You have Docker installed on your computer.
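
As a quick sanity check before starting, something like the following sketch can confirm that the command-line tools are on your PATH. The helper name `missing_tools` is made up for illustration, and treating `nvidia-smi` as a sign that the Nvidia driver is installed is my assumption. It does not check that the NVIDIA container runtime is registered with Docker, which the `--runtime=nvidia` flag used later also needs.

```python
import shutil


def missing_tools(tools=("docker", "nvidia-smi")):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper: the default tool list is an assumption about
    what this setup needs, not an official requirement list.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("docker and nvidia-smi were both found")
```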

Running the container:

clone (or download) https://github.com/Kaggle/docker-python
change directory to the folder docker-python
build a Docker image with the latest updates: ./build --gpu
note: this step takes a while to finish, and Docker needs 50GB+ of free disk space to download the images
run the image using the following script:
docker run -d --name=kagglecontainer --restart=always -v $(pwd):/home/ml/Kain --env LD_LIBRARY_PATH=/usr/local/cuda/lib64 --runtime=nvidia -p 9999:8888 -it kaggle/python-gpu-build jupyter notebook --no-browser --ip="0.0.0.0" --NotebookApp.token='' --NotebookApp.password='' --allow-root
note: here you can replace 9999 with the port you want to use, and replace /home/ml/Kain with the path where your working directory (the current directory, $(pwd)) should appear inside the container
go to localhost:9999 (or whatever port you replaced 9999 with) in your browser to access the environment
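
If you change the port or the mounted folder often, the run command above can be assembled from parameters. This is only a sketch: the function name `kaggle_run_command` and its parameters are made up for illustration, and the flags are copied from the command in this post rather than from any official Kaggle documentation.

```python
import os
import shlex


def kaggle_run_command(host_dir=None, host_port=9999,
                       container_dir="/home/ml/Kain",
                       name="kagglecontainer",
                       image="kaggle/python-gpu-build"):
    """Build the `docker run` argument list used in this post.

    host_dir defaults to the current directory, like $(pwd) in the
    original command. All parameter names here are hypothetical.
    """
    host_dir = os.path.abspath(host_dir or os.getcwd())
    return [
        "docker", "run", "-d",
        f"--name={name}",
        "--restart=always",
        "-v", f"{host_dir}:{container_dir}",   # host folder : path in container
        "--env", "LD_LIBRARY_PATH=/usr/local/cuda/lib64",
        "--runtime=nvidia",
        "-p", f"{host_port}:8888",             # host port : Jupyter port
        "-it", image,
        "jupyter", "notebook", "--no-browser", "--ip=0.0.0.0",
        "--NotebookApp.token=", "--NotebookApp.password=",
        "--allow-root",
    ]


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(kaggle_run_command(host_port=8890)))
```

Building a list instead of a single string avoids shell quoting problems when the folder name contains spaces.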

After this, the only thing we need to do is rebuild the Docker image from time to time and run it as a container. With this Docker container, we get the same functionality available in Kaggle environments on our local machines (without run time or hard disk restrictions).

surajp · May 5, 2020, 5:08am

Colab works well for me

sirmobius · March 29, 2021, 6:40pm

Thanks, this is great.

I would be very interested if anyone could share how to do this on a Windows 10 machine without using WSL2.
