This is a blog post that records the steps to set environment before I successfully trained and run a deep learning model in a docker container. Docker offers an isolated environment for software and dependencies and ensure the application runs quickly and reliably from one computing environment to another, by using container technology. See the difference between docker container and virtual machine in https://www.docker.com/resources/what-container/. Have docker installed on the host computer, build docker environment following steps below.
Create a container and set up environment Link to heading
Open a terminal on host computer and pull a docker image e.g. ubuntu 18.04.
Create & enter a container Link to heading
$ docker pull ubuntu:18.04
$ docker images # list all the images
$ docker run -it --shm-size 6G --gpus all --name [container name] -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all [image id] # create a container with access to nvidia GPU, refer to https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html #setting-up-nvidia-container-toolkit if unable to locate
$ docker cp [file path] [container id]:/home # pass files from the host to container
$ docker exec -it [container id] bash # enter the container
Install base utilities Link to heading
Now you are in a docker container. Time to install software!
$ apt-get update
$ apt-get install -y build-essential
$ apt-get install -y wget
$ apt-get clean
Install miniconda Link to heading
In case that you need run projects in different python version, create a conda environment.
$ cd home
$ mkdir download && cd download
$ wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ export PATH=$PATH:/root/anaconda3/bin
Install cuda Link to heading
Install cuda if you use Nvidia card. Choose the right cuda version.
$ cd home/download
$ wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
$ sh cuda_11.3.0_465.19.01_linux.run # install the dependencies if needed, and choose to install cuda-11.3 only is enough (do not install GPU driver)
Link libs Link to heading
Export paths to your current environment. Here I use vim to edit ~/.bashrc file, add three lines, save changes and exit vim editor.
$ apt-get install vim
$ vi ~/.bashrc # add the following 3 lines to .bashrc
$ export LD_LABRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.3/lib64
$ export PATH=$PATH:/usr/local/cuda-11.3/bin
$ export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.3
Don’t forget to source ~/.bashrc and the check the cuda installation.
$ source ~/.bashrc
$ nvcc -V #check the cuda version
Now you can further set the container according to your project requirements.
Trouble shooting Link to heading
Docker images and containers are saved in /var/lib/docker by default, usually the space is not enough.
One way to change the files’ location: Link to heading
Check the size of each image and container
$ docker system df -v
Stop docker service and copy docker files to the new directory (e.g. /home/docker/lib/)
$ systemctl stop docker
$ rsync -r -avz /var/lib/docker /home/docker/lib/
$ mv /var/lib/docker /var/lib/docker-old
$ ln -s /home/docker/lib/docker /var/lib/
$ systemctl start docker # check images and containers, delete /var/lib/docker-old if everything is OK
Runtime error: dataloader’s workers are out of shared memory. Link to heading
$ df -h | grep shm # check the shared memory
$ service docker stop
$ cd /var/lib/docker/containers/[container id]
$ vim hostconfig.json # set shm-size to a bigger number
$ service docker start
Not found containers Link to heading
$ #try to link the directories again
$ ln -s /home/docker/lib/docker /var/lib/
$ sudo rm -rf /var/lib/docker
Summary Link to heading
This is the basic pipeline to makes things happen in docker container. Find more advanced operations such as dockerfile usage, root permission settings on your own if needed. Things should be easier since you have the initial momentum. Cheers!