During my time on a team using Docker I’ve had to on-board a number of engineers who were completely unfamiliar with containerisation. While there are lots of guides to the myriad individual components you’ll need to work with Docker, I find developers struggle to understand the relationships and the boundaries between the pieces.
This article will give a short overview of the key parts of the Docker ecosystem, how they fit together and, crucially, why you need each bit. This is intended to help you get a map of the terrain, and allow you to join a Docker team and be productive quickly. However, working with Docker is complicated, and, in order to get the best out of it, I highly encourage you to dig in and learn more about each of the components as you continue your journey.
What is Docker?
Docker is a containerisation technology which allows you to build isolated, reproducible application environments which you can use to develop applications, then push those same environments into production. Containers work similarly to virtual machines, with the key difference being that virtual machines emulate physical hardware, whereas Docker only provides an abstraction over user-space - the result being that Docker containers have a smaller performance overhead than full VM virtualisation (YMMV on non-Linux hosts).
Images, containers and volumes
The key unit of Docker is the image. Images are immutable file systems packaged up alongside some run-time configuration, and they are built by running a `Dockerfile`. A `Dockerfile` contains a mixture of `RUN` shell commands, which build up the file system in layers by snapshotting the state after each command, and Docker instructions which configure networking, environment variables, the default command entry point and some other bits.
It’s fairly uncommon to write a `Dockerfile` completely from scratch - it’s more likely that you’ll want to use an existing image as a starting point. For example, if you have a Python web application, you’d probably start from the official public `python` base image.
The Python base images provide you with an OS with a given version of Python installed. You can then extend this image by including any additional libraries you depend on, adding your application source code and then configuring your app’s entry point command.
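As a sketch, a minimal `Dockerfile` extending the Python base image might look like the following. The image tag, file names and entry point here are placeholders for your own project, not prescriptions:

```dockerfile
# Start from the official Python base image (pick the tag matching your version)
FROM python:3.8

WORKDIR /app

# Install your application's dependencies first, so this layer is
# cached and only rebuilt when requirements.txt changes
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add your application source code
COPY . .

# Configure the default command the container runs
CMD ["python", "app.py"]
```

Running `docker build -t my-app .` in the same directory would then produce an image tagged `my-app`.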
Having a base starting point isn’t the only reason why public images are useful; they also act as an easy way to install and run software. For example, rather than go through the process of installing and managing Postgres on your development machine, and then installing and managing the same version in production, you can just use the official PostgreSQL Docker image for your chosen version.
There are plenty of public images; the most common source is Docker Hub (https://hub.docker.com).
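Fetching a public image is a one-liner (this assumes a running Docker daemon; the version tag `13` is just an example):

```shell
# Download the official PostgreSQL image, version 13, from Docker Hub
docker pull postgres:13

# List the images you now have locally
docker images
```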
Run time configuration
Once an image is running, it becomes a container, in much the same way as, in OO terms, an instantiated class becomes an instance. When you run a Docker image you can add to or override much of the image’s configuration by providing arguments to the `docker run` command. For example, if you’re using the PostgreSQL image, you might want to override the port that it runs on, or the environment variable which the image uses to set the DB password.
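As an illustration with the official `postgres` image, you can remap the published port and set the password via the image’s `POSTGRES_PASSWORD` environment variable (the host port `5433` and the password here are arbitrary examples):

```shell
# Run Postgres in the background, exposing the container's port 5432
# on host port 5433, and setting the env var the image uses for the
# superuser password
docker run -d \
  -p 5433:5432 \
  -e POSTGRES_PASSWORD=devpassword \
  postgres:13
```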
Storing container data with bind-mounts and volumes
As I mentioned above, an image is a file system built from immutable (i.e. read-only) layers. However, running containers often need somewhere to write data.
One way to have a writeable filesystem inside a container is to mount a directory from your host machine into the container at a given mount point, in much the same way as you might mount a network-attached device inside a Unix filesystem. This is called a bind-mount in Docker parlance.
One common use-case for bind mounts is in development to mount the application source code that you’re working on. This allows you to edit your application code on your host machine and immediately have the changes synced into the container without requiring you to rebuild the image.
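A quick sketch of that workflow, assuming your source lives in the current directory and your image expects it under `/app` (the image name `my-app` is a hypothetical custom image):

```shell
# Mount the current directory into the container at /app;
# edits made on the host are visible inside the container immediately
docker run -v "$(pwd)":/app my-app
```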
Sometimes you want a persistent data volume for your container, but you don’t really care where that data lives on the host. For example, if you have a running PostgreSQL container, you want Postgres to be able to write its database files somewhere, and you probably want that data to persist across multiple container runs. You could achieve this with a bind-mount by setting aside a directory on your host and storing the data there. However, you don’t actually need the content to be visible on your host machine, and this approach leaves you managing the directory yourself.
For this use-case, Docker has volumes. Volumes are persistent directories mounted in much the same way as bind-mounts, but Docker takes care of the creation, location and cleanup of the directory on the host for you.
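A named-volume sketch for the Postgres example (`pgdata` is an arbitrary volume name; `/var/lib/postgresql/data` is where the official image stores its data):

```shell
# Docker creates and manages the 'pgdata' volume on first use;
# the database files survive the container being removed
docker run -d \
  -e POSTGRES_PASSWORD=devpassword \
  -v pgdata:/var/lib/postgresql/data \
  postgres:13

# Inspect the volumes Docker is managing for you
docker volume ls
```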
Docker vs. docker-compose
In the course of developing an application, it is likely that you’ll be running and coordinating multiple containers at the same time. For example, in a typical Python web application, you probably want at least the following running:
- Your custom-built web application container, likely a custom image based on a public image like `python`
- A database container, e.g. `postgres`
- A cache container, e.g. `redis`
Furthermore, as you’ve now seen, there are lots of options you can provide when running a Docker container. Managing all of these by hand, and sharing that knowledge across your development team, is pretty obviously unsustainable. That’s where `docker-compose` comes in.
`docker-compose` allows you to write a YAML manifest file (by default called `docker-compose.yml`) which describes each of the “services” (i.e. image + running config) your application consists of. You then interact with `docker-compose`, which reads the manifest and runs the `docker` commands on your behalf.
For example, you could write a manifest file which describes our Python + Postgres + Redis application, then simply run `docker-compose up` in the same directory as your manifest to fetch, build and run all the containers required for the application.
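A minimal manifest for that Python + Postgres + Redis setup might look something like this; the service names, ports, image tags and build context are illustrative assumptions, not requirements:

```yaml
version: "3"
services:
  web:
    build: .            # build the custom image from ./Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - .:/app          # bind-mount source for live editing in dev
    depends_on:
      - db
      - cache
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: devpassword
    volumes:
      - pgdata:/var/lib/postgresql/data
  cache:
    image: redis:6
volumes:
  pgdata:               # named volume managed by Docker
```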
That about covers the main concepts you’ll need to know coming onto a team using Docker for development. Here are some common commands:
```shell
# Build or pull all images required for services in
# ./docker-compose.yml
# https://docs.docker.com/compose/reference/build/
docker-compose build

# Run a specific service in ./docker-compose.yml
# https://docs.docker.com/compose/reference/run/
docker-compose run <service-name>

# Run the given shell command inside the specified service
# (rather than the image's default run command)
docker-compose run <service-name> <shell command>

# Bring up everything in ./docker-compose.yml
# (roughly equivalent to building and then running every service)
# https://docs.docker.com/compose/reference/up/
docker-compose up

# Check the running status of services in ./docker-compose.yml
docker-compose ps

# Stop all services in ./docker-compose.yml
docker-compose stop

# Stop all services in ./docker-compose.yml, then delete the
# containers and networks it created (add -v to also delete volumes).
# This is the nuclear option - `stop` is more commonly used
docker-compose down

# Same as `run`, but connects to an already-running container,
# rather than spawning a new one
docker-compose exec <service-name> <shell command>

# Docker only allocates a fixed amount of disk space for images and
# containers, which in the course of development will likely
# fill up. If you're seeing 'no disk space' errors,
# use this to clean up
# https://docs.docker.com/engine/reference/commandline/system_prune/
docker system prune
```