Since Bitnami published its first Docker container in 2015, the techniques for writing Dockerfiles have evolved significantly. As part of the team that maintains a container catalog with more than 130 apps, I have worked on adapting the containers and their Dockerfiles to meet the community's requirements.
In this tutorial, I will go over these lessons learned, describing some of the best practices and common pitfalls that you are likely to encounter when developing Dockerfiles, and applying them to practical examples. First, I will briefly review some basic concepts that you need to refresh before examining specific cases. Then, I will walk you through some practical examples to improve the build time, size, and security of a Docker image. To help you follow along, I have provided a GitHub repository that contains all the files you need for the tips and tricks shown in this post.
This guide assumes you are familiar with Docker and its build environment. Let’s review some of the basic concepts before you start to put them into practice.
A Docker image is a template that allows you to instantiate running containers. It is represented as a stack of filesystem layers, each one generated by an instruction in its Dockerfile.
A Dockerfile is just a blueprint that contains the instructions to build a Docker image. Currently, more than a million Dockerfiles are on GitHub.
The process of building a Docker image from a Dockerfile is known as a Docker build.
For more information, see Dockerfile reference.
Each layer of a Docker image represents an instruction in the image's Dockerfile. The layers can also be referred to as "build steps".
Every time you build a Docker image, each build step is cached. Reusing cached layers that have not changed significantly improves the build time.
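For example, when rebuilding an image, the classic builder marks reused steps with "Using cache" in its output. The excerpt below is illustrative (truncated, with made-up step numbers and hashes):
$ docker build . -t express-image:0.0.1
...
Step 3/9 : RUN apt-get update
 ---> Using cache
 ---> 13566e6b8dfa
...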
These are the main areas of improvement that are covered in this guide:
There are two tools that will help you develop your Dockerfiles; I advise you to set them up before starting the tutorial:
BuildKit is a toolkit, part of the Moby project, that improves performance when building Docker images. It can be enabled in two different ways:
Exporting the DOCKER_BUILDKIT environment variable:
$ export DOCKER_BUILDKIT=1
Note: Add this line to your ~/.bashrc file to make the setting persistent.
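You can also enable BuildKit for a single build by setting the variable inline (the image tag below is just an example):
$ DOCKER_BUILDKIT=1 docker build . -t express-image:0.0.1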
Or configuring the Docker daemon to enable the BuildKit feature in its /etc/docker/daemon.json file:
{ "features": { "buildkit": true } }
A linter helps you detect syntax errors in your Dockerfiles and gives you suggestions based on common practices.
There are plugins that provide these functionalities for almost every Integrated Development Environment (IDE). Here are some suggestions:
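For instance, hadolint is a popular open source Dockerfile linter that you can also run from the command line without installing anything locally, using its official Docker image:
$ docker run --rm -i hadolint/hadolint < Dockerfile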
In order to help you follow the examples below, here is a GitHub repository which contains all the files you need during each step of the tutorial.
The examples are based on building a very simple Node.js application’s Docker image using the files below:
The Dockerfile is pretty simple:
FROM debian
# Copy application files
COPY . /app
# Install required system packages
RUN apt-get update
RUN apt-get -y install imagemagick curl software-properties-common gnupg vim ssh
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash -
RUN apt-get -y install nodejs
# Install NPM dependencies
RUN npm install --prefix /app
EXPOSE 80
CMD ["npm", "start", "--prefix", "app"]
This is how the lines above can be read:
Using debian as the base image, it installs nodejs and npm in the system using the apt-get command. For the Node.js setup script to work, it is necessary to install some extra system packages such as curl, imagemagick, software-properties-common, or gnupg. Furthermore, it installs the vim and ssh packages for debugging purposes.
Once the image has all that it needs to build the application, it installs the application dependencies and uses the npm start command to start the application. Port 80 is exposed with the EXPOSE instruction since the application listens on it. To build the Docker image for this application, use the command below:
$ docker build . -t express-image:0.0.1
Note: You can specify the image tag using the format IMAGE_NAME:TAG.
The build takes 127.8 seconds and the resulting image is 554MB. You can improve both results by following some good practices.
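You can check the image size yourself with the docker images command (the output below is illustrative):
$ docker images express-image:0.0.1
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
express-image       0.0.1     d9bd71d6e4ee   2 minutes ago    554MB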
The build cache is cumulative: a layer can only be reused if it, and every layer before it, has not changed. Keep this in mind and order your instructions so you reuse as many existing layers as possible to reduce the build time.
Let's try to emulate the process of rebuilding your app's image to introduce a new change in the code, so you can understand how the cache works. To do so, edit the message used in the console.log call in server.js and rebuild the image using the command below:
$ docker build . -t express-image:0.0.2
It takes 114.8 seconds to build the image.
Using the current approach, you can’t reuse the build cache to avoid installing the system packages if a single bit changes in the application’s code. However, if you switch the order of the layers, you will be able to avoid reinstalling the system packages:
FROM debian
- # Copy application files
- COPY . /app
# Install required system packages
RUN apt-get update
...
RUN apt-get -y install nodejs
+ # Copy application files
+ COPY . /app
# Install NPM dependencies
...
Rebuild the image using the same command; this time the installation of the system packages is skipped. The result: the build takes 5.8 seconds, a significant improvement.
However, if a single character changed in the README.md file (or in any other file which is in the repository but is not related to the application), the entire directory would be copied into the image again, disrupting the cache once more.
You have to be specific about the files you copy to make sure that you are not invalidating the cache with changes that do not affect the application.
...
# Copy application files
- COPY . /app
+ COPY package.json server.js /app/
# Install NPM dependencies
...
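As a complementary measure, you can add a .dockerignore file to the root of the build context so that unrelated files are never sent to the Docker daemon in the first place. The entries below are just an example for this project:
.git
README.md
node_modules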
Note: Use COPY instead of ADD when possible. Both commands do basically the same thing, but ADD is more complex: it has extra features like extracting tar files or fetching files from remote sources. From a security perspective, using ADD also increases the risk of malware injection in your image if the remote source you are using is unverified or insecure.
When building containers to run in production, every unused package, or those included for debugging purposes, should be removed.
The current Dockerfile includes the ssh system package. However, you can access your containers using the docker exec command instead of ssh'ing into them. Apart from that, it also includes vim for debugging purposes, which can be installed when required instead of being packaged by default. Both packages can be removed from the image.
In addition, you can configure the package manager to avoid installing packages that you don't need. To do so, use the --no-install-recommends flag on your apt-get calls:
...
RUN apt-get update
- RUN apt-get -y install imagemagick curl software-properties-common gnupg vim ssh
+ RUN apt-get -y install --no-install-recommends imagemagick curl software-properties-common gnupg
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash -
- RUN apt-get -y install nodejs
+ RUN apt-get -y install --no-install-recommends nodejs
# Install NPM dependencies
...
On the other hand, it doesn't make sense to keep the update and install steps separate: if only the install step changes, the cached apt-get update layer is reused, and you might end up installing outdated packages when you rebuild the image. Let's combine them into a single step to avoid this issue.
...
- RUN apt-get update
- RUN apt-get install -y --no-install-recommends imagemagick curl software-properties-common gnupg
+ RUN apt-get update && apt-get -y install --no-install-recommends imagemagick curl software-properties-common gnupg
- RUN curl -sL https://deb.nodesource.com/setup_10.x | bash -
- RUN apt-get -y install --no-install-recommends nodejs
+ RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && apt-get -y install --no-install-recommends nodejs
# Install NPM dependencies
...
Finally, remove the package manager cache to reduce the image size:
...
RUN apt-get update && apt-get -y install --no-install-recommends imagemagick curl software-properties-common gnupg
- RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && apt-get -y install --no-install-recommends nodejs
+ RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && apt-get -y install --no-install-recommends nodejs && rm -rf /var/lib/apt/lists/*
# Install NPM dependencies
...
If you rebuild the image again…
$ docker build . -t express-image:0.0.3
… the image is reduced to 340MB! That's almost half of its original size.
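To see how that size is distributed, you can inspect the size of each layer with the docker history command:
$ docker history express-image:0.0.3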
Minideb is a minimalist Debian-based image built specifically to be used as a base image for containers. To significantly reduce the image size, use it as the base image.
- FROM debian
+ FROM bitnami/minideb
# Install required system packages
...
Minideb includes a command called install_packages that:

- Installs the named packages, skipping prompts and unnecessary recommended packages.
- Cleans up the apt metadata and cache afterwards to keep the image small.
- Retries if the apt-get instructions fail.

Replace the apt-get instructions with this command as follows:
...
# Install required system packages
- RUN apt-get update && apt-get -y install --no-install-recommends imagemagick curl software-properties-common gnupg
+ RUN install_packages imagemagick curl software-properties-common gnupg
- RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && apt-get -y install --no-install-recommends nodejs && rm -rf /var/lib/apt/lists/*
+ RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && install_packages nodejs
# Copy application files
...
Build the image again:
$ docker build . -t express-image:0.0.4
As you can see, you saved 63MB more; the image size is now 277MB.
Using Bitnami-maintained images gives you some extra benefits.
Instead of installing the system packages you need to run the application (Node.js in this case), use the bitnami/node image:
- FROM bitnami/minideb
+ FROM bitnami/node
- # Install required system packages
- RUN install_packages imagemagick curl software-properties-common gnupg
- RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && install_packages nodejs
# Copy application files
...
Maintained images usually have different tags, used to specify their different flavors. For instance, the bitnami/node image is built for different Node.js versions, and it has a prod flavor which includes the minimal packages needed to run a Node.js application (see Supported Tags).
Following this example, imagine that the application requires node >= 10 in its package.json. Therefore, you should use the 10-prod tag to ensure that you are using Node.js 10 with the minimal packages:
- FROM bitnami/node
+ FROM bitnami/node:10-prod
# Copy application files
...
Once you add that tag, rebuild the image again:
$ docker build . -t express-image:0.0.5
These are the results: the image size is now 229MB, which means another 48MB have been saved. With this change, you no longer need to worry about the system packages.
Look at the current Dockerfile (after applying the improvements above) to see the following:
FROM bitnami/node:10-prod
# Copy application files
COPY package.json server.js /app/
# Install NPM dependencies
RUN npm install --prefix /app
EXPOSE 80
CMD ["npm", "start", "--prefix", "/app"]
The current status of the sample Dockerfile shows two kinds of identifiable build steps: the ones that build the application (copying its files and installing its dependencies) and the ones that define how to run it (exposing the port and setting the start command).
To continue improving the efficiency and size of the image, split the build process into different stages. That way, the final image will be as simple as possible.
Using multi-stage builds is a good practice to copy only the artifacts needed in the final image. Let's see how to do it in this example:
# Build stage: install the application dependencies
FROM bitnami/node:10 AS builder
COPY package.json server.js /app/
RUN npm install --prefix /app

# Final stage: copy only the artifacts needed at runtime
FROM bitnami/node:10-prod
COPY --from=builder /app/package.json /app/server.js /app/
COPY --from=builder /app/node_modules /app/node_modules
EXPOSE 80
CMD ["node", "/app/server.js"]
This is a short summary of the steps performed:
Using bitnami/node:10 to build our application, I added AS builder to name the first stage "builder". Then, I used COPY --from=builder to copy files from that stage into the final one. That way, the only artifacts copied are those needed to run the minimal bitnami/node:10-prod image.
This approach is extremely effective when building images for compiled applications. In the example below, I have made some tweaks to dramatically decrease the image size. The sample image is the one that builds Kubeapps Tiller Proxy, one of the core components of Kubeapps:
FROM bitnami/minideb:stretch AS builder
ARG VERSION
RUN install_packages ca-certificates curl git
RUN curl https://dl.google.com/go/go1.11.4.linux-amd64.tar.gz | tar -xzf - -C /usr/local
ENV PATH="/usr/local/go/bin:$PATH" CGO_ENABLED=0
RUN go get -u github.com/golang/glog && go get -u github.com/kubeapps/kubeapps/cmd/tiller-proxy
RUN go build -a -installsuffix cgo -ldflags "-X main.version=$VERSION" github.com/kubeapps/kubeapps/cmd/tiller-proxy
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /tiller-proxy /proxy
EXPOSE 80
CMD ["/proxy"]
The final image uses scratch as its base (an empty starting point, so the first instruction that adds files creates the first filesystem layer of the image) and it contains only what we need: the binary and the SSL certificates.
Note: Use ARG and --build-arg K=V to modify your builds from the command line.
Build the image using the command:
$ docker build . -t tiller-proxy-example --build-arg VERSION=1.0.0
The final image size is only 37.7MB! If you included both the build and run instructions in the same image, it would be larger than 800MB.
Reuse the artifacts built in the builder stage to create platform-specific images. For instance, following the Kubeapps Tiller Proxy example, use a single Dockerfile to create different images for different platforms. In the Dockerfile below, Debian Stretch and Oracle Linux 7 are the platforms specified for the build:
...
FROM oraclelinux:7-slim AS target-oraclelinux
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /tiller-proxy /proxy
EXPOSE 80
CMD ["/proxy"]
FROM bitnami/minideb:stretch AS target-debian
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /tiller-proxy /proxy
EXPOSE 80
CMD ["/proxy"]
In the build commands, just add the --target X flag to indicate which platform you want to build the image for:
$ docker build . -t tiller-proxy-example:debian --target target-debian --build-arg VERSION=1.0.0
$ docker build . -t tiller-proxy-example:oracle --target target-oraclelinux --build-arg VERSION=1.0.0
Using a single Dockerfile, you built images for two different platforms, while keeping the build process very simple.
Running containers as a non-root user is one of the most popular security best practices.
This approach prevents malicious code from gaining permissions in the container host. It also allows running containers on Kubernetes distributions that don’t allow running containers as root, such as OpenShift. For more information about the reasons to use a non-root container, check these blog posts:
To convert the Docker image into a non-root container, change the default user from root to nonroot:
...
EXPOSE 80
+ RUN useradd -r -u 1001 -g root nonroot
+ USER nonroot
CMD ["node", "/app/server.js"]
...
Note: Add the nonroot user to the root group.
Take these details into consideration when moving a container to non-root:
Note: It is important to understand that you should not move a container to a non-root approach and then use sudo to gain higher-level privileges, as this defeats the purpose of using a non-root approach. Similarly, you should also ensure that the non-root user account is not part of the sudoers group, to maximize security and avoid any risk of it obtaining root privileges.
Our sample application uses port 80 to listen for connections. Since ports below 1024 are privileged and cannot be bound by a non-root user, adapt it to use an alternative port such as 8080:
Dockerfile:
...
COPY --from=builder /app/node_modules /app/node_modules
- EXPOSE 80
+ EXPOSE 8080
RUN useradd -r -u 1001 -g root nonroot
...
server.js:
...
const serverHost = '127.0.0.1';
- const serverPort = 80;
+ const serverPort = 8080;
...
On the other hand, the application writes its log to the /var/log/app.log file. Give the nonroot user permissions on that directory:
...
RUN useradd -r -u 1001 -g root nonroot
EXPOSE 8080
+ RUN chmod -R g+rwX /var/log
USER nonroot
...
Test it:
$ docker build . -t express-image:0.0.7
$ docker run --rm -p 8080:8080 -d --name express-app express-image:0.0.7
$ curl http://127.0.0.1:8080
Hello world
$ docker exec express-app whoami
nonroot
$ docker stop express-app
As you can see, everything is working as expected and your container is no longer running as root.
The default value for the working directory is /. However, unless you use FROM scratch images, it is likely that the base image you are using sets it. It is a good practice to set the WORKDIR instruction to adapt it to your application's characteristics.
Our application code is under the /app directory. Therefore, it makes sense to set the working directory to it:
...
USER nonroot
+ WORKDIR /app
- CMD ["node", "/app/server.js"]
+ CMD ["node", "server.js"]
...
Note: Using absolute paths to set this instruction is recommended.
When running your container on Kubernetes, chances are that you want to import your configuration from configMaps or secrets resources. To use these kinds of resources, mount them as configuration files in the container filesystem. Then, adapt your application so it reads the settings from those configuration files.
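For example, assuming the settings.json file created later in this section, you could create a ConfigMap from it with kubectl and then mount it at /settings in your pod spec (the resource name here is illustrative):
$ kubectl create configmap express-settings --from-file=settings/settings.json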
Using the VOLUME instruction to create mount points is strongly recommended. Docker marks these mount points as "holding externally mounted volumes", so the host or other containers know what data is exposed.
Let’s modify our application so the hostname and port are retrieved from a configuration file. Follow these steps:
In the server.js file, make the following changes:
...
// Constants
- const serverHost = '127.0.0.1';
- const serverPort = 8080;
+ const settings = require('/settings/settings.json');
+ const serverHost = settings.host;
+ const serverPort = settings.port;
...
Create the settings.json file as shown below:
$ mkdir settings && cat > settings/settings.json<<'EOF'
{
"host": "127.0.0.1",
"port": "8080"
}
EOF
Add the mount point to the Dockerfile:
...
EXPOSE 8080
+ VOLUME /settings
RUN useradd -r -u 1001 -g root nonroot
...
At this point, rebuild the image, and mount its configuration settings as shown below:
$ docker build . -t express-image:0.0.8
$ docker run -v $(pwd)/settings:/settings --rm -p 8080:8080 -d --name express-app express-image:0.0.8
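You can also verify that the image declares the mount point by inspecting it (the output below is illustrative):
$ docker inspect --format '{{ .Config.Volumes }}' express-image:0.0.8
map[/settings:{}]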
Applications should redirect their logs to stdout/stderr so that the host can collect them.
On platforms like Kubernetes, it is very common to have a logging system (such as ELK) that collects logs from every container, making them available to the sysadmins. Making the logs available to the host is mandatory for these kinds of solutions.
Our application writes its log to the /var/log/app.log file. Redirect the logs to stdout using the workaround below:
...
VOLUME /settings
+ RUN ln -sf /dev/stdout /var/log/app.log
RUN useradd -r -u 1001 -g root nonroot
...
With that change, execute the following commands to check that Docker correctly retrieves the logs:
$ docker build . -t express-image:0.0.9
$ docker run -v $(pwd)/settings:/settings --rm -p 8080:8080 -d --name express-app express-image:0.0.9
$ docker logs express-app
Running on http://127.0.0.1:8080
To make the container more flexible, set an entrypoint to act as the main command of the image. Then, use the CMD instruction to specify the arguments/flags of the command:
...
- CMD ["node", "server.js"]
+ ENTRYPOINT ["node"]
+ CMD ["server.js"]
This way, you can modify the container behavior depending on the arguments used to run it. For instance, use the command below to maintain the original behavior:
$ docker build . -t express-image:0.0.10
$ docker run -v $(pwd)/settings:/settings --rm -p 8080:8080 -d --name express-app express-image:0.0.10
Or use the command below to check the code syntax:
$ docker run --rm express-image:0.0.10 --check server.js
You can always override the entrypoint using the --entrypoint flag. For instance, to check the files available at /app, run:
$ docker run --rm --entrypoint "/bin/ls" express-image:0.0.10 -l /app
total 12
drwxr-xr-x 51 root root 4096 Jan 24 12:45 node_modules
-rw-r--r-- 1 root root 301 Jan 24 10:11 package.json
-rw-r--r-- 1 root root 542 Jan 24 12:43 server.js
When an application requires initialization, use a script as your entrypoint. You can find an example in the one used by the bitnami/redis image here.
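Here is a minimal sketch of such a script (the initialization step is just a placeholder); the important detail is finishing with exec "$@" so the main process replaces the shell and receives signals directly:
#!/bin/bash
# entrypoint.sh - illustrative initialization wrapper
set -e

# One-time setup would go here (e.g. generating a configuration file)
echo "Initializing the application..."

# Hand control over to the CMD arguments so the app runs as PID 1
exec "$@"
In the Dockerfile, you would then set ENTRYPOINT ["/entrypoint.sh"] and keep the application command in CMD.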
The example shown in this guide did not need any credentials, but the secure storage of credentials is an important consideration when writing Dockerfiles.
It is considered bad security practice to store sensitive information, such as login credentials or API tokens, as plaintext in a Dockerfile. A better approach, especially for containers that will run on Kubernetes, is to encrypt this sensitive information in a Kubernetes SealedSecret. SealedSecrets can be safely stored in public repositories and can only be decrypted by the Kubernetes controller running in the target cluster.
Refer to the SealedSecrets documentation for more information.
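For instance, assuming you have a regular Secret manifest in mysecret.yaml and the kubeseal CLI configured against your cluster, you could generate and apply a SealedSecret like this:
$ kubeseal --format yaml < mysecret.yaml > mysealedsecret.yaml
$ kubectl apply -f mysealedsecret.yaml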
The intention of this blog post was to show you how to improve a Dockerfile so that you can build containers faster, smaller, and more securely.
To demonstrate how to implement some changes to a given Dockerfile, I used an example with several defects that were corrected by applying these good practices. The initial Dockerfile had the following issues:
After implementing these adjustments, the resulting Dockerfile is ready to be used to build containers for production environments.
Apart from these steps to write the best Dockerfiles for your production containers, here are a few more tips to be proficient at building containers.