Why does size matter? Docker images are a core component in our development and production lifecycles. Having a large image can make every step of the process slow and tedious. Size affects how long it takes to build the image locally, on CI/CD, or in production, and it affects how quickly we can spin up new instances that run our code. Reducing the size of the image has benefits for both developers and users.
So, what can you do about it?
1. Pick an appropriate base image
There are several important considerations that go into picking a base image. In the context of optimizing image size, each base image comes with its own dependencies and footprint.
Usually, the first choice you need to make is which distribution you want, since image sizes vary considerably between them.
It’s not just a matter of image size, though; each of these images comes with its own philosophy or tools you might prefer. Alpine is lightweight, security-focused, and based on musl libc instead of glibc. Ubuntu has long-term enterprise support, comes bundled with many utilities, supports a vast number of packages, and so on.
Next, you can decide if you want your parent image to come bundled with additional dependencies. Often you need to weigh the convenience of having a base image with all dependencies pre-installed against the size of the resulting image.
For example, if you have a Node.js app you can use the `node` image, or `python` if you’re using Python, etc. Within those images, you can usually specify the distribution you’d like using the appropriate tag, for example, `node:alpine` for Alpine Linux and `python:3-bullseye` for Debian Bullseye.
The less specific or specialized your parent image is, the more control you have over the image size:
```dockerfile
# Image size: 934 MB
FROM node:16-bullseye
CMD ["node"]
```
```dockerfile
# Image size: 313 MB
FROM debian:bullseye
RUN apt-get update && \
    apt-get install -y curl && \
    (curl -sL https://deb.nodesource.com/setup_16.x | \
    bash -) && \
    apt-get install -y nodejs
CMD ["node"]
```
A closer look at `node:16-bullseye` shows that it has `buildpack-deps` as its parent image, which comes with lots of dependencies you might not need. So if you’re willing to take care of the Node.js installation yourself, you can do it directly on the Debian image and reduce the image size considerably.
2. Add only files you need
Docker makes it especially easy to add files you didn’t mean to add to an image. Each `COPY` and even each `RUN` command in your Dockerfile can include files you weren’t expecting.
```dockerfile
FROM node:16-bullseye
RUN apt-get install -y nodejs
COPY . .
RUN npm install
CMD ["node"]
```
It isn't easy to see exactly which files are added and where. So the first step is to be able to quickly inspect which files are added to each layer. Each layer corresponds to specific commands in your Dockerfile, and from there we can decide what and how to optimize.
There are 3 easy methods you can use:
You can save any local image as a `tar` archive and then inspect its contents.
```bash
bash-3.2$ docker save <image-digest> -o image.tar
bash-3.2$ tar -xf image.tar -C image
bash-3.2$ cd image
bash-3.2$ tar -xf <layer-digest>/layer.tar
bash-3.2$ ls
etc  tmp  usr  var
```
Dive is an excellent open-source tool to visualize and analyze local Docker images.
Our contains.dev offers many tools to analyze layers, their contents, and their size, including navigating a treemap of your image.
With these methods, you're set up to assess improvements to your image size. There are a few common areas that have straightforward solutions that improve the overall image size:
An important way to ensure you’re not bringing in unintended files is to define a `.dockerignore` file. This file has a similar syntax to `.gitignore`:

```
# Ignore git and caches
.git
.cache
# Ignore logs
logs
# Ignore secrets
.env
# Ignore installed dependencies
node_modules
...
```
Then when you run `COPY . .` it’ll make sure not to copy files defined in your `.dockerignore`. Defining this file has the added benefit of reducing the size of the Docker build context, which is the set of files Docker gathers when building an image. A smaller build context results in faster build times.
Depending on the package manager you’re using, you can instruct it to install only the packages you explicitly defined:

- `apt-get install -y --no-install-recommends` — don’t install optional recommended packages.
- `npm install --production` — don’t install development dependencies.
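As a sketch, a Node.js Dockerfile that installs only production dependencies might look like this (the file names and entry point are illustrative, not from the original example):

```dockerfile
FROM node:16-bullseye
WORKDIR /app
# Copy the manifests first so dependency installation
# gets its own cacheable layer
COPY package.json package-lock.json ./
# Skip devDependencies (test frameworks, linters, etc.)
RUN npm install --production
COPY . .
# "index.js" is a placeholder for your app's entry point
CMD ["node", "index.js"]
```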
Many processes create temporary files, caches, and other files that have no benefit to your specific use case. For example, running `apt-get update` will update internal package lists that you don’t need to persist because you’ve already installed all the packages you need. So we can add `rm -rf /var/lib/apt/lists/*` as part of the same layer to remove those (removing them with a separate `RUN` keeps them in the original layer; see “Avoid duplicating files”). Docker recognized this is an issue and went as far as adding `apt-get clean` automatically to their official Debian and Ubuntu images.
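Putting these together, a sketch of an apt-based install that cleans up after itself within the same layer (the package names are illustrative):

```dockerfile
FROM debian:bullseye
# Update, install, and remove the package lists in a single RUN,
# so the lists never persist into a layer of the final image
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*
```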
Each layer in your image might have a leaner version that is sufficient for your needs. The best way to see that is to audit your layers with the techniques mentioned above.
3. Avoid duplicating files
Docker uses read-only layers of files that are overlaid on top of each other. This means that when you make changes to files that come from previous layers, they’re copied into the new layer you’re creating. It isn’t always obvious that this is happening, for example:
```dockerfile
COPY somefile.txt .
RUN chmod 777 somefile.txt
```
Here we’re simply `chmod`’ing an existing file, but Docker can’t change the file in its original layer, so the result is a new layer where the file is copied in its entirety with the new permissions.
In newer versions of Docker, this can be written as follows to avoid the issue, using Docker’s BuildKit:

```dockerfile
COPY --chmod=777 somefile.txt .
```
Other non-intuitive cases of file duplication between layers:
```dockerfile
COPY somefile.txt .                    #1
# Small change but entire file is copied
RUN echo "more data" >> somefile.txt   #2
# File moved but layer now contains an entire copy of the file
RUN mv somefile.txt somefile2.txt      #3
# File won't exist in this layer,
# but it still takes up space in the previous ones.
RUN rm somefile2.txt
```
In this example, we created 3 copies of our file throughout different layers of the image. Despite removing the file in the last layer, the image still contains the file in other layers which contributes to the overall size of the image.
Making a small change to a file or moving it will create an entire copy of the file. Deleting a file only hides it from the final image; it still exists in its original layer, taking up space. This is all a result of how images are structured as a series of read-only layers, which provides reusability of layers and efficiency in how images are stored and executed. But it also means we need to be aware of the underlying structure and take it into account when we write our Dockerfile.
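As a hedged sketch of how to sidestep these duplications: rename files at `COPY` time instead of with `RUN mv`, and create and delete temporary files within a single `RUN` so they never reach a layer at all (the URL below is a placeholder, and `curl` is assumed to be available in the image):

```dockerfile
FROM debian:bullseye
# Rename during COPY instead of a separate `RUN mv`,
# so only one layer ever contains the file
COPY somefile.txt somefile2.txt
# Temporary files downloaded and removed within the same RUN
# never persist into any layer of the image
RUN curl -sL -o /tmp/tool.tar.gz https://example.com/tool.tar.gz && \
    tar -xzf /tmp/tool.tar.gz -C /usr/local && \
    rm /tmp/tool.tar.gz
```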
4. Multi-stage builds
Multi-stage builds help in cases where Dockerfile steps aren’t needed at runtime. The Dockerfile might include several steps that set up an environment for compiling the program that will actually run. This is especially common for compiled languages like Go.
To solve this issue, Docker introduced multi-stage builds in Docker Engine v17.05. They allow us to perform all the preparation steps as before, but then copy only the essential files or outputs of those steps into the final image.
As shown in the example below, the effects on image size can be dramatic:
```dockerfile
# Image size: 961 MB
FROM golang:1.17.5
WORKDIR /workspace
COPY . .
RUN go get && go build -o main .
CMD ["/workspace/main"]
```
```dockerfile
# Image size: 7 MB
FROM golang:1.17.5 as builder
WORKDIR /workspace
COPY . .
ENV CGO_ENABLED=0
RUN go get && go build -o main .

FROM scratch
WORKDIR /workspace
COPY --from=builder \
    /workspace/main \
    /workspace/main
CMD ["/workspace/main"]
```
This basic example compiles a simple Go program. The naive single-stage version results in a 961 MB image. Using a multi-stage build, we copy just the compiled binary, which results in a 7 MB image. The single-stage version could be improved by choosing a leaner parent image, but it would still fall short of the multi-stage result.
Multi-stage builds introduce a lot of flexibility, with support for advanced cases like multiple `FROM` statements, copying a single file from an external image, and more. These techniques can be combined to reduce the image size to a minimum. Check out the official Docker docs for more info.
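For instance, `COPY --from` can reference an external image directly, without defining it as a build stage. A minimal sketch, using busybox as an arbitrary example image:

```dockerfile
FROM debian:bullseye
# Copy a single binary from an external image instead of
# installing it with a package manager
COPY --from=busybox:latest /bin/busybox /usr/local/bin/busybox
```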
Keeping your image optimized and small pays huge dividends in the development process and in going to production. The techniques above will help you gain a good understanding of what's going on inside your image, which has benefits beyond the optimization work.