
Announcing remote cache support in Amazon ECR for BuildKit clients

This feature will be pre-installed and supported by Docker when version 25.0 is released. It is already available in BuildKit versions 0.12 and later, and in Finch versions 0.8 and later.

Introduction

Amazon Elastic Container Registry (Amazon ECR) is a fully managed container registry that customers use to store, share, and deploy container images and artifacts. Amazon ECR is used by hundreds of thousands of customers as part of their build and deploy pipelines, both in AWS and non-AWS environments.

Customers most often work with Amazon ECR and other registries through a container client, which is in charge of packaging the container image and storing it in the registry. Many of these clients use a common open-source image building toolkit, known as BuildKit, to do this, including Docker (as of 23.0), Finch, and Earthly. At build time, clients build each layer of the container image one by one until the image is complete. Because some layers don’t often change between builds, clients store a local copy of all the layers they build, and then reuse this local cache in subsequent builds, which is much faster than rebuilding each layer every time. However, this only works if your build runners or workers run in the same compute environment for each build. Popular CI/CD platforms like GitHub Actions or GitLab CI use ephemeral compute every time they run a build, and so they are unable to build up and use a local cache.

Solution overview

To allow ephemeral compute tools to also use a cache, BuildKit introduced the ability to export remote caches in 2020. These remote caches work similarly to local layer caches (i.e., the layer is stored and used instead of rebuilding it from scratch). The only difference is that each built layer is sent to the remote registry, and is pulled on subsequent builds if not found in a local cache. AWS customers have wanted to use this feature to speed up their image builds, and have asked us to support it on Amazon ECR. However, the format used for storing these caches was not an Open Containers Initiative (OCI) type. Amazon ECR is an OCI-compliant registry, which means that pushing this remote cache format to Amazon ECR resulted in a validation failure.

Without a remote cache, the Docker client needs to build every layer from scratch because there is no local cache. With a remote cache, the Docker client finds the remote cache and uses it instead of building from scratch.

BuildKit has recently released version 0.12, which includes a contribution by Amazon ECR engineering for a solution that allows for a remote build cache to be generated and stored in an OCI-compatible way. This means that BuildKit stores and retrieves the build cache in registries that implement the OCI specification, like Amazon ECR. With this update, you can push a cache image to an Amazon ECR repository separately from the built and pushed image. This cache image can then be referenced in future builds to provide significant speedups to your time-to-push, whether you’re just pushing from your laptop or from your production CI/CD builds on platforms like GitLab or GitHub Actions.

Walkthrough

Getting started with remote cache in Amazon ECR

In these examples we’ll use Docker. Ensure you’re running version 25.0.0 or later, which includes BuildKit 0.12 and the changes required to work with Amazon ECR and other OCI-compatible registries. These examples can be run on your local development environment, or used in your CI/CD platform build scripts.
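If your Docker client isn’t authenticated to Amazon ECR yet, the standard login flow looks like this. This sketch assumes a repository named buildkit-test (which you can create first) and uses illustrative account and region placeholders:

aws ecr create-repository --repository-name buildkit-test --region <aws-region>

aws ecr get-login-password --region <aws-region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<aws-region>.amazonaws.com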

For example, say we have a build step in our CI/CD pipeline that uses Docker to build an image locally and then push the built image to Amazon ECR. It might look like this:

docker build -t <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:image .

docker push <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:image

To make your build populate a cache in Amazon ECR, and then use it on subsequent builds, you can simply add the --cache-to and --cache-from options to the build command.

docker build -t <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:image \
--cache-to mode=max,image-manifest=true,oci-mediatypes=true,type=registry,ref=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:cache \
--cache-from type=registry,ref=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:cache .

docker push <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:image

That looks like a lot of new options. Let’s break down what we’ve just added.

We now have two new options, cache-to and cache-from. The cache-to option specifies the remote cache you are exporting to (or creating), identified by the image URI in the option’s ref key. Note that this URI is different from the actual tagged image being built; if you prefer, you could even point your cache’s URI at a different repository in Amazon ECR, though that is not necessary. The type key with value registry means that the remote cache will be pushed to a registry, and mode=max exports the layers of all intermediate build steps rather than only the layers of the final image (the default, mode=min). The new key introduced in BuildKit 0.12 is image-manifest: setting it to true stores the remote cache in the registry as an OCI-compatible image manifest. We also set oci-mediatypes to true, since that is required in order to use image-manifest. Note that we didn’t have to add another push command to our build step, because cache-to implicitly pushes your remote cache to the image URI you specify in ref.
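For instance, if you’d rather keep cache manifests out of the repository that holds your deployable images, a minimal sketch pointing ref at a separate, hypothetical buildkit-cache repository might look like this:

docker build -t <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:image \
--cache-to mode=max,image-manifest=true,oci-mediatypes=true,type=registry,ref=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-cache:latest \
--cache-from type=registry,ref=<account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-cache:latest .

This keeps your image repository’s tag list limited to images you actually deploy.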

The cache-from option specifies a cache location that BuildKit can pull from instead of performing a given build step in your Dockerfile (provided nothing has changed in that layer or the layers preceding it). In this case, that’s the Amazon ECR repository with the cache manifest URI you’ve just specified in cache-to.
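To see why preceding layers matter, here is a minimal Dockerfile sketch for a hypothetical Node.js application, ordered so the expensive dependency-install step stays cacheable:

FROM node:20
WORKDIR /app
# Copy only the dependency manifests first, so this layer is
# invalidated only when dependencies change
COPY package.json package-lock.json ./
RUN npm ci
# Application source changes frequently, so it comes last
COPY . .
CMD ["node", "index.js"]

As long as package.json and package-lock.json are unchanged, BuildKit can pull the npm ci layer from the remote cache even when the application source copied in the final COPY step changes.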

To summarize: cache-to is forward-looking, exporting a remote cache manifest that speeds up future builds, while cache-from takes advantage of remote cache manifests that have already been exported to speed up the current build. You can introduce both at the same time in your build step, because cache-from simply falls back to building from scratch if the cache doesn’t exist yet. With this one change, every build that changes a layer updates the remote cache, and every build uses the most current cache.

Let’s run this build and go to the AWS Console to see what was pushed to Amazon ECR.

Amazon ECR console showing the cache stored as an artifact of type "other"

We can see that we have our image and now our cache. On subsequent builds, we’ll see a nice speed-up in the image build step, especially for builds that follow Dockerfile caching best practices.
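You don’t have to use the console to verify this. Assuming the tags from the walkthrough above, you can inspect the pushed cache manifest directly from the command line:

docker buildx imagetools inspect <account-id>.dkr.ecr.<aws-region>.amazonaws.com/buildkit-test:cache

Because we set image-manifest=true, the result is a standard OCI image manifest, which is why Amazon ECR accepts it.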

You should refer to your CI/CD provider’s documentation for how to update your CI/CD solution to the latest version of Docker and/or BuildKit, but here are some pointers for popular CI/CD solutions:

  • For GitLab CI, you should be able to simply update the image tag of your GitLab Runner’s Docker-in-Docker (dind) image to the newest version.
  • For GitHub Actions, ensure that you’re using the setup-buildx-action with version set to latest or a version string of 0.12 or later.
  • For Travis CI, you can update your Travis installation here.
  • For CircleCI, you should be able to change your version in setup_remote_docker in your yml to the latest supported version shown here.
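Whichever platform you use, it’s worth printing the versions at the start of a build job to confirm that your pipeline actually picked up a client capable of BuildKit 0.12 or later. A minimal sketch:

docker version --format '{{.Server.Version}}'
docker buildx version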

Conclusion

Using the solution described in this blog post, you can speed up your container builds by storing a remote build cache in Amazon ECR. Our tenet is to come up with solutions that are not only right for AWS customers but also suitable for the open-source community and standards like the OCI. Here at Amazon ECR, we believe that building support into BuildKit for a more open and compatible remote caching solution has made good on that tenet. Try out remote caches for yourself today in all of your CI/CD build pipelines and get the benefits of faster and more consistent image build times!

Matt Kang

Matt Kang is a Senior Software Developer with Elastic Container Registry, working on products that come from the perspective of a Containers customer. Matt has worked as a developer building systems in the cloud across the fields of Machine Learning, Data Analytics, and Quantitative Finance for over 14 years. He is based out of Seattle and is @kangmatt on GitHub.