AWS Open Source Blog
How the Bottlerocket build system works
Bottlerocket, launched in 2020, is an open source, special-purpose operating system designed for hosting Linux containers.
As I delved into the Bottlerocket build system for a deeper understanding, I found it helpful to describe the system in detail (a form of rubber-duck debugging). This article is the result of my exploration and will help you understand how the Bottlerocket build system uses Cargo, RPM, and Docker to create a Bottlerocket image.
Cargo and build.rs
The Bottlerocket build is unique in that it uses Cargo, the Rust package manager and build orchestrator, to drive all build processes, even those that are unrelated to Cargo packages. Because Bottlerocket’s first-party code is written in Rust, the choice to drive all build processes from Cargo gives developers a single, familiar interface. Cargo offers many great features, including dependency graph resolution and a flexible manifest format. Another feature, build.rs, allows us to extend Cargo to manage Bottlerocket’s entire build process.
Any Cargo project can include a file named build.rs in the same directory as its Cargo.toml. Cargo compiles and runs the program defined by build.rs before it compiles the package that the script belongs to. This feature can be used for whatever you want; common uses include compiling C code that is linked into the Rust project and generating Rust source files. See the build scripts section of the Cargo Book for more details.
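To make the mechanism concrete, here is a minimal sketch (not taken from the Bottlerocket tree) of one thing build scripts commonly do: print cargo:rerun-if-changed lines so that Cargo re-runs the script only when particular input files change. The function names and file names are illustrative; a real build.rs would call emit_rerun_directives from its main function.

```rust
/// Formats the lines a build script prints so that Cargo re-runs it
/// only when one of the given files changes.
fn rerun_directives(files: &[&str]) -> Vec<String> {
    files
        .iter()
        .map(|f| format!("cargo:rerun-if-changed={}", f))
        .collect()
}

/// Prints the directives to stdout, where Cargo reads build-script output.
/// A real build.rs would call this from its main function.
fn emit_rerun_directives(files: &[&str]) {
    for line in rerun_directives(files) {
        println!("{}", line);
    }
}
```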
Bottlerocket takes advantage of this functionality to run docker build commands that invoke other build systems. For example, the packages in the packages directory are not typical Cargo packages. Each subdirectory of packages is a Rust crate with an empty pkg.rs file. When we compile one of these crates, we produce an empty library. However, each of these directories also includes a build.rs file, which uses the Bottlerocket SDK to build the actual package when build.rs is invoked.
For a more concrete example, when we compile the kernel package, the main function in its build.rs is invoked. This function runs buildsys, which runs rpmbuild on kernel.spec inside a docker build context using the Bottlerocket SDK. Ultimately, this process compiles the kernel in the normal manner (make, gcc, and so on).
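The following is a hedged sketch of what a package's build.rs might do, assuming buildsys is on the PATH and takes build-package as its subcommand (as described later in this article). It is not the actual Bottlerocket source, and the helper names are hypothetical. Turning a non-zero buildsys exit status into an error is what lets Cargo report the package build as failed.

```rust
use std::process::Command;

/// Constructs the command a package's build.rs would run.
/// Only the program name and subcommand come from the article.
fn buildsys_command(subcommand: &str) -> Command {
    let mut cmd = Command::new("buildsys");
    cmd.arg(subcommand);
    cmd
}

/// Runs buildsys and converts a failure to start it, or a non-zero
/// exit status, into an error message.
fn run_buildsys(subcommand: &str) -> Result<(), String> {
    let status = buildsys_command(subcommand)
        .status()
        .map_err(|e| format!("failed to start buildsys: {}", e))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("buildsys {} failed: {}", subcommand, status))
    }
}
```

A build.rs main function could call run_buildsys("build-package") and exit with a non-zero status if it returns an error.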
RPM spec
An RPM .spec file is like a script for creating one or more RPM packages. It essentially boils down to shell scripting with macros and some built-in functionality that is useful for Linux package distributors. The typical workflow of a .spec file is:
- Obtain the source code.
- Apply Git patches.
- Build the package with make, go, or whatever the project uses.
- Specify which build artifacts will be installed, and where, when the RPM package is installed.
- Specify which other RPM packages are dependencies when building or installing this package.
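To tie the last step to something concrete, this sketch pulls build-time (BuildRequires:) and install-time (Requires:) dependencies out of the text of a .spec file. It is a deliberately naive, hypothetical helper for illustration. rpmbuild itself expands macros and evaluates conditionals, which this code does not.

```rust
/// Which phase an RPM dependency applies to.
#[derive(Debug, PartialEq)]
enum DepKind {
    Build,   // needed to build this package (BuildRequires:)
    Runtime, // needed when this package is installed (Requires:)
}

/// Collects dependency declarations from the text of a .spec file.
/// Naive by design: macros are left unexpanded, qualified tags such as
/// Requires(post): are skipped, and only the first name on a line is kept.
fn spec_dependencies(spec: &str) -> Vec<(DepKind, String)> {
    let mut deps = Vec::new();
    for line in spec.lines() {
        let line = line.trim();
        let (kind, rest) = if let Some(rest) = line.strip_prefix("BuildRequires:") {
            (DepKind::Build, rest)
        } else if let Some(rest) = line.strip_prefix("Requires:") {
            (DepKind::Runtime, rest)
        } else {
            continue;
        };
        // Keep the package name and drop any version constraint like ">= 2.31".
        if let Some(name) = rest.split_whitespace().next() {
            deps.push((kind, name.to_string()));
        }
    }
    deps
}
```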
Bottlerocket does not have a package manager (such as yum or dnf). We use RPM to package everything that will go into a Bottlerocket image, then “install” the RPMs when building the image. We do not distribute RPM packages in the way that other Linux distributions do.
Bottlerocket SDK
The SDK is a container image including all the necessary tools and toolchains for a Bottlerocket build. These are all securely built from source using a lookaside cache to ensure stable, reproducible builds. Among the things included in the SDK are:
- rpmdevtools/rpmbuild
- buildroot
- GNU toolchain and glibc built from lookaside cache sources
- musl built from lookaside cache sources
- llvm and libunwind for building Rust
- Rust compiler built from source with a custom target, named bottlerocket, built for both glibc and musl
- Go built from lookaside cache sources, with and without PIE (position-independent executables)
Cargo Make
Cargo Make serves the same purpose as Make, but it is installed with and driven by the cargo command: cargo make. The makefile is written in TOML (Makefile.toml). All you need to know for now is that cargo make can be used like make. Our default target is build, which has an alias named world. Thus, cargo make, cargo make world, and cargo make build all do the same thing: build a Bottlerocket variant!
Docker and buildsys
As noted above, the main function in a package's build.rs shells out to a program called buildsys. This program is also in the Bottlerocket tree and is responsible for calling docker build to use the SDK.
Buildsys has two subcommands: build-package and build-variant. Both invoke a docker build command using the Dockerfile at the root of the Bottlerocket repo (the same Dockerfile for both subcommands). Note that build-package invokes docker build with a --target package argument, and build-variant invokes docker build with a --target variant argument.
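A minimal sketch of how one Dockerfile can serve both subcommands: only the --target stage changes. The argument order, the use of . as the build context, and the tag value are assumptions for illustration, not buildsys's actual invocation.

```rust
/// The two buildsys subcommands described above.
#[derive(Debug, Clone, Copy)]
enum Subcommand {
    BuildPackage,
    BuildVariant,
}

/// Returns arguments for `docker build` that select a Dockerfile stage.
/// Both subcommands share the Dockerfile at the repo root (assumed to be
/// the build context ".") and differ only in the --target stage.
fn docker_build_args(sub: Subcommand, tag: &str) -> Vec<String> {
    let target = match sub {
        Subcommand::BuildPackage => "package",
        Subcommand::BuildVariant => "variant",
    };
    vec![
        "build".to_string(),
        ".".to_string(),
        "--target".to_string(),
        target.to_string(),
        "--tag".to_string(),
        tag.to_string(),
    ]
}
```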
After the docker build command, buildsys uses docker create to create a container from the built image. This step is necessary because docker cp copies files out of containers, not images. Next, docker cp copies the desired build artifacts from the container to local disk. By convention, build artifacts are found at /output in the container and are copied to the gitignored build directory at the root of the Bottlerocket repo. Finally, docker rm deletes the container and docker rmi untags the image (without pruning image layers).
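The artifact-extraction sequence can be sketched as a list of docker CLI argument vectors. The container, image, and destination names are hypothetical. The /output path and the keep-the-layers behavior come from the description above, and --no-prune is the docker rmi flag that keeps untagged parent layers.

```rust
/// Converts string slices into owned arguments for one docker invocation.
fn to_args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Lists, in order, the docker commands (arguments after "docker") used to
/// get artifacts out of a freshly built image.
fn extraction_steps(image: &str, container: &str, dest: &str) -> Vec<Vec<String>> {
    let source = format!("{}:/output/.", container);
    vec![
        // docker cp works on containers, so create one without running it.
        to_args(&["create", "--name", container, image]),
        // Copy the contents of /output to the local build directory.
        to_args(&["cp", source.as_str(), dest]),
        // Delete the container.
        to_args(&["rm", container]),
        // Untag the image but keep its layers for the build cache.
        to_args(&["rmi", "--no-prune", image]),
    ]
}
```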
A key takeaway here is that buildsys doesn’t use a docker run command. Instead, it builds container images and relies on (and controls) Docker image layer caching to prevent rebuilding unchanged sources.
Cargo.toml
Buildsys relies on the fact that Cargo invokes it through a package's build.rs. Because Cargo runs a build script from its package's own directory, buildsys knows that it is running in the same directory as the Cargo package to be built and can find the Cargo manifest at ./Cargo.toml. Buildsys reads the package.metadata section of that manifest for additional information about the thing to be built, such as where to find a third-party package's source.
Note that we use the [build-dependencies] section to point to other packages that need to be compiled before this one. (You will see the same dependency specified in the RPM spec file.) In this way, we use Cargo’s build graph to ensure that the glibc RPM exists before the kubernetes-1.15 package is built, so that when the RPM build needs to install glibc before it can compile Kubernetes, the glibc RPM is already there.
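Cargo computes this ordering for us, but the underlying idea is a topological sort of the dependency graph. The sketch below uses Kahn's algorithm over a hypothetical map from package name to its build-dependencies. It illustrates the ordering guarantee and is not how Cargo is implemented internally.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders packages so each comes after all of its build-dependencies.
/// Uses Kahn's algorithm; returns None if the graph contains a cycle.
fn build_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Every package mentioned anywhere, as a key or as a dependency.
    let mut nodes: BTreeSet<&str> = deps.keys().copied().collect();
    for ds in deps.values() {
        nodes.extend(ds.iter().copied());
    }
    // Number of not-yet-built dependencies for each package.
    let mut pending: BTreeMap<&str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    // Reverse edges: dependency -> packages waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&pkg, ds) in deps {
        for &d in ds {
            *pending.get_mut(pkg).unwrap() += 1;
            dependents.entry(d).or_default().push(pkg);
        }
    }
    // Packages with no dependencies can be built immediately.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|entry| *entry.1 == 0)
        .map(|(n, _)| *n)
        .collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n.to_string());
        if let Some(waiting) = dependents.get(n) {
            for &m in waiting {
                let count = pending.get_mut(m).unwrap();
                *count -= 1;
                if *count == 0 {
                    ready.push_back(m);
                }
            }
        }
    }
    // If some package never became ready, there was a cycle.
    if order.len() == nodes.len() {
        Some(order)
    } else {
        None
    }
}
```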
buildsys build-package
The buildsys build-package command builds a package from the packages directory. Buildsys cursorily parses the RPM spec file to find out which patch and source files the build needs, so that it can tell Cargo to watch those files for changes. It downloads the package’s source tarball from the lookaside cache (falling back to upstream if instructed to do so) and checks the file’s hash digest; both the URL and the digest are found in Cargo.toml. The source tarball is saved in the package directory (*.tar.gz is gitignored). The source code, patches, RPM spec, and other inputs are copied into the Docker image by copying the entire package directory.
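A hedged sketch of the cursory parse: collect the file names given by SourceN: and PatchN: tags so a build script can emit cargo:rerun-if-changed for each. This is not buildsys's actual parser. It matches tags case-insensitively, does not expand macros, and omits the digest check, which would need a hashing crate.

```rust
/// Returns the local file names referenced by SourceN: and PatchN: tags.
fn spec_inputs(spec: &str) -> Vec<String> {
    let mut files = Vec::new();
    for line in spec.lines() {
        // Tags look like "Tag: value"; split at the first colon only,
        // so URLs such as "https://..." stay intact in the value.
        let (tag, value) = match line.split_once(':') {
            Some(pair) => pair,
            None => continue,
        };
        let tag = tag.trim().to_ascii_lowercase();
        // Accept "Source", "Source0", "Patch1001", and so on.
        let is_input = ["source", "patch"].iter().any(|&prefix| {
            tag.strip_prefix(prefix)
                .map_or(false, |n| n.chars().all(|c| c.is_ascii_digit()))
        });
        if is_input {
            let value = value.trim();
            // A URL source is stored locally under its last path component.
            let name = value.rsplit('/').next().unwrap_or(value);
            if !name.is_empty() {
                files.push(name.to_string());
            }
        }
    }
    files
}
```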
buildsys build-variant
The build-variant command creates a Bottlerocket image (and an archive of all migrations). It relies on the fact that all the necessary RPMs have already been built and stored locally in build/rpms/ (by many buildsys build-package calls). The required RPMs are “installed” into a disk image that will become the Bottlerocket rootfs. The Variants section below describes how the list of required packages is passed to this Docker call.
NOCACHE and TOKEN
NOCACHE and TOKEN are both used to control docker build’s image layer caching behavior. NOCACHE is used to “dirty” layers that should be rebuilt on every invocation. TOKEN is used to differentiate between two checkouts of Bottlerocket: if you have more than one directory containing the Bottlerocket git repo and you want to build them simultaneously, TOKEN (a hash of the repo path) keeps their Docker image layers separate.
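The two values can be sketched as follows. Here TOKEN hashes the repository path with the standard library's DefaultHasher, which is an assumption standing in for whatever hash buildsys actually uses, and NOCACHE is simply a timestamp that changes on every run. When either is passed as a build argument, a changed value causes a cache miss at the first Dockerfile instruction that uses it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Derives a per-checkout token from the repository path, so two clones
/// built at the same time get distinct image layers. DefaultHasher is not
/// cryptographic and its output may change between Rust releases.
fn token_for_repo(repo: &Path) -> String {
    let mut h = DefaultHasher::new();
    repo.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Produces a value that differs on each invocation, which dirties any
/// layer that consumes it.
fn nocache_value() -> String {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0)
        .to_string()
}
```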
Dockerfile
The Dockerfile at the root of the Bottlerocket repo is used during the build process. We never run the images that are created with the Dockerfile. Instead, docker build commands produce artifacts in the Docker images, which we then copy out with docker cp.
We use Docker BuildKit features, in particular RUN --mount to access the Bottlerocket repo (without copying it into the image) and RUN --mount=type=cache to preserve state from one build run to the next (so we don’t rebuild the world every time).
There are two logical sections to the Dockerfile. The first section is used by buildsys build-package, which creates RPM packages that are copied out to the local build/rpms directory. This section of the Dockerfile is used many times during a build (one docker build --target package call for each package in the packages directory).
The second section of the Dockerfile is used by the buildsys build-variant command. This happens at the end of a Bottlerocket build, after all of the docker build --target package calls are done and all of the RPMs are available in the local build/rpms directory. This section of the Dockerfile creates Bottlerocket disk images using rpm2img.
rpm2img
rpm2img is a script in the tools directory. It is used by buildsys build-variant (in the second section of the Dockerfile) to create Bottlerocket disk images from all the RPMs. Details such as the partition scheme, filesystem creation, partition GUIDs, and verity hashing can be found in rpm2img.
rpm2migrations
Migrations are binaries that modify Bottlerocket’s datastore during an upgrade or downgrade from one version to another. (You can think of these as database migrations, although the Bottlerocket datastore is not a traditional database.) Migration binaries are not included in the Bottlerocket disk images. Instead, they are securely downloaded from the TUF repo only if they are needed for an upgrade or downgrade.
Using RPMs is a great way to prepare the Bottlerocket disk images, but the migrations don’t fit so well into that “installation” paradigm, so rpm2img excludes them. A separate step uses rpm2migrations, another script in the tools directory (also used by buildsys build-variant in the second section of the Dockerfile), to create a tarball of the datastore migration binaries. Because these binaries are not added to the Bottlerocket image, they get “un-RPMed” and archived so that pubsys can later find them and add them to the TUF repo.
docker-go
docker-go is a script in the tools directory that can be used to compile Go modules with the Bottlerocket SDK. In this case, the SDK is run as a container with docker run, rather than used as an image layer via buildsys. There are two invocations of docker-go in Makefile.toml: one downloads and caches the vendored Go dependencies, and the other runs the host-ctr unit tests.
The host-ctr program pulls and controls host containers. It is not to be confused with the host container images themselves, which are not in the Bottlerocket repo and not part of the Bottlerocket build. Because host-ctr is written in Go rather than Rust, it has its own RPM spec (separate from os.spec). Also, because it isn’t Rust, we want to offer developers a way to run its unit tests without a local Go toolchain, so docker-go runs the host-ctr unit tests using the Bottlerocket SDK.
Special Package 1: os.spec
packages/os.spec compiles all of our first-party Rust code and provides the installation instructions for these first-party packages along with any configs they need.
Special Package 2: release.spec
packages/release.spec is special. It creates an RPM package that, when installed (for example, with dnf install), installs all of the packages and files that are commonly required by all Bottlerocket variants. Its build section is empty. All of the commonly required packages are named in Requires: statements, which cause those RPM packages to be installed when release.rpm is installed. Additionally, release.spec installs some configuration files.
Variants
A variant is a build of Bottlerocket that is differentiated (primarily) by the packages it includes and the settings in its settings model. This differentiation is created in a few places.
- Variant definition:
  - The variant currently being built is defined by the VARIANT environment variable, which comes from the BUILDSYS_VARIANT variable in Makefile.toml, which the user can override to build a different variant.
  - In the variants directory, an empty Cargo package specifies which packages are included by way of a package.metadata.build-variant.included-packages section in Cargo.toml.
- Conditional compilation:
  - In the model crate, different settings are compiled based on the VARIANT environment variable. This is done by having build.rs symlink to the desired model directory. Also included with the variant-specific settings is an override-defaults.toml file, which allows for overriding settings specified in defaults.toml or introducing defaults for settings that are variant-specific.
  - In a one-off case for the ECS variant, we have used build.rs to introduce a Cargo cfg with the variant name during the build of ecs-settings-applier. This is because the program depends on ECS-specific parts of the settings model, and its compilation needs to be effectively skipped during non-ECS-variant builds. (Note: this is only relevant during local cargo builds of the workspace. During the RPM build we skip the package altogether.)
  - Logdog also needs to compile conditionally based on variant and follows the precedent set by the model crate. That is, logdog uses symlinks based on variant name.
- RPM: In os.spec, we check the variant environment variable to conditionally skip the compilation and packaging of ecs-settings-applier for non-ECS variants. (Note: This is why the cfg described above is only needed for local workspace builds.)
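Taking a closer look at a variant Cargo package, the aws-ecs-1 variant is defined by variants/aws-ecs-1/Cargo.toml, which lists the packages the variant includes, and variants/aws-ecs-1/build.rs, which, like the package build scripts, hands the work off to buildsys.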
The list of packages (from package.metadata.build-variant.included-packages in Cargo.toml) is passed to docker build by way of a PACKAGES environment variable, which the Dockerfile uses to copy only the specified packages. (Note that the dnf command uses --downloadonly, so we are not installing these packages; we are effectively copying only the RPMs named in $PACKAGES, plus their dependencies.)
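A minimal sketch of turning the included-packages list into the PACKAGES value, assuming the names are space-separated so that a Dockerfile RUN step can expand $PACKAGES into separate dnf arguments. The check that rejects empty names or names containing whitespace is an illustrative addition.

```rust
/// Builds the docker build argument carrying a variant's package list.
/// Fails if a name would not survive shell word splitting intact.
fn packages_build_arg(included: &[&str]) -> Result<Vec<String>, String> {
    if let Some(bad) = included
        .iter()
        .find(|p| p.is_empty() || p.contains(char::is_whitespace))
    {
        return Err(format!("invalid package name: {:?}", bad));
    }
    Ok(vec![
        "--build-arg".to_string(),
        format!("PACKAGES={}", included.join(" ")),
    ])
}
```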
Later in the docker build, the rpm2img script is called, and it is rpm2img that finally installs the variant’s packages and their dependencies into the image.
The final result of all of this is a Bottlerocket disk image.
Conclusion
We’ve seen how Bottlerocket uses the flexibility of Cargo to drive the creation of a Bottlerocket disk image. Cargo drives not only the compilation of Rust code but also, through Docker and RPM, the complete packaging of a Bottlerocket system.