Bottlerocket, A Year in the Life

With the recent launch of Bottlerocket support for Managed Node Groups in Amazon Elastic Kubernetes Service (Amazon EKS), I wanted to take the opportunity to talk about Bottlerocket and its features. At a previous point in my career, I was one of many engineers working on a commercial UNIX operating system. Linux had established itself as a viable option years before and had since captured a substantial part of the market. As our customers slowly shifted toward Linux, we started to hear the question: “Does the OS even matter anymore?” Our answer was always “Yes.” We passionately believed so, and I still do.

Today, we embrace containers as the new process model for our applications and depend upon orchestration abstractions to manage and run them. It’s pretty safe to say that developers shouldn’t be too concerned at a deep level with operating systems. We may say that the container orchestration layer has become the new operating system, right? Well, not really. Orchestration solutions like Kubernetes schedule work to nodes in a typical configuration, and those nodes are computers that run operating systems. Same compute model, just another layer of abstraction.

On AWS, you have the option of not even thinking about nodes and using AWS Fargate for compute resources with Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS). This is by far the quickest way to set the ENOCARE bit and move on. But you may require discrete, static compute resources, in which case someone needs to care about the nodes and the software that they run.

In this context, we can ask a different question: “Do you need to worry about the OS anymore?” To that, we would like to say “No.”

What is Bottlerocket?

Bottlerocket is a free and open source Linux-based operating system expressly designed for hosting containers. Any general purpose Linux distribution can get the job done, but such distributions are designed to be usable for all kinds of workloads. This means more moving parts to support more types of workloads out of the box, and a default security posture that’s more open in order to accommodate various configurations.

Bottlerocket leans into the opportunity to specialize and focus on hosting containers as its primary role, and in so doing its design is very opinionated toward that goal. The project’s charter defines clear tenets rooted in lessons learned over years of running production services at scale in Amazon. Bottlerocket only contains the software components needed to run containers securely and reliably. With fewer moving parts and processes running on your hosts, there’s less to manage and a smaller attack surface than a default general purpose host operating system.

Security and maintainability are first-class design principles of Bottlerocket. Bottlerocket is deployed on a read-only root file system, and its host configuration is ephemeral. Rather than logging in remotely with a shell and superuser privileges, you make configuration changes through a control container or, ideally, through the API. To dive a bit deeper into specific security aspects of Bottlerocket, check out this recent blog.

Updates with Bottlerocket are fast and resilient, atomically flipping to a new prepared partition rather than a more traditional modular package-based approach. When used as the host operating system for nodes in Amazon ECS, Amazon EKS, Amazon EKS Anywhere, or in self-managed Kubernetes clusters, automatic updates can be optionally enabled. All of these aspects lead to a more secure and manageable host OS for your container workloads, resulting in you worrying about the OS less, if at all.

All of these features reflect a ground-up cloud native approach to operating system design. Bottlerocket had its official launch at v1.0.0 last summer. In this blog, I’d like to walk through some of the major features introduced to the OS over the last year and changes to the project. I’ll also highlight some of the solutions offered by our AWS Partners.

A year in the life

A lot of work has gone into the Bottlerocket project since version 1.0 was announced over a year ago. From large features, to smaller but impactful usability fix-ups, and helpful curated variants, overall the OS has grown useful customization and expansion features while becoming easier to work with and manage. The project has also grown a community, and a large number of AWS Partners have created solutions around it.

As with many projects, the release notes are the reference for what specific changes have gone into each version.

Bottlerocket versions and variants

Before we get into looking at some specific Bottlerocket features, let’s talk about how versions and variants work. A goal of Bottlerocket is to deploy the operating system in as small a footprint as possible, only installing and running what’s required. If we consider Amazon ECS and Amazon EKS, cluster nodes require slightly different software.

While the host image could contain components to support both orchestration options, that would conflict with the design goals. For this reason, Bottlerocket uses variants in order to provide curated images that are purpose-built for each environment, containing only the components required. This design allows for different cloud environments and container orchestrator support in the future.

Variants exist for Amazon ECS and for specific Kubernetes releases, with each Kubernetes variant built to support one version of Kubernetes. The Kubernetes variants can be used with Amazon EKS nodes, EKS Anywhere, or self-managed Kubernetes clusters. A variant to support VMware was recently released in preview; it includes the packages required to run a Kubernetes worker node as a VMware guest (also supported by EKS Anywhere).

The Bottlerocket OS itself also has a version. This identifies the project bits at a point in time that roll into a given image. The Bottlerocket OS is built at a given version for a given variant, which is basically a list of contents, and for a specific architecture. These various attributes come together to define the characteristics of a given installable image, for example bottlerocket-aws-k8s-1.21-x86_64-1.2.0-dafe3b16.img.
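
As a rough illustration of how those attributes compose, the following Python sketch splits an image file name back into its parts. The field layout is inferred from the single example name above. The ImageName type, the build_id label for the trailing hex string, and the restriction to the x86_64 and aarch64 architectures are assumptions made for this example, not a documented naming contract.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    """Parts of an image file name (hypothetical helper type)."""
    variant: str
    arch: str
    version: str
    build_id: str


# Assumed layout, inferred from one example name:
#   bottlerocket-<variant>-<arch>-<version>-<build id>.img
IMAGE_NAME = re.compile(
    r"bottlerocket-(?P<variant>.+)"
    r"-(?P<arch>x86_64|aarch64)"
    r"-(?P<version>\d+\.\d+\.\d+)"
    r"-(?P<build_id>[0-9a-f]+)\.img"
)


def parse_image_name(name: str) -> ImageName:
    """Split an image file name into variant, architecture, version, and build id."""
    match = IMAGE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized image name: {name!r}")
    return ImageName(**match.groupdict())


print(parse_image_name("bottlerocket-aws-k8s-1.21-x86_64-1.2.0-dafe3b16.img"))
```

Note that the variant itself contains hyphens and a Kubernetes version, so the pattern anchors on the architecture and OS version rather than splitting on every hyphen.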

Updating the OS

Updates to Bottlerocket are image-based, atomic operations. When an update is initiated, the latest Bottlerocket image will be downloaded from a TUF repository to an inactive partition on the host as the system continues to operate normally. Once the download is complete, the system (specifically the Signpost utility) marks the update partition as active, while the previous active partition is marked as inactive.

The system is then rebooted into the new version. If a failure occurs during the boot process, the system can easily be rolled back to the previous version by swapping the active and inactive partitions and rebooting. This image-based approach is a well-tested mechanism, making operating system updates more reliable and resilient.
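
To make the flip-and-rollback idea concrete, here is a toy Python model of a host with two image partition sets. It is only a sketch of the concept described above, not how Signpost is implemented. The PartitionSets class, the A and B labels, and the version numbers are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PartitionSets:
    """Toy model of a host with two image partition sets, A and B."""
    images: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "1.1.0", "B": None}
    )
    active: str = "A"

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage(self, version: str) -> None:
        """Write the new image to the inactive set; the running system is untouched."""
        self.images[self.inactive] = version

    def flip(self) -> None:
        """Swap the active and inactive sets, as if followed by a reboot."""
        if self.images[self.inactive] is None:
            raise RuntimeError("no image staged on the inactive partition set")
        self.active = self.inactive

    def running_version(self) -> Optional[str]:
        return self.images[self.active]


host = PartitionSets()
host.stage("1.2.0")                       # download while 1.1.0 keeps running
assert host.running_version() == "1.1.0"
host.flip()                               # activate the update, then reboot
assert host.running_version() == "1.2.0"
host.flip()                               # rollback: swap back, then reboot
assert host.running_version() == "1.1.0"
```

Because the previous image is never modified while the new one is staged, rolling back is the same cheap operation as updating: a swap of which set is active.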

[Figure: the update process from an API standpoint.]

While the low-level API commands that make up this process still exist, Bottlerocket’s apiclient understands the update workflow and automates these calls.

The following apiclient commands can be used to upgrade the OS:

apiclient update check checks to see whether there is a new version of the installed variant
apiclient update apply downloads the update and verifies that it has been staged
apiclient reboot activates the update and reboots the system

You can combine these commands in an SSM document that can be executed against your nodes. Alternatively, you can automate updates with the Bottlerocket Update Operator on Kubernetes, or the Bottlerocket ECS Updater on Amazon ECS. Both updaters safely drain workloads and update nodes in the cluster one at a time to minimize operational impact.
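
As one possible shape for such an SSM document, the Python sketch below assembles a Command document (schema version 2.2) that runs the three apiclient commands in order using the aws:runShellScript plugin. The description text and the step name are placeholders chosen for this example, and the sketch adds no error handling between steps, so treat it as a starting point rather than a production document.

```python
import json

# The three apiclient steps listed above, in order.
UPDATE_COMMANDS = [
    "apiclient update check",
    "apiclient update apply",
    "apiclient reboot",
]


def build_update_document(commands=UPDATE_COMMANDS) -> dict:
    """Return an SSM Command document (schema 2.2) that runs the commands in order."""
    return {
        "schemaVersion": "2.2",
        "description": "Check for, apply, and reboot into a Bottlerocket update",
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "updateBottlerocket",  # hypothetical step name
                "inputs": {"runCommand": list(commands)},
            }
        ],
    }


# Print the document as JSON, ready to be saved and registered with Systems Manager.
print(json.dumps(build_update_document(), indent=2))
```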

TLS rather than only AWS Auth

In Kubernetes, the kubelet is a node agent that communicates with the Kubernetes API. This channel should be private, trustworthy, and free of interference. This is accomplished by using TLS – but this presents a problem. How do you get the TLS client certificates onto the worker node? Kubernetes offers a certificate-signing API, which the kubelet uses to request a certificate from the cluster’s control plane. On Amazon EKS, Bottlerocket uses the aws-iam-authenticator to generate a token to authenticate to the API server and submit a CSR to the signing API. Assuming the request is authenticated, the signing API will automatically approve the request and return a certificate which the node uses to join the cluster. It will then communicate with the API over TLS.

There may be times, however, when you want to use Bottlerocket in clusters where IAM authentication is not an option, such as self-managed Kubernetes clusters or clusters outside of AWS (for example, with EKS Anywhere). In circumstances like this, you can use the Bottlerocket API to change settings.kubernetes.authentication-mode from aws to tls. This tells Bottlerocket to use a bootstrap token instead of AWS authentication.

Be aware that when you make this change you also need to supply a value for the settings.kubernetes.bootstrap-token variable. You can learn more about bootstrap tokens here.
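
The two settings travel together, which a short Python sketch can make explicit. It renders a user data fragment containing both values and rejects tokens that do not match the Kubernetes bootstrap token format of six characters, a dot, and sixteen characters. The helper name and the sample token are placeholders, and the other settings a node needs in order to join a cluster are deliberately omitted.

```python
import re

# Kubernetes bootstrap tokens look like "<6 chars>.<16 chars>", drawn from [a-z0-9].
BOOTSTRAP_TOKEN = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")


def tls_auth_user_data(bootstrap_token: str) -> str:
    """Render user data (TOML) that switches the node to TLS bootstrapping."""
    if not BOOTSTRAP_TOKEN.fullmatch(bootstrap_token):
        raise ValueError("bootstrap token must look like abcdef.0123456789abcdef")
    return (
        "[settings.kubernetes]\n"
        'authentication-mode = "tls"\n'
        f'bootstrap-token = "{bootstrap_token}"\n'
    )


# Placeholder token; use one issued by your own cluster.
print(tls_auth_user_data("abcdef.0123456789abcdef"))
```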

HTTPS proxy configuration

In some environments, it is necessary to route web traffic through a proxy. With Bottlerocket, you can now configure the network proxy for a variety of different services that run on the server such as updog (the client for TUF), metricdog (sends health information about the host), Docker, containerd, the Amazon ECS agent, and the kubelet.

To configure a proxy, set settings.network.https-proxy to the URL and port of the proxy server, e.g. http://192.168.1.192:3128. You can also use the settings.network.no-proxy setting to exclude a list of hosts from proxying. The Kubernetes variant automatically adds the API server and other Kubernetes suffixes to the no-proxy list, e.g. the cluster domain.

Custom CA certificates

Bottlerocket ships with the Mozilla CA certificate store by default, but you can also install your own. This allows you to curate and manage bundles of certificates, and manage their state of trust within the context of the OS API.

The following API settings allow you to add self-signed certificates in Bottlerocket:

settings.pki.<bundle-name>.data: Contains one or more certificates, as a base64-encoded bundle in PEM format.
settings.pki.<bundle-name>.trusted: Boolean that indicates if the certificates in the bundle are trusted, defaults to false.

This configuration can be passed directly via API calls, or added to user data. For example, in user data:

[settings.pki.my-trusted-bundle]
data="-----BEGIN.."
trusted=true

[settings.pki.untrusted-bundle]
data="-----BEGIN.."
trusted=false

Or, via the API client:

apiclient set \
  pki.my-trusted-bundle.data="-----BEGIN" \
  pki.my-trusted-bundle.trusted=true \
  pki.untrusted-bundle.data="-----BEGIN" \
  pki.untrusted-bundle.trusted=false
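
Since the data setting expects the PEM text to be base64-encoded, a small helper can take care of the encoding. The Python sketch below is one way to render a bundle as a user data fragment. The function name and the placeholder PEM text are assumptions for illustration, and a real bundle would contain one or more complete certificates.

```python
import base64


def pki_bundle_user_data(bundle_name: str, pem_text: str, trusted: bool = False) -> str:
    """Render a [settings.pki.<bundle-name>] table with base64-encoded PEM data."""
    encoded = base64.b64encode(pem_text.encode("ascii")).decode("ascii")
    return (
        f"[settings.pki.{bundle_name}]\n"
        f'data = "{encoded}"\n'
        f"trusted = {'true' if trusted else 'false'}\n"
    )


# Placeholder PEM text standing in for a real certificate bundle.
EXAMPLE_PEM = (
    "-----BEGIN CERTIFICATE-----\n"
    "<certificate body elided>\n"
    "-----END CERTIFICATE-----\n"
)

print(pki_bundle_user_data("my-trusted-bundle", EXAMPLE_PEM, trusted=True))
```

The trusted flag defaults to false here to mirror the default of the setting itself.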

Bootstrap containers

Bootstrap containers allow you to run a container that performs a task as the system boots. They are not meant to run indefinitely like control containers or admin containers. They are containers that can be used to “bootstrap” the host before other services start, solving for node configuration through Bottlerocket’s design of container-based customization.

When configuring a bootstrap container, you specify:

The image you want to use
The configuration of when the container should run (off, once, or always)
Whether the container is “essential,” that is, whether a failure of the bootstrap container halts the boot process
Optional user-data, providing configuration data to the bootstrap container

Bootstrap containers run before services like Docker, the Amazon ECS agent, and the kubelet start, but after the systemd configured.target unit is active. You can configure multiple bootstrap containers, but the order in which they run is not controlled.

Bootstrap containers can be used for a variety of purposes. For example, you can use them to partition and format ephemeral storage, to calculate and set the maximum number of pods that can run, or to configure additional Kubernetes labels. The world is your oyster.

Unlike host containers, bootstrap containers can’t be run as superpowered containers. However, they do run with the Linux capability CAP_SYS_ADMIN, which allows them to create files, directories, and mounts that are visible on the host.

To flesh this out a bit, let’s have a look at how you create a bootstrap container to partition and format ephemeral disks.

First, a Dockerfile for a bootstrap container image:

FROM alpine
RUN apk add e2fsprogs bash parted
ADD setup-ephemeral-disks ./
RUN chmod +x ./setup-ephemeral-disks
ENTRYPOINT ["sh", "setup-ephemeral-disks"]

where setup-ephemeral-disks is a script containing:

#!/usr/bin/env bash
set -ex

# The name of the disk we want to manage
DISK=/.bottlerocket/rootfs/dev/nvme2n1
# Sentry file to check if the disk was already partitioned and formatted
PARTITIONS_CREATED=/.bottlerocket/bootstrap-containers/current/created
# Mounts from this mount point will propagate across mount namespaces
BASE_MOUNT_POINT=/.bottlerocket/rootfs/mnt

# If the disk hasn't been partitioned, create the partitions and format them
if [ ! -f "$PARTITIONS_CREATED" ]; then
  parted -s "$DISK" mklabel gpt 1>/dev/null
  parted -s "$DISK" mkpart primary ext4 0% 50% 1>/dev/null
  parted -s "$DISK" mkpart primary ext4 50% 100% 1>/dev/null
  mkfs.ext4 -F "${DISK}p1"
  mkfs.ext4 -F "${DISK}p2"
  # Create sentry file once the disk is partitioned and formatted
  touch "$PARTITIONS_CREATED"
fi

# We make sure the target mount points exist
mkdir -p $BASE_MOUNT_POINT/part1
mkdir -p $BASE_MOUNT_POINT/part2

# Always mount the partitions
mount ${DISK}p1 $BASE_MOUNT_POINT/part1
mount ${DISK}p2 $BASE_MOUNT_POINT/part2

With this container image built and pushed to a registry endpoint, you can configure the bootstrap container via the Bottlerocket API, using the apiclient set command, and specifying the container image.

apiclient set \
  bootstrap-containers.bootstrap.source="<image_uri>" \
  bootstrap-containers.bootstrap.mode=always \
  bootstrap-containers.bootstrap.essential=false

Bootstrap containers can also be specified in the cluster config file passed to the eksctl create nodegroup command:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: <cluster_name>
  region: <aws_region>

nodeGroups:
  - name: <group_name>
    instanceType: m5ad.4xlarge
    desiredCapacity: 1
    amiFamily: Bottlerocket
    privateNetworking: true
    bottlerocket:
      enableAdminContainer: true
      settings:
        motd: "Bottlerocket rocks!"
        bootstrap-containers: 
          bootstrap: 
            source: "<image_uri>"
            mode: "always"
            essential: false
    ssh:
      # Enable ssh access (via the admin container)
      allow: true
      publicKeyName: <key_name>

To confirm that the bootstrap container did its work, connect to the admin container and run lsblk. The instance has two additional devices: nvme2n1 and nvme3n1. You can see that nvme2n1 has been split into two partitions, mounted at part1 and part2.

[ec2-user@ip-192-168-156-26 ~]$ lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0          7:0    0   280K  1 loop /.bottlerocket/rootfs/x86_64-bottlerocket
loop1          7:1    0  11.6M  1 loop /.bottlerocket/rootfs/var/lib/kernel-deve
nvme0n1      259:0    0     2G  0 disk
├─nvme0n1p1  259:3    0     4M  0 part
├─nvme0n1p2  259:4    0    40M  0 part /.bottlerocket/rootfs/boot
├─nvme0n1p3  259:5    0   920M  0 part
├─nvme0n1p4  259:6    0    10M  0 part
├─nvme0n1p5  259:7    0    30M  0 part
├─nvme0n1p6  259:8    0    40M  0 part
├─nvme0n1p7  259:9    0   920M  0 part
├─nvme0n1p8  259:10   0    10M  0 part
├─nvme0n1p9  259:11   0    30M  0 part
└─nvme0n1p10 259:12   0    42M  0 part /.bottlerocket/rootfs/var/lib/bottlerocke
nvme2n1      259:1    0 279.4G  0 disk
├─nvme2n1p1  259:15   0 139.7G  0 part /.bottlerocket/rootfs/mnt/part1
└─nvme2n1p2  259:16   0 139.7G  0 part /.bottlerocket/rootfs/mnt/part2
nvme1n1      259:2    0    80G  0 disk
└─nvme1n1p1  259:14   0    80G  0 part /.bottlerocket/rootfs/local
nvme3n1      259:13   0 279.4G  0 disk

Static pods

Static pods are a way to run pods on a cluster node outside of Kubernetes API management. When you run a static pod on a node, the kubelet is responsible for monitoring its health and will restart the pod if it fails. The kubelet will also try to create a “mirror” pod on the API server for each static pod. While this makes static pods visible to the API server, they cannot be managed through the API.

Ordinarily static pods are created by copying a pod manifest to the directory specified by the --pod-manifest-path flag passed to the kubelet. There’s also an option to pull them from a web server. With the Bottlerocket API, not only can you configure the pod manifest path, you can also pass in a manifest as in the following example:

[settings.kubernetes.static-pods.my-pod]
manifest = "<BASE64 encoded pod manifest>"
enabled = true
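
The manifest value is the pod manifest encoded as base64, so generating the setting takes only a few lines. The Python sketch below shows one way to do that. The manifest contents, the container name, and the helper function are placeholders for illustration only.

```python
import base64

# A minimal pod manifest; the container name and image are placeholders.
EXAMPLE_MANIFEST = """\
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      image: <image>
"""


def static_pod_user_data(pod_id: str, manifest: str, enabled: bool = True) -> str:
    """Render a [settings.kubernetes.static-pods.<id>] table for user data."""
    encoded = base64.b64encode(manifest.encode("utf-8")).decode("ascii")
    return (
        f"[settings.kubernetes.static-pods.{pod_id}]\n"
        f'manifest = "{encoded}"\n'
        f"enabled = {'true' if enabled else 'false'}\n"
    )


print(static_pod_user_data("my-pod", EXAMPLE_MANIFEST))
```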

kmod-kit

This feature of Bottlerocket allows you to build out-of-tree kernel modules for your images. As with bootstrap containers, the point here is to solve for a customization use case, given that you aren’t going to log into a node and install software. Instead, you build the modules with the kmod kit.

In order to build an out-of-tree kernel module for Bottlerocket, you use the kmod kit for the variant you are using. The following is an example using this feature to build the Falco driver along with a script to load it.

First, a multistage Dockerfile to build the Falco driver and then bundle it up in a deployable image.

FROM rust AS tuftool
RUN cargo install tuftool
FROM fedora:33 AS builder
WORKDIR /tmp
COPY --from=tuftool /usr/local/cargo/bin/tuftool /usr/local/bin/tuftool
# Install dependencies and download sources
RUN \
  ulimit -n 1024; dnf -y install \
  bc bzip2 cmake3 curl diffutils dwarves elfutils-devel \
  findutils gcc gcc-c++ git kmod make tar ncurses-devel \
  patch xz && \
  git clone https://github.com/falcosecurity/falco.git
# Download root.json to fetch artifacts from tuf repo
RUN curl -O "https://cache.bottlerocket.aws/root.json" && \
    echo "90393204232a1ad6b0a45528b1f7df1a3e37493b1e05b1c149f081849a292c8dafb4ea5f7ee17bcc664e35f66e37e4cfa4aae9de7a2a28aa31ae6ac3d9bea4d5  root.json" | sha512sum -c
FROM builder AS driver
ARG VARIANT="<VARIANT>"
ARG ARCH="<ARCH>"
ARG VERSION="<VERSION>"
ARG KIT="${VARIANT}-${ARCH}-kmod-kit-v${VERSION}"
ARG KERNELDIR="/tmp/${KIT}/kernel-devel"
ARG CROSS_COMPILE="${ARCH}-bottlerocket-linux-musl-"
ARG INSTALL_MOD_STRIP=1
RUN tuftool download . --root ./root.json \
      --target-name $KIT.tar.xz \
      --metadata-url "https://updates.bottlerocket.aws/2020-07-07/$VARIANT/$ARCH/" \
      --targets-url "https://updates.bottlerocket.aws/targets/"
RUN tar xf ${KIT}.tar.xz
RUN \
  export PATH="/tmp/${KIT}/toolchain/usr/bin:${PATH}" && \
  mkdir -p falco/build && \
  cd falco/build && \
  cmake3 -DUSE_BUNDLED_DEPS=ON .. && \
  make driver -j
# Validate the kernel module was compiled
RUN test -f /tmp/falco/build/driver/falco.ko
ADD ./load-driver /usr/bin/load-driver
ENTRYPOINT ["load-driver"]

Here, load-driver is a script containing:

#! /bin/bash -x

insmod /tmp/falco/build/driver/falco.ko
sleep infinity &
trap "echo 'Caught signal'; { kill $!; exit 0; }" HUP INT QUIT PIPE TERM
trap - EXIT
while true; do wait $! || continue; done
exit 0

Once the image is built and pushed to a registry, it can be deployed to a Kubernetes cluster.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: falco
spec:
  serviceName: falco
  replicas: 1
  selector:
    matchLabels:
      app: falco
  template:
    metadata:
      labels:
        app: falco
    spec:
      containers:
        - name: falco
          image: <image>
          securityContext:
            privileged: true

Once the container is running, you can see that the falco module is present and loaded:

[I] ~/P/b/k/falco> kubectl exec falco-0 -- lsmod | grep falco
falco                 647168  0

Be aware that in order to load a kernel module from a container, you need to make sure that settings.kernel.lockdown is set to none in the Bottlerocket API.

For additional information about building out of tree kernel modules, see Building out-of-tree kernel modules.

Solutions from our AWS Partners

AWS Partner support continues to play a crucial role in enabling customers to leverage Bottlerocket for their workloads. Our focus has been on working backwards from customers to identify solutions that meet common requirements. This includes solutions that span monitoring and logging, management and DevOps, and security. We want to ensure that customers can have a consistent experience managing their Bottlerocket nodes, leveraging the same tools they’re already using today.

Over the course of the last year we have continued to expand AWS Partner support for Bottlerocket. This includes support for the following solutions:

CrowdStrike certified their Falcon platform. Falcon enables customers to leverage an endpoint management solution with Bottlerocket-based hosts.
Palo Alto Networks certified their Prisma Cloud solution. Prisma Cloud enables customers to monitor and protect container workloads running on Bottlerocket-based hosts, as well as monitor and firewall Bottlerocket hosts.
Codefresh certified their Codefresh Runners solution. Codefresh Runners give customers the ability to quickly and securely scale their build environments using Bottlerocket nodes.
Granulate certified their Granulate Agent solution. Granulate helps customers optimize containerized environments running on Bottlerocket OS in order to increase performance while reducing costs.
JFrog certified their JFrog Platform to run on Bottlerocket-based hosts.
NetApp certified their Spot Ocean solution. With Spot Ocean, customers can launch, manage, and run the Spot controller on top of instances running Bottlerocket OS, as well as leverage Spot Ocean’s cost optimization capabilities.

What’s next?

Work continues on the project, with new features and integrations planned in the coming months. You can follow along at the project repo, and discuss issues and new features at the AWS Containers Roadmap. Native support for Bottlerocket in Amazon EKS Managed Node Groups was just released, and GPU support and availability in AWS GovCloud (US) Regions are both coming soon.

To jump right in, you can check out the quickstart guides using published AMIs for Amazon ECS or Amazon EKS. If you’re in a VMware environment, check out the guide using the published OVAs for Kubernetes. If you’d like to check out some of the build customization features, you can also build your own AMI.

Hopefully this quick tour has shared enough context and use cases that you can see how Bottlerocket can help you worry less about the operating system running on your cluster nodes. If you have ideas, suggestions, or issues, visit the project and its roadmap or the containers roadmap repo, and reach out anytime.
