Using mTLS with SPIFFE/SPIRE in AWS App Mesh on Amazon EKS
NOTICE: October 04, 2024 – This post no longer reflects the best guidance for configuring a service mesh with Amazon EKS and its examples no longer work as shown. Please refer to newer content on Amazon VPC Lattice.
——–
By Efe Selcuk, Apurup Chevuru, and Michael Hausenblas
You know that here at AWS we consider security as “job zero”, and in the context of the shared responsibility model we provide you with controls to take care of your part. One popular use case of service meshes is to strengthen the security posture of your communication paths, something we’re focusing on in AWS App Mesh. Also, the challenges of using mTLS safely and correctly have been the subject of discussions amongst practitioners. To address your ask from the App Mesh roadmap for mutual TLS (mTLS), we’ve now launched support for this feature. In this blog post we explain the background of mTLS and walk you through an end-to-end example using an Amazon Elastic Kubernetes Service (EKS) cluster.
Background
If you’re not that familiar with mTLS, this section is for you; otherwise, you can skip ahead to the walkthrough.
The Secure Production Identity Framework for Everyone (SPIFFE) project, a Cloud Native Computing Foundation (CNCF) open source project with wide community support, provides fine-grained, dynamic workload identity management. Based on the SPIFFE reference implementation called SPIRE, you can assign and query a cryptographically strong and provable identity in any kind of distributed system. Note that SPIRE is not the only option in this space. You can, for example, use Kubernetes secrets as described in Using EKS encryption provider support for defense-in-depth. In the context of this post, however, we will be focusing on SPIRE.
A little bit of SPIFFE/SPIRE terminology to get everyone on the same page:
- A workload is a piece of software deployed with a particular configuration, for example a microservice packaged and delivered as a container.
- The workload is defined in the context of a trust domain, such as a cluster or an entire company network.
- The SPIFFE ID represents the identity of a workload in the form spiffe://trust-domain/workload-identifier
- A SPIFFE Verifiable Identity Document (SVID) is the document with which a workload proves its identity. It is considered valid if it has been signed by an authority within the trust domain. A common example of an SVID instance is an X.509 certificate.
- The SPIFFE workload API provides a platform-agnostic way to identify services, akin to what the AWS EC2 Instance Metadata API provides in an AWS-specific way.
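To make the ID format from the list above concrete, here is a minimal shell sketch that splits a SPIFFE ID into its trust domain and workload identifier. The helper name parse_spiffe_id and the example workload path /front are our own assumptions for illustration; the trust domain is the one used in the walkthrough later in this post.

```shell
# Hypothetical helper: split a SPIFFE ID into trust domain and workload path.
parse_spiffe_id() {
  case "$1" in
    spiffe://?*) ;;                                   # scheme is present
    *) echo "not a SPIFFE ID: $1" >&2; return 1 ;;
  esac
  spiffe_rest="${1#spiffe://}"                        # drop the scheme
  spiffe_domain="${spiffe_rest%%/*}"                  # text before the first slash
  if [ -z "$spiffe_domain" ]; then
    echo "missing trust domain: $1" >&2
    return 1
  fi
  case "$spiffe_rest" in
    */*) spiffe_path="/${spiffe_rest#*/}" ;;          # everything after the domain
    *)   spiffe_path="" ;;                            # ID names the domain only
  esac
  echo "trust_domain=$spiffe_domain workload=$spiffe_path"
}

# The workload path /front is an assumed example.
parse_spiffe_id "spiffe://howto-k8s-mtls-sds-based.aws/front"
# prints: trust_domain=howto-k8s-mtls-sds-based.aws workload=/front
```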
To learn more check out the video Introduction to SPIFFE and SPIRE Projects by Evan Gilman which, in less than 10 minutes, explains how all these things play together.
mTLS in App Mesh
The general setup in the context of App Mesh looks as follows:
In the data plane, App Mesh uses Envoy, which acts as a proxy, intercepting any kind of traffic. With mTLS enabled, the communication between the Envoy proxies is authenticated using TLS [1], whereas the communication between a service and its Envoy proxy is plain-text [2].
You can use mTLS authentication for all protocols supported by AWS App Mesh, including L4/TCP, HTTP (1.1/2), and gRPC. We support mTLS in two modes:
- PERMISSIVE mode for the TLS configuration on the server endpoint allows plain-text traffic to connect to the endpoint. This is mainly relevant for migration scenarios, and we come back to it at the end of this post.
- STRICT mode forces encrypted traffic and should be considered the default going forward.
App Mesh supports two certificate sources for mutual TLS authentication: the certificates, including those used for server validation in a listener TLS configuration, can be sourced either from the local file system of the Envoy proxy or from Envoy’s Secret Discovery Service (SDS) API, via SPIRE. Note that App Mesh stores any sensitive data used for mTLS authentication in memory only.
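As a rough illustration of how the TLS mode and an SDS-based certificate source might come together, the following sketch writes a VirtualNode manifest for the App Mesh controller (API group appmesh.k8s.aws/v1beta2). The file name, resource names, labels, port, hostname, and SPIFFE IDs are assumptions made for this example and are not taken from the walkthrough repo.

```shell
# Write an illustrative VirtualNode manifest; all names and IDs are assumptions.
cat > virtual-node-mtls.yaml <<'EOF'
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: blue
  namespace: howto-k8s-mtls-sds-based
spec:
  podSelector:
    matchLabels:
      app: color
      version: blue
  listeners:
    - portMapping:
        port: 8080
        protocol: http
      tls:
        mode: STRICT                # PERMISSIVE also admits plain-text clients
        certificate:
          sds:                      # certificate fetched through Envoy's SDS API
            secretName: spiffe://howto-k8s-mtls-sds-based.aws/colorblue
        validation:
          trust:
            sds:                    # trust bundle of the whole trust domain
              secretName: spiffe://howto-k8s-mtls-sds-based.aws
          subjectAlternativeNames:
            match:
              exact:                # only this client identity is accepted
                - spiffe://howto-k8s-mtls-sds-based.aws/front
  serviceDiscovery:
    dns:
      hostname: color-blue.howto-k8s-mtls-sds-based.svc.cluster.local
EOF
```

With a file-based source, the sds entries would instead point at certificate files on the Envoy proxy's local file system.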
Let’s consider a concrete usage scenario: take the case of an application that handles consumer payments and may have as one of its requirements to be Payment Card Industry Data Security Standard (PCI DSS) compliant. With mTLS, you can now tick that box and leave the heavy lifting to us.
Now that we understand why mTLS is beneficial and how it works on a high level in the context of App Mesh let’s move on to a concrete example.
An mTLS walkthrough
As a preparation, clone the aws-app-mesh-examples.git repo. The following setup is based on the howto-k8s-mtls-sds-based walkthrough. Make sure you have the environment variables AWS_ACCOUNT_ID and AWS_DEFAULT_REGION set, since they will be needed later on to build and push the container images for the example app to ECR. Further, make sure you have Docker running.
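A quick pre-flight check can save a failed image push later. The sketch below only verifies that the two variables named above are exported; the function name check_env is our own and not part of the walkthrough scripts.

```shell
# Fail early if a variable needed for the ECR image push is unset or not exported.
check_env() {
  for name in AWS_ACCOUNT_ID AWS_DEFAULT_REGION; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  echo "environment looks complete"
}

check_env || echo "export the missing variables before continuing"
```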
First, create an EKS cluster that is App Mesh-enabled, using the eks-cluster-config.yaml config file as follows:
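If you do not have such a config file at hand, something along the following lines should be a reasonable starting point. Treat it as a sketch rather than the exact file used for this post: the cluster name, node group sizes, and the fallback region are assumptions, and the command overwrites any existing eks-cluster-config.yaml in the current directory.

```shell
# Illustrative eksctl config; cluster name, sizes and region fallback are assumptions.
cat > eks-cluster-config.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: appmesh-mtls
  region: ${AWS_DEFAULT_REGION:-us-west-2}

iam:
  withOIDC: true                    # needed for IAM roles for service accounts
  serviceAccounts:
    - metadata:
        name: appmesh-controller
        namespace: appmesh-system
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AWSAppMeshFullAccess

managedNodeGroups:
  - name: default-ng
    minSize: 1
    maxSize: 3
    desiredCapacity: 2
    iam:
      withAddonPolicies:
        appMesh: true               # lets the nodes talk to App Mesh
EOF
```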
Execute the following command to create the EKS cluster:
Next, install the App Mesh controller using the commands below.
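With eksctl, cluster creation from a config file is a single call. We wrap it in a small function (the name create_cluster is ours) so that a missing config file is caught before eksctl is invoked:

```shell
# Create the cluster from the config file; expect this to take several minutes.
create_cluster() {
  if [ ! -f eks-cluster-config.yaml ]; then
    echo "eks-cluster-config.yaml not found in $(pwd)" >&2
    return 1
  fi
  eksctl create cluster -f eks-cluster-config.yaml
}
```

Run create_cluster from the directory that holds the config file.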
First we get the CRDs in place:
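One way to do this, assuming the public eks-charts Helm repository and its bundled CRD manifests, is sketched below. The function name is ours, and the ref=master revision may need adjusting to the chart version you target.

```shell
# Register the EKS Helm repo, then apply the App Mesh CRDs via kustomize.
install_appmesh_crds() {
  helm repo add eks https://aws.github.io/eks-charts &&
  kubectl apply -k "github.com/aws/eks-charts/stable/appmesh-controller//crds?ref=master"
}
```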
Note that if you already have the Helm repo configured, you should run helm repo update before you apply the CRDs.
Verify the installation:
Now, install the Kubernetes controller for App Mesh itself:
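A Helm-based install could look like the sketch below. It assumes the eks Helm repo is configured, that the appmesh-controller service account already exists in the appmesh-system namespace (for example, created by eksctl), and that your chart version offers the sds.enabled value, which is what turns on SDS support in the injected Envoy proxies.

```shell
# Install or upgrade the App Mesh controller with SDS support switched on.
install_appmesh_controller() {
  helm upgrade -i appmesh-controller eks/appmesh-controller \
    --namespace appmesh-system \
    --set region="${AWS_DEFAULT_REGION}" \
    --set serviceAccount.create=false \
    --set serviceAccount.name=appmesh-controller \
    --set sds.enabled=true
}
```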
You can, optionally, verify the installed controller version (should be v1.3.x or above) with:
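One way to read the version is from the image tag of the controller deployment. This sketch assumes the deployment is named appmesh-controller, lives in the appmesh-system namespace, and that the controller is the first container in the pod template.

```shell
# Print the image tag of the running controller, e.g. v1.3.0.
controller_version() {
  controller_image=$(kubectl get deployment appmesh-controller \
    -n appmesh-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}') || return 1
  echo "${controller_image##*:}"    # keep only the tag after the last colon
}
```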
Next, install the SPIRE server (as a stateful set) and the SPIRE agents (as a daemon set, one per worker node) with the pre-configured trust domain howto-k8s-mtls-sds-based.aws:
Note that we also maintain Helm charts tailored for single cluster scenarios that you can use to set up your SPIRE installation.
Next, verify the SPIRE setup, that is, make sure that all pods are up and running:
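A sketch of such a check follows. It assumes SPIRE was installed into a namespace called spire, with a stateful set named spire-server and a daemon set named spire-agent; adjust the names if your manifests differ.

```shell
# Wait for the SPIRE server and agents to roll out, then list the pods.
check_spire() {
  kubectl -n spire rollout status statefulset/spire-server --timeout=120s &&
  kubectl -n spire rollout status daemonset/spire-agent --timeout=120s &&
  kubectl -n spire get pods
}
```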
Now we can register agents and workloads, using the helper script register_server_entries.sh:
Note that you can list the registered entities at any time using the following command:
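For example, assuming the SPIRE server runs as pod spire-server-0 in the spire namespace and the binary sits at the path used by the official SPIRE images, the entries can be listed like this:

```shell
# Show all registration entries known to the SPIRE server.
list_spire_entries() {
  kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry show
}
```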
Finally, we deploy an example app to test the connectivity, using the helper script deploy_app.sh:
Verify that all the pods are up and running as well as the custom resources our App Mesh controller looks after; it should look something like this:
And now we can check mTLS:
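One way to do so is to query the Envoy admin interface of a client pod for its TLS handshake counters. The sketch below assumes the example app runs in the namespace howto-k8s-mtls-sds-based, that the client pods carry the label app=front, that the sidecar container is named envoy, and that curl is available in the Envoy image.

```shell
# Print Envoy's TLS handshake counters for the first "front" pod.
check_mtls() {
  mtls_ns=howto-k8s-mtls-sds-based
  front_pod=$(kubectl get pod -n "$mtls_ns" -l app=front \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl exec "$front_pod" -n "$mtls_ns" -c envoy -- \
    curl -s http://localhost:9901/stats | grep -F ssl.handshake
}
```

A handshake counter that is greater than zero and grows as you send requests through the app suggests that TLS sessions are being established between the proxies.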
That’s it! You can also view the status in the App Mesh console where you should see something like this (annotated):
To clean up you can use one or more of the following commands:
Usage considerations
As you’ve seen from the walkthrough above, using the new mTLS feature of App Mesh in the context of EKS is straightforward. This is partly due to the controller we developed and partly due to SPIRE taking care of the heavy lifting around workload identity management.
SPIRE issues short-lived certificates, with a default of one hour, and automatically renews them in advance of expiry, also called auto-rotation. The certificates are pushed to the Envoy proxies by the SPIRE agents.
Some further usage considerations for mTLS in the context of App Mesh on EKS:
- You want to plan ahead and consider migrating existing (not encrypted) workloads.
- In the walkthrough above, we’ve shown a simple scenario with self-signed certificates; however, you can and likely want to use a Certificate Authority (CA), for example AWS Certificate Manager (ACM).
- When using SPIRE in the context of EKS on Fargate, note that you cannot use the solution shown above, as Kubernetes DaemonSets are not yet supported in this compute engine.
- For more (related) hands-on walkthroughs check out the App Mesh examples repo.
Let us know your experience with this new App Mesh security feature and share feedback and suggestions via our roadmap.