Enabling mTLS in AWS App Mesh using SPIFFE/SPIRE in a multi-account Amazon EKS environment
NOTICE: October 04, 2024 – This post no longer reflects the best guidance for configuring a service mesh with Amazon ECS and Amazon EKS, and its examples no longer work as shown. For workloads running on Amazon ECS, please refer to newer content on Amazon ECS Service Connect, and for workloads running on Amazon EKS, please refer to Amazon VPC Lattice.
---
Over the past few years, companies and organizations have been adopting microservice-based architectures to drive their businesses forward with a rapid pace of innovation. Moving to microservices brings several benefits in terms of modularity and deployment speed, but it also adds additional complexity that requires establishing higher security postures. For distributed applications spanning multiple, potentially untrusted networks, it is necessary to implement a zero-trust security policy that considers any source as potentially malicious.
Service mesh solutions like AWS App Mesh can help you manage microservice-based environments by facilitating application-level traffic management, providing consistent observability tooling, and enabling enhanced security configurations. Within the context of the shared responsibility model, you can use AWS App Mesh to fine-tune your security posture based on your specific needs. For example, you may have security and compliance baselines that require you to encrypt all inter-service communications. In this case, AWS App Mesh can help by encrypting all requests between services using Transport Layer Security (TLS) and Mutual TLS authentication (mTLS). Mutual TLS adds an additional layer of security over standard TLS, using asymmetric encryption to verify the identity of both the server and the client. It also ensures that data hasn’t been viewed or modified in transit.
AWS App Mesh uses a popular open-source service proxy called Envoy to provide fully managed, highly available service-to-service communication. Envoy’s Secret Discovery Service (SDS) allows you to bring your own sidecars that can send certificates to Envoy proxies for mTLS authentication. SPIFFE, the Secure Production Identity Framework for Everyone, is a set of open-source standards that software systems can adopt to mutually authenticate in complex environments. SPIRE, the SPIFFE runtime environment, is an open-source toolchain that implements the SPIFFE specification. SPIRE agents use Envoy’s SDS to provide Envoy proxies with the necessary key material for mTLS authentication. The following diagram provides a high-level overview of how mTLS authentication takes place using SPIRE:
- SPIRE agent nodes and workloads running on these agent nodes are registered to a SPIRE server using the registration API.
- The SPIRE agent has native support for the Envoy Secret Discovery Service (SDS). SDS is served over the same Unix domain socket as the workload API, and Envoy processes connecting to SDS are attested as workloads.
- Envoy uses SDS to retrieve and maintain updated “secrets” from SDS providers. In the context of TLS authentication, these secrets are the TLS certificates, private keys, and trusted CA certificates.
- The SPIRE agent can be configured as an SDS provider for Envoy, allowing it to supply Envoy directly with the key material it needs for TLS authentication.
- The SPIRE agent will also take care of regenerating the short-lived keys and certificates as required.
- When Envoy connects to the SDS server exposed by the SPIRE agent, the agent attests Envoy and determines which service identities and CA certificates it should make available to Envoy over SDS.
- As service identities and CA certificates rotate, updates are streamed back to Envoy. Envoy can immediately apply them to new connections without interruption, downtime, or ever having private keys touch the disk.
Overview
Previous blog posts have demonstrated how to use mTLS in a single Amazon Elastic Kubernetes Service (Amazon EKS) cluster and how to leverage AWS App Mesh in a multi-account environment. The purpose of this blog post, however, is to combine these two ideas and demonstrate how to secure communications between microservices running across different Amazon EKS clusters in different AWS accounts. We’ll be using AWS App Mesh, AWS Transit Gateway, and SPIRE integration for mTLS authentication. The following diagram illustrates the multi-account environment that we will build in the following tutorial:
- We will use three Amazon EKS clusters, each in its own account and VPC. The network connectivity between these three clusters will be made through AWS Transit Gateway.
- A single SPIRE server will be installed into the EKS cluster named eks-cluster-shared, and SPIRE agents will be installed into the other two EKS clusters, named eks-cluster-frontend and eks-cluster-backend.
- We will use AWS Resource Access Manager (AWS RAM) to share the mesh across all three accounts so it is visible to all EKS clusters.
- We will create AWS App Mesh components and deploy them using a sample application called Yelb that allows users to vote for their favorite restaurant.
- Yelb components include:
  - The yelb-ui component, which is responsible for serving web artifacts to the browser, will be deployed in eks-cluster-frontend, residing in our frontend account.
  - The yelb-appserver, redis, and postgres database will be deployed in eks-cluster-backend, residing in our backend account.
Walkthrough
Prerequisites:
This tutorial assumes that you are using a bash shell. Accordingly, you will need to ensure that the following tools are installed:
- AWS CLI
- eksctl utility used for creating and managing Kubernetes clusters on Amazon EKS
- kubectl utility used for communicating with the Kubernetes cluster API server
- jq JSON processor
- Helm CLI used for installing Helm Charts
Configure the AWS CLI:
Three named profiles are used with the AWS CLI throughout this tutorial to target command executions at different accounts. After you have identified the three accounts you will use, ensure that you configure the following named profiles with AdministratorAccess:
- shared profile – the main account that will host the SPIRE server
- frontend profile – the account that will host the frontend resources
- backend profile – the account that will host the backend resources
Since these profiles are referenced in various commands and helper scripts throughout this tutorial, ensure they are named exactly as specified, otherwise certain commands will fail.
Your AWS CLI credentials and configurations should look like the following example snippet:
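A minimal example of what this might look like, assuming static credentials (all values are placeholders and the region is illustrative):

```ini
# ~/.aws/credentials
[shared]
aws_access_key_id = <shared-account-access-key-id>
aws_secret_access_key = <shared-account-secret-access-key>

[frontend]
aws_access_key_id = <frontend-account-access-key-id>
aws_secret_access_key = <frontend-account-secret-access-key>

[backend]
aws_access_key_id = <backend-account-access-key-id>
aws_secret_access_key = <backend-account-secret-access-key>

# ~/.aws/config
[profile shared]
region = us-east-1
output = json

[profile frontend]
region = us-east-1
output = json

[profile backend]
region = us-east-1
output = json
```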
Alternatively, you can configure the AWS CLI to use IAM roles as well.
Clone the GitHub repository:
Deploying the AWS CloudFormation stacks:
Start by deploying the shared services AWS CloudFormation stack in the main account using the shared profile:
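The stack and template names below are placeholders; use the template provided in the cloned repository:

```bash
# Deploy the shared services stack in the main account (names are illustrative).
aws cloudformation deploy \
  --profile shared \
  --stack-name eks-cluster-shared-stack \
  --template-file shared-services.yaml \
  --capabilities CAPABILITY_NAMED_IAM
```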
This CloudFormation stack will create the following resources:
- a new EKS cluster named eks-cluster-shared
- a managed node group in a new VPC
- a Transit Gateway named tgw-shared
- a Transit Gateway attachment associated with the managed node group VPC
- an AWS RAM resource share for the transit gateway named multi-account-tgw-share
- a node instance role that has permission to assume the cross-account roles eks-cluster-frontend-access-role and eks-cluster-backend-access-role
Next, accept the RAM resource share for the frontend and backend accounts:
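One way to do this with the AWS CLI is shown below; the sketch assumes a single pending invitation in each account:

```bash
# Accept the transit gateway resource share in the frontend account,
# then repeat the same two commands with --profile backend.
INVITE_ARN=$(aws ram get-resource-share-invitations --profile frontend \
  --query "resourceShareInvitations[?status=='PENDING'].resourceShareInvitationArn" \
  --output text)
aws ram accept-resource-share-invitation --profile frontend \
  --resource-share-invitation-arn "$INVITE_ARN"
```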
Note: This step is not necessary if you are using accounts that belong to the same AWS Organization and you have resource sharing enabled. Principals in your organization get access to shared resources without exchanging invitations.
Next, deploy the frontend CloudFormation stack in the account you have designated to host your frontend resources using the frontend profile:
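As before, a hypothetical invocation (substitute the repository’s template and your preferred stack name):

```bash
aws cloudformation deploy \
  --profile frontend \
  --stack-name eks-cluster-frontend-stack \
  --template-file frontend.yaml \
  --capabilities CAPABILITY_NAMED_IAM
```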
This CloudFormation stack will create the following resources:
- a new EKS cluster named eks-cluster-frontend
- a managed node group in a new VPC
- a Transit Gateway attachment associated with the managed node group VPC
- a role named eks-cluster-frontend-access-role with a trust policy that allows it to be assumed by the node instance role from the main account
Finally, deploy the backend CloudFormation stack in the account you have designated to host your backend resources using the backend profile:
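Again, a hypothetical invocation:

```bash
aws cloudformation deploy \
  --profile backend \
  --stack-name eks-cluster-backend-stack \
  --template-file backend.yaml \
  --capabilities CAPABILITY_NAMED_IAM
```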
This CloudFormation stack will create the following resources:
- a new EKS cluster named eks-cluster-backend
- a managed node group in a new VPC
- a Transit Gateway attachment associated with the managed node group VPC
- a role named eks-cluster-backend-access-role with a trust policy that allows it to be assumed by the node instance role from the main account
Update the kubectl contexts:
Now that the three EKS clusters are created, you will need to update your local ~/.kube/config
file to allow kubectl to communicate with the different API servers. For this, eksctl provides a utility command that allows you to obtain cluster credentials:
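For example, assuming each profile’s default region matches the region the stacks were deployed into:

```bash
eksctl utils write-kubeconfig --cluster eks-cluster-shared   --profile shared
eksctl utils write-kubeconfig --cluster eks-cluster-frontend --profile frontend
eksctl utils write-kubeconfig --cluster eks-cluster-backend  --profile backend
```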
By default, this command writes cluster credentials to your local ~/.kube/config
file.
For convenience, make a series of aliases to reference the different cluster contexts:
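The alias names below are placeholders to illustrate the pattern; the repository’s helper scripts expect specific names, so use those instead:

```bash
# Capture the kubectl context name for each cluster.
export SHARED_CTX=$(kubectl config get-contexts -o name | grep eks-cluster-shared)
export FRONT_CTX=$(kubectl config get-contexts -o name | grep eks-cluster-frontend)
export BACK_CTX=$(kubectl config get-contexts -o name | grep eks-cluster-backend)

# Placeholder alias names -- substitute the names the helper scripts reference.
alias kubectl-shared="kubectl --context $SHARED_CTX"
alias kubectl-frontend="kubectl --context $FRONT_CTX"
alias kubectl-backend="kubectl --context $BACK_CTX"
```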
As with the AWS CLI named profiles, these aliases are also referenced in various commands and helper scripts throughout this tutorial. Ensure that they are named exactly as specified, otherwise certain commands will fail.
Modify the aws-auth ConfigMaps
The aws-auth ConfigMap allows your nodes to join your cluster and is also used to add RBAC access to IAM users and roles. For this tutorial, the SPIRE server hosted in the main EKS cluster eks-cluster-shared
requires authorization to get an authentication token for the frontend eks-cluster-frontend
and backend eks-cluster-backend
EKS clusters in order to verify the identities of the hosted SPIRE agents during node attestation. To accomplish this, the SPIRE server will assume cross-account IAM roles, and these roles should be added to the aws-auth ConfigMap of the frontend and backend EKS clusters.
Execute the following commands to edit the frontend aws-auth ConfigMap:
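The exact commands are provided in the repository; an equivalent approach uses eksctl’s iamidentitymapping command. The username and group shown here are illustrative, and the same pattern applies to the backend cluster in the next step:

```bash
FRONTEND_ACCOUNT_ID=$(aws sts get-caller-identity --profile frontend --query Account --output text)

# Map the cross-account role into the frontend cluster's aws-auth ConfigMap.
eksctl create iamidentitymapping \
  --profile frontend \
  --cluster eks-cluster-frontend \
  --arn arn:aws:iam::${FRONTEND_ACCOUNT_ID}:role/eks-cluster-frontend-access-role \
  --username spire-server \
  --group system:masters
```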
Execute the following commands to edit the backend aws-auth ConfigMap:
You can verify the updates by executing the following command:
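For example, using the context variables set earlier:

```bash
kubectl --context "$FRONT_CTX" -n kube-system describe configmap aws-auth
kubectl --context "$BACK_CTX"  -n kube-system describe configmap aws-auth
```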
Create the App Mesh service mesh and Cloud Map namespace:
Run the following helper script to:
- install the appmesh-controller in each EKS cluster
- create an AWS App Mesh service mesh (am-multi-account-mesh) in the main account
- share the service mesh with the frontend and backend accounts
- create an AWS Cloud Map namespace (am-multi-account.local) in the backend account
This helper script also creates a yelb namespace in each EKS cluster and applies the following labels to it:
- mesh=am-multi-account-mesh
- appmesh.k8s.aws/sidecarInjectorWebhook=enabled
These labels allow the App Mesh sidecar proxy (Envoy) to be injected automatically into pods created in the yelb namespace.
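For reference, the labeling the script performs is roughly equivalent to the following (shown for the frontend cluster):

```bash
kubectl --context "$FRONT_CTX" create namespace yelb
kubectl --context "$FRONT_CTX" label namespace yelb \
  mesh=am-multi-account-mesh \
  appmesh.k8s.aws/sidecarInjectorWebhook=enabled
```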
The AWS Cloud Map namespace in the backend account is used for service discovery between the yelb-ui
virtual node that will be created in the frontend account, and the yelb-appserver
virtual service that will be created in the backend account.
Deploy the SPIRE server:
Earlier, we modified the aws-auth ConfigMap to allow the SPIRE server in the main EKS cluster (eks-cluster-shared) to verify the identities of the SPIRE agents during node attestation. We need to create copies of the kubeconfig files for the frontend (eks-cluster-frontend) and backend (eks-cluster-backend) EKS clusters and make them available to the SPIRE server through ConfigMaps mounted as volumes. Executing the following helper script will expedite this process:
This script creates a spire namespace within the main EKS cluster, along with two new ConfigMaps in that namespace (front-kubeconfig and back-kubeconfig) that store copies of the kubeconfig data for the frontend and backend EKS clusters, respectively. Since the SPIRE server will be conducting cross-account cluster authentication, the kubeconfig data also specifies the ARN of the corresponding cross-account IAM role (eks-cluster-frontend-access-role and eks-cluster-backend-access-role).
Next, install the SPIRE server using a helm chart:
This creates a StatefulSet in the spire namespace of the main EKS cluster and mounts the previously created ConfigMaps (front-kubeconfig and back-kubeconfig) as volumes. Note that the trust domain is set to the AWS App Mesh service mesh that was created in the main account and shared earlier (am-multi-account-mesh). The SPIRE server container image has been rebuilt to include the AWS CLI so that it can execute the eks get-token command for authentication with the frontend and backend EKS clusters. For more information, view the Dockerfile and visit the Amazon ECR Public Gallery listing.
Inspect the resources in the spire namespace to verify that the SPIRE server is up and running:
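For example:

```bash
kubectl --context "$SHARED_CTX" -n spire get statefulset,pods,services
```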
Verify that the trust domain has been set properly:
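One quick way to check is to grep the server configuration stored in the spire-server ConfigMap:

```bash
kubectl --context "$SHARED_CTX" -n spire get configmap spire-server -o yaml | grep trust_domain
```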
Verify that the kubeconfig ConfigMap volumes have been mounted properly:
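Describing the server pod and reviewing its Volumes and Mounts sections should show the front-kubeconfig and back-kubeconfig ConfigMaps:

```bash
kubectl --context "$SHARED_CTX" -n spire describe pod spire-server-0
```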
By inspecting the spire-server ConfigMap, you’ll see that the SPIRE server is configured to use the k8s_psat plugin for node attestation. The agent reads and provides the signed projected service account token (PSAT) to the server:
Before moving on to installing the SPIRE agents, make a copy of the spire-bundle ConfigMap, which contains the certificates necessary for the agents to verify the identity of the server when establishing a connection.
Deploy the SPIRE agents:
Create the spire namespace in the frontend EKS cluster and then create a copy of the spire-bundle ConfigMap:
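One way to do this, using jq (a prerequisite of this tutorial) to strip the cluster-specific metadata before re-applying the ConfigMap in the frontend cluster:

```bash
kubectl --context "$FRONT_CTX" create namespace spire

kubectl --context "$SHARED_CTX" -n spire get configmap spire-bundle -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.annotations)' \
  | kubectl --context "$FRONT_CTX" -n spire apply -f -
```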
Next, install the spire agent using the provided Helm chart:
This creates a DaemonSet in the spire namespace of the frontend EKS cluster. It mounts the spire-bundle ConfigMap as a volume to be used for establishing a connection with the SPIRE server. The SPIRE agent also uses the k8s_psat plugin for node attestation. Note that the cluster name (frontend-k8s-cluster) is arbitrary. However, it must match the cluster name specified in the SPIRE server configuration for the k8s_psat plugin, as this same cluster name will be referenced during workload registration. The SPIRE server address is pulled from the pod (spire-server-0) running in the main EKS cluster.
To verify that the SPIRE agent is up and running, inspect the resources in the spire namespace:
Repeat the same process for the backend EKS cluster:
Register nodes and workloads with the SPIRE server:
At this point, you are ready to register node and workload entries with the SPIRE server:
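The registration is handled by a helper script in the repository; the sketch below shows the shape of the underlying spire-server entry create calls. The SPIFFE ID paths, service account, and label values are illustrative, not the exact entries the script creates:

```bash
# Node entry for the frontend SPIRE agent, matched via the k8s_psat selectors.
kubectl --context "$SHARED_CTX" -n spire exec spire-server-0 -- \
  /opt/spire/bin/spire-server entry create -node \
  -spiffeID spiffe://am-multi-account-mesh/frontend-agent \
  -selector k8s_psat:cluster:frontend-k8s-cluster \
  -selector k8s_psat:agent_ns:spire \
  -selector k8s_psat:agent_sa:spire-agent

# Workload entry for yelb-ui, parented to the frontend agent's SPIFFE ID.
kubectl --context "$SHARED_CTX" -n spire exec spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://am-multi-account-mesh/yelb-ui \
  -parentID spiffe://am-multi-account-mesh/frontend-agent \
  -selector k8s:ns:yelb \
  -selector k8s:sa:default \
  -selector k8s:pod-label:app:yelb-ui \
  -selector k8s:container-name:envoy
```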
Inspect the registered entries by executing the following command:
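Assuming the standard SPIRE server image layout, this is:

```bash
kubectl --context "$SHARED_CTX" -n spire exec spire-server-0 -- \
  /opt/spire/bin/spire-server entry show
```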
You’ll notice that there are two entries associated with the SPIRE agent DaemonSets running in the frontend and backend EKS clusters:
The other entries for the frontend and backend workloads (the yelb-ui, yelb-appserver, yelb-db, and redis-server) reference the SPIFFE ID of the corresponding SPIRE agent as the value of their parent ID. The SPIRE server shares the list of registered entries with the SPIRE agents, which then use it to determine which SPIFFE Verifiable Identity Document (SVID) to issue to a particular workload, provided there is a match on the specified namespace, service account, pod label, and container name combination.
Note: An SVID is not a new type of public key certificate; rather, it defines a standard way in which X.509 certificates are used. For more information, review the SVID specification.
Deploy the mesh resources and the Yelb application:
You can now deploy the AWS App Mesh virtual nodes and virtual services into the backend EKS cluster:
Using the yelb-appserver virtual node as an example, notice that it has a tls section defined for both its inbound listeners and its outbound backends:
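The full spec files are provided in the repository; the excerpt below is an illustrative reconstruction of the relevant fields of the appmesh.k8s.aws/v1beta2 VirtualNode resource (the SPIFFE ID paths and SAN values are placeholders, not necessarily the exact ones the repository uses):

```yaml
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  listeners:
    - portMapping:
        port: 4567
        protocol: http
      tls:
        mode: STRICT
        certificate:
          sds:
            secretName: spiffe://am-multi-account-mesh/yelbappserver   # SVID served by the SPIRE agent
        validation:
          trust:
            sds:
              secretName: spiffe://am-multi-account-mesh               # trust bundle of the trust domain
          subjectAlternativeNames:
            match:
              exact:
                - spiffe://am-multi-account-mesh/yelbui                # SPIFFE IDs of trusted clients
  backendDefaults:
    clientPolicy:
      tls:
        enforce: true
        certificate:
          sds:
            secretName: spiffe://am-multi-account-mesh/yelbappserver
        validation:
          trust:
            sds:
              secretName: spiffe://am-multi-account-mesh
          subjectAlternativeNames:
            match:
              exact:
                - spiffe://am-multi-account-mesh/yelbdb                # SPIFFE IDs of trusted backends
                - spiffe://am-multi-account-mesh/redis
```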
The certificate section under tls specifies the Envoy Secret Discovery Service (SDS) secret name. In this case, it is the SPIFFE ID that was assigned to the workload. The validation section includes the SPIFFE ID of the trust domain, which is the AWS App Mesh service mesh created earlier (am-multi-account-mesh), and a list of SPIFFE IDs associated with trusted services that are used as subject alternative name (SAN) matchers for verifying presented certificates.
Now, deploy the backend Kubernetes resources that the virtual nodes point to:
Before deploying the frontend virtual node and virtual service for the yelb-ui service, run the following helper script. It will retrieve the ARN of the yelb-appserver virtual service from the backend EKS cluster and create an updated version of the yelb-ui virtual node spec file (yelb-ui-final.yaml) containing that ARN as a reference.
Deploy the AWS App Mesh components and the Kubernetes resources for the yelb-ui frontend:
To test out the yelb-ui service, retrieve the load balancer DNS name and navigate to it in your browser:
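Assuming the yelb-ui Kubernetes Service is of type LoadBalancer (as in the standard Yelb manifests), you can retrieve the DNS name with:

```bash
kubectl --context "$FRONT_CTX" -n yelb get service yelb-ui \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```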
You should see the following page load and be able to vote on the different restaurant options, verifying that the yelb-ui service is communicating with the yelb-appserver:
Verify the mTLS authentication:
After executing a few voting transactions via the yelb-ui, you can move on to validating the mTLS authentication that takes place between each of the Envoy proxies for the underlying services. For this, we will query the administrative interface that Envoy exposes.
- Start by switching to the frontend context and setting an environment variable to hold the name of the yelb-ui pod:
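A sketch of this step; the label selector assumes the standard Yelb manifest labels:

```bash
kubectl config use-context "$FRONT_CTX"
YELB_UI_POD=$(kubectl -n yelb get pod -l app=yelb-ui \
  -o jsonpath='{.items[0].metadata.name}')
echo "$YELB_UI_POD"
```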
- Check that the Secret Discovery Service (SDS) is active and healthy:
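The check below queries the Envoy admin interface (port 9901) inside the sidecar and filters for the SDS cluster statistics; the exact grep pattern may need adjusting for your Envoy version:

```bash
kubectl -n yelb exec "$YELB_UI_POD" -c envoy -- \
  curl -s http://localhost:9901/clusters | grep -E 'sds.*(cx_active|healthy)'
```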
You should see one active connection, indicating that the SPIRE agent is correctly configured as an SDS provider for the Envoy proxy, along with a healthy status.
- Next, check the loaded TLS certificate:
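Envoy exposes the certificates it currently holds in memory through its /certs admin endpoint:

```bash
kubectl -n yelb exec "$YELB_UI_POD" -c envoy -- \
  curl -s http://localhost:9901/certs
```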
This certificate is the X509-SVID issued to the yelb-ui service. You should see two SPIFFE IDs listed: that of the trust domain in the ca_cert section, and that of the yelb-ui service in the cert_chain section.
Note: App Mesh doesn’t store the certificates or private keys that are used for mutual TLS authentication. Instead, Envoy stores them in memory.
- Check the SSL handshakes:
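The handshake counters are part of Envoy’s listener and cluster statistics:

```bash
kubectl -n yelb exec "$YELB_UI_POD" -c envoy -- \
  curl -s http://localhost:9901/stats | grep ssl.handshake
```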
An SSL handshake is executed between the yelb-ui and the yelb-appserver (via the Envoy proxies) for every API request triggered. For example, when the Yelb webpage is loaded, two GET requests (/getvotes and /getstats) trigger two corresponding SSL handshakes.
You can repeat the same process using the backend context to examine mTLS authentication for the other services. For example, you can check the SSL handshakes for the yelb-appserver:
In this case, you’ll notice that additional SSL handshakes are executed with the yelb-db and yelb-redis services.
Cleaning up:
Run the following helper script to delete:
- all resources in the yelb and spire namespaces
- the Cloud Map am-multi-account.local namespace
- the App Mesh am-multi-account-mesh service mesh
- the appmesh-controller
- the appmesh-system namespace
Delete the CloudFormation stacks, starting with the frontend account:
Delete the backend CloudFormation stack:
Finally, delete the shared services CloudFormation stack:
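A hypothetical sequence, using the same placeholder stack names as in the deployment steps:

```bash
aws cloudformation delete-stack --profile frontend --stack-name eks-cluster-frontend-stack
aws cloudformation wait stack-delete-complete --profile frontend --stack-name eks-cluster-frontend-stack

aws cloudformation delete-stack --profile backend --stack-name eks-cluster-backend-stack
aws cloudformation wait stack-delete-complete --profile backend --stack-name eks-cluster-backend-stack

aws cloudformation delete-stack --profile shared --stack-name eks-cluster-shared-stack
```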
Conclusion
In this post, we created three Amazon EKS clusters, each in its own AWS account and VPC. We then established network connectivity between the VPCs using AWS Transit Gateway. We installed a SPIRE server in one EKS cluster, a SPIRE agent and a frontend service in a second cluster, and another SPIRE agent and backend resources in a third cluster. We used AWS App Mesh to create a service mesh spanning these three EKS clusters to facilitate service-to-service communication. We then established mutual TLS authentication between the services using Envoy’s Secret Discovery Service (SDS) as implemented by the SPIFFE Runtime Environment (SPIRE).
With this approach, customers that rely on multi-account, multi-cluster strategies can take advantage of the integration between AWS App Mesh and SPIRE to enable mTLS authentication across their segmented environments, moving them a step forward on the path to a zero-trust architecture.
To learn more, we recommend you review these additional resources:
- AWS blog: Leveraging App Mesh with Amazon EKS in a Multi-Account environment
- AWS blog: How to think about Zero Trust architectures on AWS
- AWS docs: Mutual TLS Authentication
- AWS App Mesh examples: How to enable mTLS between two applications in App Mesh using Envoy’s Secret Discovery Service (SDS)
- SPIFFE docs: Spire Concepts
- SPIFFE docs: Spire Quickstart for Kubernetes
- SPIFFE docs: Using Envoy with X.509-SVIDs
- SPIFFE docs: Using Envoy with SPIRE