
Application Networking with Amazon VPC Lattice and Amazon EKS

Introduction

AWS customers building cloud-native applications or modernizing applications using microservices architecture can adopt Amazon Elastic Kubernetes Service (Amazon EKS) to accelerate innovation and time to market while lowering their total cost of ownership. Many customers operate multiple Amazon EKS clusters to provide better tenant isolation and to meet organizational requirements. Often, there’s a need for services running in these clusters to communicate with each other. Today, customers accomplish this task by using an assortment of approaches that involve provisioning additional network infrastructure components to their clusters, such as load balancers and service meshes. These approaches all require IP reachability between clusters, add complexity to the deployment architecture, and impose additional operational overhead on the platform and development teams.

Amazon VPC Lattice is a fully managed application networking service built directly into the AWS network infrastructure that you use to connect, secure, and monitor all of your services across multiple accounts and virtual private clouds (VPCs). With Amazon EKS, customers can leverage Amazon VPC Lattice through the use of the AWS Gateway API controller, an implementation of the Kubernetes Gateway API. Using VPC Lattice, EKS customers can set up cross-cluster connectivity with standard Kubernetes semantics in a simple and consistent manner.

In this post, we demonstrate a use case where a service in one EKS cluster communicates with a service in another cluster and VPC, using VPC Lattice. We show how service discovery works, with support for using custom domain names for services. We also demonstrate how VPC Lattice enables services in EKS clusters with overlapping CIDRs to communicate with each other without the need for any networking constructs like private NAT gateways and transit gateways. On the security front, we discuss how to secure access to EKS services by enforcing authentication, encryption, and context-specific authorization with VPC Lattice auth policies.

Solution architecture

The solution architecture used to demonstrate cross-cluster connectivity with VPC Lattice is shown in the following diagram. Here are the relevant aspects of this architecture:

  • Two VPCs are set up in the same AWS Region, both using the same RFC 1918 address range, 192.168.48.0/20.
  • An EKS cluster is provisioned in each VPC. Both clusters in our implementation use Kubernetes 1.24. The clusters may be created using any one of the approaches outlined under Creating an Amazon EKS cluster. Note that VPC Lattice doesn’t impose any specific requirements regarding the Kubernetes version of your EKS cluster. You may choose any one of the four versions (1.22 to 1.25) currently supported by EKS. When using the Amazon VPC CNI plugin in your cluster, you must use v1.8.0 or later.
  • An HTTP web service is deployed to the EKS cluster in VPC-A, exposing a set of REST API endpoints. Another REST API service is deployed to the EKS cluster in VPC-B and it communicates with an Aurora PostgreSQL database in the same VPC.
  • AWS Gateway API controller is used in both clusters to manage the Kubernetes Gateway API resources such as Gateway and HTTPRoute. These custom resources orchestrate AWS VPC Lattice resources such as Service Network, Service, and Target Groups that enable communication between the Kubernetes services deployed to the clusters. Please refer to this post for a detailed discussion on how the AWS Gateway API controller extends custom resources defined by Gateway API, allowing you to create VPC Lattice resources using Kubernetes APIs.

Figure 1. Service to service communication across Amazon EKS clusters using Amazon VPC Lattice

Walkthrough

Implementing cross-cluster communication

First, we deploy the Gateway API controller to both EKS clusters, following the steps in this document. Next, we apply a Kubernetes manifest to both clusters that defines the GatewayClass and Gateway resources; a sketch of such a manifest appears after the figure below. If a Lattice service network with the same name as the Gateway resource, eks-lattice-network, doesn’t already exist, the Gateway API controller creates one and attaches the VPCs to the service network. Multiple Gateways may be deployed to a cluster to enable more advanced deployments. However, the cluster VPC can be associated with only one Gateway at any time: the cluster VPC is associated with the Lattice service network referenced by this Gateway, and the other Gateways deployed to the cluster remain in a detached state. The list of VPCs associated with a Lattice service network can be seen under the VPC associations tab for the service network in the Amazon VPC console, as shown in the figure below.

Figure 2. Console view of a VPC Lattice service network and its VPC associations
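
For reference, a minimal sketch of what such a GatewayClass and Gateway manifest might contain is shown below. The GatewayClass controller name and the listener settings are assumptions based on the AWS Gateway API controller documentation and the resource names used in this post; check the manifest in the sample repository for the exact values.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: amazon-vpc-lattice
spec:
  # Delegates Gateways of this class to the AWS Gateway API controller (assumed controller name)
  controllerName: application-networking.k8s.aws/gateway-api-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  # The controller creates (or reuses) a VPC Lattice service network with this name
  # and associates the cluster VPC with it
  name: eks-lattice-network
spec:
  gatewayClassName: amazon-vpc-lattice
  listeners:
  - name: http
    protocol: HTTP
    port: 80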

Next, we deploy the datastore service to the EKS cluster in VPC-B. This service fronts an Aurora PostgreSQL database and exposes REST API endpoints with the path-prefixes /popular and /summary. To demonstrate canary shifting of traffic, we deploy two versions of the datastore service to the cluster using this Deployment manifest. To allow clients from other VPCs to reach this service via Amazon VPC Lattice, we have to register it with a Lattice service network. This is done by deploying an HTTPRoute resource to the cluster that references the eks-lattice-network Gateway and the Kubernetes ClusterIP Service resources that select the version-specific pods of the datastore service.

Implementing custom domain names

This setup demonstrates a scenario where incoming traffic with the path-prefix /popular is all directed to v1 of the datastore service, while traffic with the path-prefix /summary is split between v1 and v2 of the service, based on the weights defined in the HTTPRoute. The Gateway API controller creates a Lattice service that corresponds to the HTTPRoute resource, which can be verified in the Amazon VPC console as shown in the following figure. Note that we have configured this Lattice service to be exposed to clients using the custom domain name datastore.lattice-test.io, specified in the HTTPRoute resource. You can associate a custom domain name with an HTTPRoute resource only at the time of creating it; you cannot modify an existing HTTPRoute to include a custom domain.
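
Pulling these details together, a sketch of what such an HTTPRoute manifest might look like is shown below. The Service names, port, and traffic weights are illustrative assumptions; the actual values are in the manifest referenced above.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: datastore-canary
spec:
  hostnames:
  - datastore.lattice-test.io            # custom domain name for the Lattice service
  parentRefs:
  - name: eks-lattice-network            # registers the route with the Lattice service network
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /popular
    backendRefs:
    - name: datastore-v1                 # hypothetical ClusterIP Service selecting the v1 pods
      kind: Service
      port: 8080                         # hypothetical service port
  - matches:
    - path:
        type: PathPrefix
        value: /summary
    backendRefs:
    - name: datastore-v1
      kind: Service
      port: 8080
      weight: 80                         # illustrative canary split between v1 and v2
    - name: datastore-v2
      kind: Service
      port: 8080
      weight: 20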

The custom domain name is mapped to the Lattice-generated domain name by means of a DNS CNAME record in a Route 53 private hosted zone associated with VPC-B. Refer to the documentation in the VPC Lattice user guide for more details about how this mapping is done.
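
As an illustration, the Route 53 change batch for creating such a CNAME record could look like the following sketch; the record value is a placeholder for the DNS name that VPC Lattice generates for the service.

{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "datastore.lattice-test.io",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          { "Value": "<lattice-generated-dns-name-of-the-service>" }
        ]
      }
    }
  ]
}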

Lattice creates an HTTP listener and target groups to manage traffic routing to the datastore service. These artifacts can be verified under the Routing tab of the Lattice service in the VPC console, as shown in the following figure.

Figure 4. HTTP listener and target group settings for a VPC Lattice service

At this point, we have completed all the steps to deploy the datastore service to the EKS cluster in VPC-B and expose it using a custom domain name to clients in all other VPCs associated with the eks-lattice-network Lattice service network. Next, we have to set up a virtual firewall rule that allows the datastore service to receive traffic from the VPC Lattice fleet. This is done by creating a security group with an inbound rule that allows traffic on all ports from 169.254.171.0/24, which is the VPC Lattice link-local address range. This security group is then attached to the worker nodes of the EKS cluster in VPC-B. A recommended best practice is to add this rule to the cluster security group, which is created by EKS when you create a cluster. The CLI command to execute this task is found in step 3 of this document. The following is the sequence of CLI commands to deploy the Lattice and Kubernetes artifacts discussed thus far:

kubectl apply -f gateway-lattice.yaml          # GatewayClass and Gateway
kubectl apply -f route-datastore-canary.yaml   # HTTPRoute and ClusterIP Services
kubectl apply -f datastore.yaml                # Deployment

Now, we deploy the frontend service to the EKS cluster in VPC-A using this manifest. The service is configured to communicate with the datastore service in VPC-B using its custom domain name. To enable this, VPC-A must be associated with the Route 53 private hosted zone that contains the DNS CNAME record for the custom domain name datastore.lattice-test.io, as shown in the following figure. The cluster security group attached to the EKS worker nodes allows, by default, all outgoing traffic. Hence, no additional networking setup is needed on the client side in VPC-A. Notice that the frontend and datastore services are deployed to separate clusters that have an overlapping CIDR, 192.168.48.0/20. Yet, they communicate with each other via Lattice without the need for any network infrastructure components such as a NAT gateway or transit gateway.

Listed below is the sequence of CLI commands to deploy the Lattice and Kubernetes artifacts to the EKS cluster in VPC-A. Also included in this list is the command that uses kubectl port-forward to forward traffic from a local port to the server port 3000 on one of the pods of the frontend service, which allows us to test this use case end-to-end without needing any load balancer. The figure below shows the responses received by invoking the API endpoints on the frontend service, which in turn invokes the datastore service via Lattice. The results show the responses for the /summary endpoint shifting between the two versions of the datastore service.

kubectl apply -f gateway-lattice.yaml                          # GatewayClass and Gateway
kubectl apply -f frontend.yaml                                 # Deployment and ClusterIP Service
kubectl -n apps port-forward frontend-6545dc979c-hqgvj 80:3000 # Port Forwarding

Figure 6. JSON responses received by an EKS client from a VPC Lattice service in a different EKS cluster

Implementing encryption in transit

To ensure data protection in transit, customers can use SSL/TLS to communicate with Lattice services. You can create an HTTPS listener, in which case VPC Lattice terminates HTTPS connections directly using TLS version 1.2. You can also bring your own TLS certificate to associate with custom domain names. VPC Lattice supports TLS on HTTP/1.1 and HTTP/2.

To implement SSL/TLS for Lattice services using the Gateway API, you must first configure one or more HTTPS listeners in the Gateway resource, as shown in this manifest. In this example, the HTTPS listener named tls-with-default-domain is associated with the VPC Lattice generated Fully Qualified Domain Name (FQDN). VPC Lattice provisions and manages a TLS certificate for this FQDN and therefore this listener doesn’t require any additional configuration. The HTTPS listener named tls-with-custom-domain is associated with the custom domain name datastore.lattice-test.io used in this post. This listener is configured to secure communications with a private TLS certificate provided by the user and sourced from AWS Certificate Manager (ACM). To create an SSL/TLS certificate, please refer to the ACM user guide. Customers can bring their own TLS certificates and import them into ACM.
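
As an illustration, the listener configuration of such a Gateway might look like the following sketch. The listener names match those used in this post, but the exact way the user-provided ACM certificate is referenced (shown here as a controller-specific TLS option with a placeholder ARN) is an assumption; consult the referenced manifest and the AWS Gateway API controller documentation for the exact field.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: eks-lattice-network
spec:
  gatewayClassName: amazon-vpc-lattice
  listeners:
  - name: tls-with-default-domain        # uses the TLS certificate that VPC Lattice provisions for its generated FQDN
    protocol: HTTPS
    port: 443
  - name: tls-with-custom-domain         # secured with a user-provided ACM certificate for datastore.lattice-test.io
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        # Assumed option key and placeholder ARN; see the controller documentation
        application-networking.k8s.aws/certificate-arn: arn:aws:acm:<region>:<account-id>:certificate/<certificate-id>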

The configured HTTPS listener is then referenced from an HTTPRoute resource to secure the corresponding Lattice service with SSL/TLS. An example of exposing the datastore service with a VPC Lattice generated FQDN and securing it using the TLS certificate provisioned by VPC Lattice is shown in this manifest, which references the listener named tls-with-default-domain. An HTTPRoute that exposes the datastore service with the custom domain name datastore.lattice-test.io and secures it with the user-provided TLS certificate is shown in this manifest, which references the listener named tls-with-custom-domain. The following figure shows the console view of a service configured with the custom domain name and secured using a user-provided TLS certificate in ACM.
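
In Gateway API terms, an HTTPRoute binds to a specific listener through the sectionName field of its parentRef. A hedged sketch of the route that targets the listener backed by the user-provided certificate follows; the Service name and port are hypothetical.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: datastore-tls
spec:
  hostnames:
  - datastore.lattice-test.io
  parentRefs:
  - name: eks-lattice-network
    sectionName: tls-with-custom-domain  # bind this route to the HTTPS listener secured with the ACM certificate
  rules:
  - backendRefs:
    - name: datastore-v1                 # hypothetical ClusterIP Service
      kind: Service
      port: 8080                         # hypothetical service port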

Implementing access control

Amazon VPC Lattice is designed to be secure by default. If you want to enable access to an EKS service via Lattice, then you must first associate the cluster VPC with a Lattice service network. Then, you must create a Lattice service that can route traffic to one or more EKS services using the mappings defined in an HTTPRoute. Access to the Lattice service may be further restricted using network-level primitives such as network ACLs and security groups. In addition to these layers, you can apply an AWS Identity and Access Management (AWS IAM) resource-based policy to a Lattice service network, a service, or both to exercise more fine-grained control over who can invoke your services. Note that Kubernetes RBAC plays no role in implementing access control to an EKS service shared via Lattice; it is done entirely with IAM. Let’s discuss these implementation details next.

The first step is to enable IAM authentication for the Lattice service network and apply an IAM resource-based policy using the CLI commands shown below. This can be done from the VPC console as well. The policy file lattice-service-network-policy.json used in the command defines a very permissive policy that grants everyone access to the service network. This access control model requires every client that’s invoking a Lattice service in the service network to present IAM authentication credentials.

aws vpc-lattice update-service-network --service-network-identifier $SERVICE_NETWORK_ID --auth-type AWS_IAM
aws vpc-lattice put-auth-policy --resource-identifier $SERVICE_NETWORK_ID --policy file://lattice-service-network-policy.json
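
For context, a permissive auth policy of this kind might look like the following sketch, which allows any principal to invoke any service in the service network; the actual lattice-service-network-policy.json in the sample repository may differ.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*"
    }
  ]
}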

Next, we enable IAM authentication for the Lattice service and apply an IAM resource-based policy using the CLI commands shown below. The policy file lattice-service-policy.json used in the command defines a fine-grained policy that grants permission to a client to invoke the Lattice service only if it meets all of the following criteria (a sketch of such a policy appears after the CLI commands below):

  1. The client must present credentials of a specific IAM role named aws-sigv4-client.
  2. The client can only invoke a specific service (the resource ID in the policy is that of the Lattice service mapped to the datastore EKS service in VPC-B).
  3. The client must originate from a specific VPC (the source VPC ID in the policy pertains to VPC-A).

aws vpc-lattice update-service --service-identifier $SERVICE_ID --auth-type AWS_IAM
aws vpc-lattice put-auth-policy --resource-identifier $SERVICE_ID --policy file://lattice-service-policy.json
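
A policy along these lines might look like the following sketch. The account, Region, service, and VPC identifiers are placeholders, and the resource ARN format and condition key should be verified against the VPC Lattice documentation and the lattice-service-policy.json file in the sample repository.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/aws-sigv4-client"
      },
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "arn:aws:vpc-lattice:<region>:<account-id>:service/<lattice-service-id>/*",
      "Condition": {
        "StringEquals": {
          "vpc-lattice-svcs:SourceVpc": "<vpc-a-id>"
        }
      }
    }
  ]
}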

That completes the setup on the service side. On the client side, a caller must satisfy two requirements. First, the caller must present valid credentials for an IAM user or role. VPC Lattice uses AWS Signature Version 4 (SigV4) for client authentication. Therefore, the caller must sign the requests with SigV4 using IAM credentials. Second, the IAM user or role used by the caller must be authorized to invoke the Lattice service. The authorization requirement can be met by attaching an appropriate IAM identity-based policy to the IAM user or role.

To meet the authentication requirement, we need a mechanism to sign every outgoing request from the client using SigV4. The signing of requests can be implemented directly in your application code using the AWS SDK. Please refer to the documentation for examples of signature calculation in AWS Signature Version 4. In Kubernetes, it is very common to implement such cross-cutting concerns using the sidecar pattern, without changing application code. VPC Lattice welcomes your feedback as we continue to look for ways to improve this process.

In this post, we are using the AWS SigV4 Proxy to calculate the signature that provides authentication information in the client request. To use this proxy from a client such as the frontend service deployed to the EKS cluster in VPC-A, we use the sidecar pattern. By configuring the proxy to use IAM roles for service accounts (IRSA), we can make the proxy use the credentials of an IAM role to sign the requests. By attaching an identity-based policy defined in the file lattice-client-policy.json to the IAM role, we grant the role permission to call the Lattice service mapped to the datastore EKS service in VPC-B. Lastly, we must use the same IAM role, aws-sigv4-client, that was used in the resource-based policy applied to the Lattice service. The helper script createIRSA.sh creates all the IAM artifacts needed to configure IRSA. Note that the JSON web token (JWT) generated by the OIDC provider of the EKS cluster, which contains the service account identity, isn’t used anywhere in this authentication scheme. We are merely using IRSA as a convenient mechanism to expose IAM role credentials to the SigV4 proxy.
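
For reference, the pieces that createIRSA.sh wires together might look roughly like the following sketches. The service account name and namespace are hypothetical, and the role and resource ARNs are placeholders. First, the Kubernetes service account under which the frontend pods run is annotated with the IAM role:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: frontend-sa                      # hypothetical service account used by the frontend pods
  namespace: apps
  annotations:
    # IRSA: pods using this service account receive credentials for the aws-sigv4-client role,
    # which the SigV4 proxy sidecar uses to sign outgoing requests
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/aws-sigv4-client

Second, an identity-based policy along the lines of lattice-client-policy.json is attached to that role, granting it permission to invoke the Lattice service:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "arn:aws:vpc-lattice:<region>:<account-id>:service/<lattice-service-id>/*"
    }
  ]
}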

Figure 8. Enabling access control for VPC Lattice service using IAM resource-based policies

To automate the injection of a SigV4 proxy container into every client pod that needs to invoke a Lattice service, we make use of the AWS SigV4 Proxy Admission Controller, as shown in the figure above. This admission controller is deployed to an EKS cluster using the Helm command shown below. Additionally, before deploying a Lattice client such as the frontend service to the EKS cluster, we must add a set of annotations that identify its pods as targets for sidecar injection, specify the FQDN of the Lattice service, and indicate whether to use HTTP or HTTPS to communicate with it, as seen in this manifest.

helm install aws-sigv4-proxy-admission-controller eks/aws-sigv4-proxy-admission-controller \
--namespace kube-system \
--set image.repository=public.ecr.aws/aws-observability/aws-sigv4-proxy-admission-controller \
--set image.tag=1.2 \
--set image.pullPolicy=Always \
--set env.awsSigV4ProxyImage=public.ecr.aws/aws-observability/aws-sigv4-proxy:1.7

In summary, here are the steps to implement access control for a Lattice service.

On the service side:

  • Enable IAM authentication at the service network level and apply a coarse-grained IAM resource-based policy. This step is optional, but highly recommended.
  • Enable IAM authentication at the service level and apply a fine-grained IAM resource-based policy.

On the client side:

  • Set up an IAM role for a Kubernetes service account and, using an IAM identity-based policy, grant the role the permissions it needs to invoke Lattice services. Ensure that this is the same IAM role used in the resource-based policy applied to the Lattice service.
  • Deploy the AWS SigV4 Proxy Admission Controller to the EKS cluster.
  • Deploy the Lattice client to the cluster, configured to run under the identity of the service account created during IRSA setup. Ensure that the deployment manifest of the client contains the annotations required for injecting the AWS SigV4 proxy as a sidecar.

Note that securing a VPC Lattice service with SSL/TLS and enabling access control to the service using IAM are completely separate security mechanisms. Customers have the option to use one without the other or enable both for their services. It’s important for customers to understand the AWS shared responsibility model, assess the level of access control and data protection needed for their services, and leverage these security mechanisms accordingly.

Conclusion

In this post, we showed you how to set up service-to-service communication across multiple EKS clusters using VPC Lattice. We saw how VPC Lattice enables services in EKS clusters with overlapping CIDRs to communicate with each other without the need for any networking constructs like private NAT gateways and transit gateways. We also demonstrated how to expose VPC Lattice services using custom domain names and secure them using customer-provided TLS certificates. The post covered the details of securing access to your EKS services using IAM through the use of VPC Lattice authorization policies and enabling other Amazon EKS client services to authenticate their requests to such secured services.

In this post, we focused primarily on the high-level solution architecture and the workflow. We highly recommend that you study the manifests shared in the sample Git repository to learn more about the Kubernetes Gateway API and HTTPRoute resources. The AWS Gateway API controller and its documentation are open source, licensed under the Apache License 2.0. Give it a try, and let us know about features that interest you. We welcome contributions for new features, additional documentation, or bug fixes.

Viji Sarathy

Viji Sarathy is a Principal Specialist Solutions Architect at AWS. He provides expert guidance to customers on modernizing their applications using AWS services that leverage serverless and container technologies. He has been at AWS for about 3 years. He has 20+ years of experience in building large-scale, distributed software systems. His professional journey began as a research engineer in high performance computing, specializing in the area of Computational Fluid Dynamics. From CFD to cloud computing, his career has spanned several business verticals, all along with an emphasis on the design and development of applications using scalable architectures. He holds a Ph.D. in Aerospace Engineering from The University of Texas at Austin. He is an avid runner, hiker, and cyclist.

Sheetal Joshi

Sheetal Joshi is a Principal Developer Advocate on the Amazon EKS team. Sheetal worked for several software vendors before joining AWS, including HP, McAfee, Cisco, Riverbed, and Moogsoft. For about 20 years, she has specialized in building enterprise-scale, distributed software systems, virtualization technologies, and cloud architectures. At the moment, she is working on making it easier to get started with, adopt, and run Kubernetes clusters in the cloud, on-premises, and at the edge.