Building resilient private APIs using Amazon API Gateway

This post written by Giedrius Praspaliauskas, Senior Solutions Architect, Serverless.

Modern architectures meet recovery objectives (recovery time objective, RTO, and recovery point objective, RPO) by being resilient to routine and unexpected infrastructure disruptions. Depending on the recovery objectives and regulatory requirements, developers must choose the disaster recovery strategy. For more on disaster recovery strategies, see this whitepaper.

AWS’ global infrastructure enables customers to support applications with near zero RTO requirements. Customers can run workloads in multiple Regions, in a multi-site active/active manner, and serve traffic from all Regions. To do so, developers often must implement private multi-regional APIs for use by the applications.

This blog describes how to implement this solution using Amazon API Gateway and Amazon Route 53.

Overview

The first step is to build a REST API that uses private API Gateway endpoints with custom domain names as described in this sample. The next step is to deploy APIs in two AWS Regions and configure Route 53, following a disaster recovery strategy.
This architecture uses the following resources in each Region:

Two region architecture example

API Gateway with an AWS Lambda function as integration target.
Amazon Virtual Private Cloud (Amazon VPC) with two private subnets, used to deploy VPC endpoint and Network Load Balancers (NLB).
AWS Transit Gateway to establish connectivity between the two VPCs in different Regions.
VPC endpoint to access API Gateway from a private VPC.
Elastic network interfaces (ENIs) created by the VPC endpoint.
A Network Load Balancer with ENIs in its target group and a TLS listener with an AWS Certificate Manager (ACM) certificate, used as a facade for the API.
ACM issues and manages certificates used by the NLB TLS listener and API Gateway custom domain.
Route 53 with private hosted zones used for DNS resolution.
Amazon CloudWatch with alarms used for Route 53 health checks.

This sample implementation uses a Lambda function as an integration target, though it can target on-premises resources accessible via AWS Direct Connect, and applications running on Amazon EKS. For more information on best practices designing API Gateway private APIs, see this whitepaper.

This post uses AWS Transit Gateway with inter-Region peering to establish connectivity between the two VPCs. Depending on the networking needs and infrastructure already in place, you may tailor the architecture and use a different approach. Read this whitepaper for more information on available VPC-to-VPC connectivity options.

Implementation

Prerequisites

You can use existing infrastructure to deploy private APIs. Otherwise check the sample repository for templates and detailed instructions on how to provision the necessary infrastructure.

This post uses the AWS Serverless Application Model (AWS SAM) to deploy private APIs with custom domain names. Visit the documentation to install it in your environment.

Deploying private APIs into multiple Regions

To deploy private APIs with custom domain names and CloudWatch alarms for health checks into the two AWS Regions:

Download the AWS SAM template file api.yaml from the sample repository. Replace default parameter values in the template with ones that match your environment or provide them during the deployment step.

Navigate to the directory containing the template. Run following commands to deploy the API stack in the us-east-1 Region:

sam build --template-file api.yaml
sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-east-1

Repeat the deployment in the us-west-2 Region. Update the parameters values in the template to match the second Region (or provide them as an input during the deployment step). Run the following commands:
```
sam build --template-file api.yaml
sam deploy --template-file api.yaml --guided --stack-name private-api-gateway --region us-west-2
```

Setting up Route 53

With the API stacks deployed in both Regions, create health checks to use them in Route 53:

Navigate to Route 53 in the AWS Management Console. Use the CloudWatch alarms created in the previous step to create health checks (one per Region):

Configuring the health check for region 1

Configuring the health check for region 2

This Route 53 health check uses the CloudWatch alarms that are based on a static threshold of the NLB healthy host count metric in each Region. In your implementation, you may need more sophisticated API health status tracking. It can use anomaly detection, include metric math, or use a composite state of the multiple alarms. Check this documentation article for more information on CloudWatch alarms. You can also use the approach documented in this blog post as an alternative. It will help you to create a health check solution that observes the state of the private resources and creates custom metrics that are more specific to your use case. For example, such metrics could include increased database transactions’ failure rate, number of timed out requests to a downstream legacy system, status of an external system that your workload depends on, etc.
Create a private Route 53 hosted zone. Associate this with the VPCs where the client applications that access private APIs reside:

Create a private hosted zone
Create Route 53 private zone alias records, pointing to the NLBs in both Regions and VPCs, using the same name for both records (for example, private.internal.example.com):

Create alias records

This post uses Route 53 latency-based routing for private DNS to implement resilient active-active private API architecture. Depending on your use case and disaster recovery strategy, you can change this approach and use geolocation-based routing, failover, or weighted routing. See the documentation for more details on supported routing policies for the records in a private hosted zone.

In this implementation, client applications that connect to the private APIs reside in the VPCs and can access Route 53 private hosted zones. You may also operate an application that runs on-premises and must access the private APIs. Read this blog post for more information on how to create DNS naming that spans the entire network.

Validating the configuration

To validate this implementation, I use a bastion instance in each of the VPCs. I connect to them using SSH or AWS Systems Manager Session Manager (see this documentation article for details).

Run the following command from both bastion instances:
```
dig +short private.internal.com
```
The response should contain IP addresses of the NLB in one VPC:
```
10.2.2.188
10.2.1.211
```
After DNS resolution verification, run the following command in each of the VPCs:
```
curl -s -XGET https://private.internal.example.com/demo
```
The response should include event data as the Lambda function received it.
To simulate an outage, navigate to Route 53 in the AWS Management Console. Select the health check that corresponds to the Region where you received the response, and invert the health status:
After a few minutes, retry the same DNS resolution and API response validation steps. This time, it routes all your requests to the remaining healthy API stack.

Cleaning Up

To avoid incurring further charges, delete all the resources that you have created in Route 53 records, health checks, and private hosted zones.

Run the following commands to delete API stacks in both Regions:

sam delete --stack-name private-api-gateway --region us-west-2
sam delete --stack-name private-api-gateway --region us-east-1

If you used the sample templates to provision the infrastructure, follow the steps listed in the Cleanup section of the sample repository instructions.

Conclusion

This blog post walks through implementing a multi-Regional private API using API Gateway with custom domain names. This approach allows developers to make their internal applications and workloads more resilient, react to disruptions, and meet their disaster recovery scenario objectives.

For additional guidance on such architectures, including multi-Region application architecture, see this solution, blog post, re:Invent presentation.

For more serverless learning resources, visit Serverless Land.

AWS Compute Blog