Using NAT Gateways with multiple Amazon VPCs at scale

Introduction

Amazon Virtual Private Cloud (Amazon VPC) uses Network Address Translation (NAT) so that resources in private subnets can communicate with resources on the internet, in other VPCs, in on-premises networks, or even in the same VPC using the NAT Gateway’s IP address. Amazon VPC NAT Gateway is managed by AWS and addresses this need while providing redundancy, scalability, and resilience. Several factors influence the resilience and cost of implementing NAT Gateway for your workloads. In this post, we present two architectures that maximize NAT Gateway resilience using multiple Availability Zones (AZs) and help optimize data transfer charges by keeping the traffic within the same AZ. Besides the cost of the overall solution, we also consider inter-VPC connectivity requirements.

NAT Gateways within an AZ are automatically implemented with redundancy. However, while Amazon VPCs can span multiple AZs, each NAT Gateway operates within a single AZ. If that AZ fails, the NAT Gateway in it becomes unavailable, and connections from resources using that NAT Gateway also fail, including resources in other AZs that route through it. Therefore, we recommend deploying one NAT Gateway in each AZ and routing traffic locally within the same AZ.

NAT Gateway cost factors

Full details on NAT Gateway pricing are available on the Amazon VPC pricing page. Because pricing can change while blog posts stay online for a long time, always treat that page as the official source. To summarize, two factors determine what you pay for NAT Gateway: a data processing charge and an hourly charge.

As the names suggest, when you send data through a NAT Gateway, you pay for data processing. Likewise, you pay NAT Gateway hourly charges every hour that the NAT Gateway is provisioned and available (with partial hours billed as a full hour). When you have more than one NAT Gateway, you pay the hourly charge for each. But the data processing charge depends only on the volume of traffic, not on how many NAT Gateways process it: splitting the same traffic from the VPC across more NAT Gateways leaves the total amount of data processed unchanged.

How does this work? Let’s say you have one VPC that spans two AZs, and you have five NAT Gateways per AZ. Each NAT Gateway sends 1GB of data per hour to the internet. You pay for 10GB of data processing per hour, plus the hourly charge for ten NAT Gateways.

Now imagine a second VPC that also spans two AZs, with only one NAT Gateway per AZ, each sending 5GB of data to the internet each hour. Like the previous example, you are charged for processing 10GB per hour, but in this case you pay the hourly charge for only two NAT Gateways.
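The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration: the class and function names are ours, the first example is read as 1GB per hour through each NAT Gateway (which is what yields the 10GB total), and the two rates are placeholder values in arbitrary units, not AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatDeployment:
    """NAT Gateway layout of one VPC (hypothetical helper for this sketch)."""
    azs: int                    # AZs the VPC spans
    gateways_per_az: int        # NAT Gateways deployed in each AZ
    gb_per_gateway_hour: float  # data each NAT Gateway sends per hour

    @property
    def gateway_count(self) -> int:
        return self.azs * self.gateways_per_az

    @property
    def gb_per_hour(self) -> float:
        return self.gateway_count * self.gb_per_gateway_hour


def cost_per_hour(d: NatDeployment, hourly_rate: float, per_gb_rate: float) -> float:
    """Hourly charge for every NAT Gateway plus data processing for every GB."""
    return d.gateway_count * hourly_rate + d.gb_per_hour * per_gb_rate


# Placeholder rates in arbitrary units. These are NOT AWS prices;
# take real values from the Amazon VPC pricing page.
HOURLY_RATE = 1.0
PER_GB_RATE = 0.5

first_vpc = NatDeployment(azs=2, gateways_per_az=5, gb_per_gateway_hour=1)
second_vpc = NatDeployment(azs=2, gateways_per_az=1, gb_per_gateway_hour=5)

for name, vpc in (("first VPC", first_vpc), ("second VPC", second_vpc)):
    print(name, "->", vpc.gateway_count, "NAT Gateways,",
          vpc.gb_per_hour, "GB/hour,",
          cost_per_hour(vpc, HOURLY_RATE, PER_GB_RATE), "per hour")
```

Both deployments process the same 10GB per hour, so the only difference between them is the hourly charge for eight additional NAT Gateways.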

In these examples, you pay data processing charges because the traffic passes through a NAT Gateway on its way to the internet. In addition, routing traffic from different AZs to one NAT Gateway for internet egress incurs inter-AZ data transfer charges on top of the hourly and data processing charges. Having a dedicated NAT Gateway in each AZ lets you route traffic within the same AZ, so you do not pay for inter-AZ data transfer. Check out the Amazon VPC pricing page for more details.
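To see how the inter-AZ component enters the bill, the following sketch compares a single shared NAT Gateway with one NAT Gateway per AZ. It is a simplified model under stated assumptions: the function names are hypothetical, `inter_az_rate` stands for a single combined per-GB charge for traffic that crosses an AZ boundary, and all rates are placeholders rather than AWS prices.

```python
from typing import Mapping


def cross_az_gb(gb_by_az: Mapping[str, float], nat_az: str) -> float:
    """GB per hour that must cross an AZ boundary to reach the NAT Gateway in nat_az."""
    return sum(gb for az, gb in gb_by_az.items() if az != nat_az)


def single_nat_cost(gb_by_az: Mapping[str, float], nat_az: str,
                    hourly_rate: float, per_gb_rate: float,
                    inter_az_rate: float) -> float:
    """One NAT Gateway shared by all AZs: one hourly charge plus inter-AZ transfer."""
    total_gb = sum(gb_by_az.values())
    return (hourly_rate
            + total_gb * per_gb_rate
            + cross_az_gb(gb_by_az, nat_az) * inter_az_rate)


def nat_per_az_cost(gb_by_az: Mapping[str, float],
                    hourly_rate: float, per_gb_rate: float) -> float:
    """One NAT Gateway in every AZ: more hourly charges, no inter-AZ transfer."""
    total_gb = sum(gb_by_az.values())
    return len(gb_by_az) * hourly_rate + total_gb * per_gb_rate


# Hypothetical traffic profile (GB per hour sent to the internet from each AZ)
# and placeholder rates in arbitrary units, NOT AWS prices.
traffic = {"az-a": 5.0, "az-b": 5.0}
shared = single_nat_cost(traffic, "az-a", hourly_rate=1.0, per_gb_rate=0.5, inter_az_rate=0.25)
per_az = nat_per_az_cost(traffic, hourly_rate=1.0, per_gb_rate=0.5)
print("single NAT Gateway:", shared, "| one per AZ:", per_az)
```

With these placeholder values the per-AZ layout comes out cheaper, but the result depends entirely on the traffic volume and the real rates, so rerun the comparison with figures from the pricing page.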

Optimizing NAT Gateway for resiliency and cost

We recommend using at least one NAT Gateway in each AZ where you run a workload. This improves your fault tolerance in the event of an AZ failure. It also keeps NAT traffic within the same AZ, which minimizes inter-AZ data transfer costs.

To minimize NAT Gateway data processing fees, we recommend using Gateway Endpoints instead of NAT Gateways when you are connecting to Amazon Simple Storage Service (Amazon S3) or Amazon DynamoDB within the same AWS Region. There is no additional charge for Gateway Endpoints and they provide reliable connectivity to Amazon S3 and DynamoDB without requiring an Internet Gateway (IGW) or a NAT Gateway.
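As an illustration, a Gateway Endpoint can be declared as an AWS CloudFormation resource of type AWS::EC2::VPCEndpoint. The sketch below only builds that resource as a Python dictionary and prints it as JSON; it does not call AWS. The helper name, the Region, and the VPC and route table IDs are made-up values for this example.

```python
import json
from typing import Dict, List


def gateway_endpoint_resource(vpc_id: str, region: str, service: str,
                              route_table_ids: List[str]) -> Dict:
    """Build a CloudFormation resource for a Gateway Endpoint (hypothetical helper)."""
    if service not in ("s3", "dynamodb"):
        raise ValueError("Gateway Endpoints are available for Amazon S3 and DynamoDB only")
    return {
        "Type": "AWS::EC2::VPCEndpoint",
        "Properties": {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            # Regional service name, e.g. com.amazonaws.us-east-1.s3
            "ServiceName": f"com.amazonaws.{region}.{service}",
            # Route tables that receive a route to the endpoint.
            "RouteTableIds": list(route_table_ids),
        },
    }


# All identifiers below are placeholders.
s3_endpoint = gateway_endpoint_resource(
    vpc_id="vpc-0123456789abcdef0",
    region="us-east-1",
    service="s3",
    route_table_ids=["rtb-0aaaaaaaaaaaaaaaa", "rtb-0bbbbbbbbbbbbbbbb"],
)
print(json.dumps({"Resources": {"S3GatewayEndpoint": s3_endpoint}}, indent=2))
```

Listing the route table of every private subnet in RouteTableIds is what steers Amazon S3 traffic from those subnets to the endpoint instead of the NAT Gateway.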

Another approach that might help reduce NAT Gateway data-processing fees is to use AWS PrivateLink Interface Endpoints. PrivateLink Interface Endpoints provide more resiliency when communicating with services within VPCs that you interact with frequently. For a list of all AWS services integrated with AWS PrivateLink, refer to this page in our documentation. When using Interface Endpoints, you pay an hourly usage charge and data processing charges. See Interface endpoint pricing in the AWS PrivateLink overview.

Distributed and centralized architectures for NAT Gateway

Let’s look at two common architectures for implementing NAT Gateway:

1) Distributed NAT architecture: Here you have one NAT Gateway in each AZ within a VPC. Use this when you don’t need to interconnect VPCs, and when you want to keep your network traffic within the boundaries of your VPC.

2) Centralized NAT architecture with an egress VPC providing NAT service to multiple VPCs using AWS Transit Gateway: There are two main reasons to use this. The first is centralized management and security controls, for when you have one security team that monitors and decides who has access to internet egress. In addition, you can add in-line third-party firewalls for all internet egress traffic to increase security. The second reason is to optimize costs. Rather than deploying NAT Gateways in every VPC, sharing the NAT Gateways of one central VPC may reduce costs.
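Whether the centralized model actually costs less depends on how much traffic you send, because every GB then also incurs a Transit Gateway data processing charge (discussed later in this post). The following is a rough model, not a pricing calculator. It assumes one NAT Gateway per AZ in both models, one Transit Gateway attachment per spoke VPC plus one for the egress VPC, and that the Transit Gateway is paid for by egress alone. All function names and rates are placeholders of ours, not AWS prices.

```python
def distributed_cost(vpcs: int, azs: int, gb_per_hour: float,
                     nat_hourly: float, nat_per_gb: float) -> float:
    """Every VPC runs its own NAT Gateway in each AZ."""
    return vpcs * azs * nat_hourly + gb_per_hour * nat_per_gb


def centralized_cost(vpcs: int, azs: int, gb_per_hour: float,
                     nat_hourly: float, nat_per_gb: float,
                     tgw_attachment_hourly: float, tgw_per_gb: float) -> float:
    """One set of NAT Gateways in the egress VPC, reached through Transit Gateway."""
    attachments = vpcs + 1  # assumption: each spoke VPC plus the egress VPC
    return (azs * nat_hourly
            + attachments * tgw_attachment_hourly
            + gb_per_hour * (nat_per_gb + tgw_per_gb))


def break_even_gb_per_hour(vpcs: int, azs: int, nat_hourly: float,
                           tgw_attachment_hourly: float, tgw_per_gb: float) -> float:
    """Traffic volume at which both models cost the same in this simplified model."""
    fixed_saving = (vpcs * azs - azs) * nat_hourly - (vpcs + 1) * tgw_attachment_hourly
    return fixed_saving / tgw_per_gb


# Placeholder rates in arbitrary units, NOT AWS prices.
rates = dict(nat_hourly=1.0, tgw_attachment_hourly=1.0, tgw_per_gb=0.25)
threshold = break_even_gb_per_hour(vpcs=10, azs=2, **rates)
print("centralized is cheaper below", threshold, "GB per hour")
```

If the Transit Gateway attachments already exist for inter-VPC connectivity, their hourly charge is not an extra cost of centralizing NAT, and the break-even volume moves up accordingly.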

Distributed NAT: One NAT Gateway in each AZ within a VPC

This approach prioritizes resiliency. Traffic that must travel through a NAT Gateway remains inside the same AZ when going to the internet. This prevents inter-AZ data transfer costs for NAT traffic (e.g., traffic between VPC resources and NAT Gateway). This is shown in the following diagram (figure 1).

Figure 1: One NAT Gateway in each AZ within a VPC

Note that traffic going to the Regional Amazon S3 bucket is sent through Gateway Endpoints instead of NAT Gateway, and the same can be achieved for DynamoDB.

With this architecture, you must configure each subnet to send traffic to the NAT Gateway in its own AZ. You do that by creating a dedicated route table for each AZ and associating each subnet with the route table for its AZ. Then, each route table must be configured with a route entry that targets the NAT Gateway in the same AZ.
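The mapping described above can be sketched as a small planning function. It is a minimal illustration that works on plain data and does not call AWS: it groups private subnets by AZ, pairs each group with the NAT Gateway in the same AZ, and refuses to produce a plan if an AZ has subnets but no NAT Gateway. The function name, the data shapes, and all IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RouteTablePlan(NamedTuple):
    az: str
    nat_gateway_id: str    # target of the 0.0.0.0/0 route in this table
    subnet_ids: List[str]  # private subnets to associate with this table


def plan_route_tables(subnet_az: Dict[str, str],
                      nat_by_az: Dict[str, str]) -> List[RouteTablePlan]:
    """Return one route table plan per AZ, keeping NAT traffic inside that AZ."""
    subnets_by_az = defaultdict(list)
    for subnet_id, az in subnet_az.items():
        subnets_by_az[az].append(subnet_id)

    missing = sorted(set(subnets_by_az) - set(nat_by_az))
    if missing:
        raise ValueError("no NAT Gateway in AZ(s): " + ", ".join(missing))

    return [
        RouteTablePlan(az, nat_by_az[az], sorted(subnets_by_az[az]))
        for az in sorted(subnets_by_az)
    ]


# Placeholder subnet and NAT Gateway IDs.
plans = plan_route_tables(
    subnet_az={"subnet-app-a": "us-east-1a", "subnet-db-a": "us-east-1a",
               "subnet-app-b": "us-east-1b"},
    nat_by_az={"us-east-1a": "nat-aaaa", "us-east-1b": "nat-bbbb"},
)
for plan in plans:
    print(plan.az, ": 0.0.0.0/0 ->", plan.nat_gateway_id, "for", plan.subnet_ids)
```

Failing on a missing NAT Gateway, rather than falling back to one in another AZ, is a deliberate choice here: a silent fallback would reintroduce the cross-AZ dependency and data transfer charges this architecture is meant to avoid.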

An example AWS CloudFormation stack for this scenario, with a NAT Gateway per AZ, is available in this GitHub repository.

Things to consider when using this architecture:

The account that owns each VPC/NAT Gateway pays for distributed internet egress traffic.
For Inter-VPC communication, consider using Transit Gateway with more specific routes for the Inter-VPC traffic, or consider VPC peering.

Centralized NAT: Central egress VPC with NAT Gateway

For this architecture, we use AWS Transit Gateway, so we can have multiple spoke VPCs connected to a single egress VPC providing a NAT Gateway. We show this architecture in the following diagram (figure 2).

Figure 2: Centralized egress VPC with NAT Gateway

To centralize the NAT traffic, you need to create a Central Egress VPC in the network services account. Then, your Transit Gateway route configuration sends traffic from spoke VPCs to the Central Egress VPC, as well as the reverse path. Configuration of Route Tables in Transit Gateway is described in the Building a Scalable and Secure Multi-VPC AWS Network Infrastructure whitepaper.
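As a rough sketch of that route configuration, the following Python models two Transit Gateway route tables: one associated with the spoke attachments that sends all traffic to the egress VPC, and one associated with the egress attachment that carries the return path to each spoke. This is our simplified reading, with hypothetical table names, attachment names, and CIDRs. The VPC-level route tables are out of scope here, and the whitepaper remains the reference for the full design.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

Route = Tuple[str, str]  # (destination CIDR, target attachment)


def transit_gateway_routes(spoke_cidrs: Dict[str, str],
                           egress_attachment: str) -> Dict[str, List[Route]]:
    """spoke_cidrs maps each spoke attachment name to that VPC's CIDR block."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in spoke_cidrs.items()}

    # Return routes are only unambiguous if spoke CIDRs do not overlap.
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{a} and {b} have overlapping CIDRs")

    return {
        # Associated with every spoke attachment: default route to the egress VPC.
        "spoke-route-table": [("0.0.0.0/0", egress_attachment)],
        # Associated with the egress attachment: reverse path to each spoke.
        "egress-route-table": [(str(networks[name]), name) for name in sorted(networks)],
    }


# Placeholder attachment names and CIDRs.
routes = transit_gateway_routes(
    {"spoke-vpc-1": "10.1.0.0/16", "spoke-vpc-2": "10.2.0.0/16"},
    egress_attachment="egress-vpc",
)
for table, entries in routes.items():
    for destination, target in entries:
        print(table, ":", destination, "->", target)
```

Because the spoke route table in this sketch holds only a default route, spokes reach the internet through the egress VPC; routes between spokes would have to be added explicitly if you want inter-VPC traffic.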

This architecture also provides one NAT Gateway per AZ in the Central Egress VPC, minimizing the total number of NAT Gateways and offering the most resilience. When considering resiliency, we recommend configuring a Transit Gateway attachment for each VPC, to span across all AZs used by your workload. This also makes sure the Transit Gateway keeps all traffic inside the same AZ.

To save on Transit Gateway and NAT Gateway data processing costs, we recommend creating a Gateway Endpoint for each VPC that requires communication to Amazon S3 or DynamoDB resources in the same Region. For example, in the preceding diagram (figure 2), Spoke VPC 1 has its own Gateway Endpoint within the VPC, as this workload uses Amazon S3 connectivity.

Things to consider when using this architecture:

Inter-VPC connectivity is built in by using Transit Gateway.
The account that owns the Central Egress VPC pays the NAT Gateway charges.
With a Central Egress VPC, you can have all traffic exiting to the internet inspected and filtered in one place. The Deployment models for AWS Network Firewall blog post details how to use an inspection VPC together with an egress VPC to inspect internet-bound traffic.
When you centralize NAT Gateway using Transit Gateway, you pay an extra Transit Gateway data processing charge compared to the distributed architecture where you run a NAT gateway in every VPC. In some cases, such as when you send large amounts of data through the NAT gateway from specific VPCs, keeping the NAT traffic local in the VPC can cost less. You do this by setting up the specific VPCs with dedicated NAT Gateways that keep traffic within the VPC, so it is not processed by the Transit Gateway.
Data transfer across AZs over private IP addresses within the same AWS Region through the Transit Gateway is free of charge.
The NAT Gateway resource in this model is centralized. Therefore, each NAT Gateway processes more traffic than the distributed model. You must monitor throughput and connection utilization so you stay within the NAT Gateway limits. You can also use multiple NAT Gateways per AZ, for example, by giving the VPCs that send and receive large amounts of traffic dedicated NAT Gateways.
An IPv4 address from a NAT Gateway can support up to 55,000 simultaneous connections to each unique destination. By default, you can associate at most two Elastic IP addresses with your public NAT Gateway. You can raise this limit by requesting an increase of the Elastic IP addresses per public NAT gateway quota.
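The connection limit in the last consideration lends itself to a quick capacity check. The sketch below assumes that capacity adds up linearly across the addresses associated with a NAT Gateway, which is how we read the per-address figure above. The function names are ours, and the peak connection numbers are invented for the example.

```python
CONNECTIONS_PER_IP = 55_000    # simultaneous connections per IPv4 address, per unique destination
DEFAULT_EIPS_PER_GATEWAY = 2   # default Elastic IP limit for a public NAT Gateway


def max_connections_per_destination(eip_count: int) -> int:
    """Simultaneous connections to one destination a NAT Gateway can hold with eip_count addresses."""
    return eip_count * CONNECTIONS_PER_IP


def eips_needed(peak_connections: int) -> int:
    """Smallest number of addresses that covers the peak (integer ceiling division)."""
    return max(1, -(-peak_connections // CONNECTIONS_PER_IP))


def needs_quota_increase(peak_connections: int) -> bool:
    """True when the peak cannot be served within the default Elastic IP limit."""
    return eips_needed(peak_connections) > DEFAULT_EIPS_PER_GATEWAY


# Hypothetical peak connection counts toward a single destination.
for peak in (40_000, 110_000, 120_000):
    print(peak, "connections ->", eips_needed(peak), "address(es),",
          "quota increase needed" if needs_quota_increase(peak) else "within default limit")
```

In the centralized model this check matters more, since the connections of all spoke VPCs toward a popular destination add up on the same NAT Gateways.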

Conclusion

In this post, we explored two architectures that improve NAT Gateway resiliency and scalability using multiple AZs: a distributed architecture with one NAT Gateway in each AZ within a VPC, and a centralized NAT architecture that uses Transit Gateway to create a central VPC for egress to the internet.

Both designs deploy the NAT Gateway service in multiple AZs, keeping the NAT traffic local and minimizing inter-AZ data transfer cost. Using NAT Gateways in multiple AZs also reduces the failure domain in the event of an AZ outage and lets you process more traffic by using multiple NAT Gateways.

We showed how inter-AZ traffic adds to your costs when a single NAT Gateway in a VPC serves multiple subnets distributed across multiple AZs, and covered considerations for both architectures. You may find it useful to combine both architectures in your environment. For example, you might use the distributed model for large VPCs that operate in similar ways, and the centralized model for small VPCs.

Luis Felipe Silveira da Silva

Luis Felipe is a Network Specialist / Solutions Architect in the ELB Team. He works with a diverse range of load balancing and networking technologies, collaborating with customers and internal teams to design and optimize workloads, along with ensuring successful implementation and adoption of EC2 Networking services.

Francesc Sala

Francesc Sala is a Principal Technical Account Manager in the Strategic Industries team at Amazon Web Services. He has spent 8 years on the team helping enterprise customers optimize cloud services as part of their journey to operational excellence. Prior to AWS, Francesc had 17 years of experience in the Telecom and Networking industry.
