Field Notes: How to Scale Your Networks on Amazon Web Services
As AWS adoption increases throughout an organization, the number of networks and virtual private clouds (VPCs) to support them also increases. Customers can see growth upwards of tens, hundreds, or in the case of the enterprise, thousands of VPCs.
Generally, this increase in VPCs is driven by the need to:
- Simplify routing, connectivity, and isolation boundaries
- Reduce network infrastructure cost
- Reduce management overhead
Overview of solution
This blog post discusses the guidance customers require to achieve their desired outcomes. Guidance is provided through a series of real-world scenarios customers encounter on their journey to building a well-architected network environment on AWS. These challenges range from centralizing networking resources, to reducing complexity and cost, to implementing security techniques that help workloads meet industry- and customer-specific operational compliance requirements.
The scenarios presented here form the foundation and starting point from which the intended guidance is provided. These scenarios start simple and gradually increase in complexity. Each scenario tackles different questions customers ask AWS solutions architects, service teams, professional services, and other AWS professionals on a daily basis.
Some of these questions are:
- What does centralized DNS look like on AWS, and how should I approach and implement it?
- How do I reduce the cost and complexity associated with Amazon Virtual Private Cloud (Amazon VPC) interface endpoints for AWS services by centralizing endpoints that are spread across many AWS accounts?
- What does centralized packet inspection look like on AWS, and how should I approach it?
This blog post will answer these questions, and more.
This blog post assumes that the reader has some understanding of AWS networking basics outlined in the blog post One to Many: Evolving VPC Design. It also assumes that the reader understands industry-wide networking basics.
Simplify routing, connectivity, and isolation boundaries
Simplification in routing starts with selecting the correct layer 3 technology. In the past, customers used a combination of VPC peering, virtual private gateway configurations, and the Transit VPC Solution to achieve inter-VPC routing and routing to on-premises resources. These solutions presented challenges in configuration and management complexity, as well as in security and scaling.
To solve these challenges, AWS introduced AWS Transit Gateway. Transit Gateway is a regional virtual router to which customers can attach their VPCs, site-to-site virtual private networks (VPNs), Transit Gateway Connect attachments, AWS Direct Connect gateways, and cross-region transit gateway peering connections, and then configure routing between them. Transit Gateway scales up to 5,000 attachments, so a customer can start with one VPC attachment and scale up to thousands of attachments across thousands of accounts. Each VPC, Direct Connect gateway, and peer transit gateway connection receives up to 50 Gbps of bandwidth.
Routing happens at layer 3 through a transit gateway. A transit gateway comes with a default route table. If default route table association and propagation are enabled when the transit gateway is created, attachments are automatically associated with this default route table and their routes are automatically propagated to it. This creates a network where all attachments can route to each other.
Adding VPN or Direct Connect gateway attachments to on-premises networks will allow all attached VPCs and networks to easily route to on-premises networks. Some customers require isolation boundaries between routing domains. This can be achieved with Transit Gateway.
Let’s review a use case where a customer with two spoke VPCs and a shared services VPC (shared-services-vpc-A) would like to:
- Allow all spoke VPCs to access the shared services VPC
- Disallow access between spoke VPCs
To achieve this, the customer needs to:
- Create a transit gateway with the name tgw-A and two route tables with the names spoke-tgw-route-table and shared-services-tgw-route-table.
  - When creating the transit gateway, disable automatic association and propagation to the default route table.
  - Enable equal-cost multi-path routing (ECMP) and use a unique Border Gateway Protocol (BGP) autonomous system number (ASN).
- Associate all spoke VPCs with the spoke-tgw-route-table.
  - Do not propagate their routes to the spoke-tgw-route-table.
  - Propagate their routes to the shared-services-tgw-route-table.
- Associate the shared services VPC with the shared-services-tgw-route-table, and propagate or statically add its routes to the spoke-tgw-route-table.
- Add default and summarized routes with a next hop of the transit gateway to the VPC route tables of the shared services and spoke VPCs.
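The effect of these association and propagation choices can be checked with a toy model. The sketch below is not an AWS API; it only assumes that an attachment looks up routes in the one route table it is associated with, and that a destination is reachable when its routes have been propagated (or statically added) to that table. The attachment names spoke-vpc-1 and spoke-vpc-2 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TgwRouteTable:
    """Toy model of a transit gateway route table (not an AWS object)."""
    name: str
    # Attachments that use this table to look up routes.
    associations: set = field(default_factory=set)
    # Attachments whose routes are present in this table.
    propagations: set = field(default_factory=set)


def can_reach(src: str, dst: str, tables: list) -> bool:
    """Return True if traffic entering from src has a route to dst."""
    for table in tables:
        if src in table.associations:
            # The associated table decides which destinations are routable.
            return dst in table.propagations
    return False  # src is not associated with any route table


# Hypothetical attachment names for the two spokes and the shared services VPC.
spoke_table = TgwRouteTable(
    "spoke-tgw-route-table",
    associations={"spoke-vpc-1", "spoke-vpc-2"},
    propagations={"shared-services-vpc-A"},
)
shared_table = TgwRouteTable(
    "shared-services-tgw-route-table",
    associations={"shared-services-vpc-A"},
    propagations={"spoke-vpc-1", "spoke-vpc-2"},
)
tables = [spoke_table, shared_table]

print(can_reach("spoke-vpc-1", "shared-services-vpc-A", tables))  # True
print(can_reach("shared-services-vpc-A", "spoke-vpc-2", tables))  # True
print(can_reach("spoke-vpc-1", "spoke-vpc-2", tables))            # False
```

Because spoke routes never appear in the spoke-tgw-route-table, spoke-to-spoke traffic has no route, while traffic to and from the shared services VPC does.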
After successfully deploying this configuration, the customer decides to:
- Allow all VPCs access to on-premises resources through AWS site-to-site VPNs.
- Require an aggregated bandwidth of 10 Gbps across this VPN.
To achieve this, the customer needs to:
- Create four site-to-site VPNs between the transit gateway and the on-premises routers with BGP as the routing protocol.
  - Each AWS site-to-site VPN connection has two VPN tunnels, and each tunnel provides up to 1.25 Gbps of bandwidth. With ECMP across all eight tunnels, four connections provide an aggregate of up to 8 × 1.25 Gbps = 10 Gbps.
  - Read more on how to configure ECMP for site-to-site VPNs.
- Create a third transit gateway route table with the name WAN-connections-route-table.
- Associate all four VPNs with the WAN-connections-route-table.
- Propagate the routes from the spoke and shared services VPCs to WAN-connections-route-table.
- Propagate VPN attachment routes to the spoke-tgw-route-table and shared-services-tgw-route-table.
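The choice of four VPN connections follows from simple arithmetic. The sketch below assumes that ECMP spreads traffic across every tunnel of every connection and uses the per-tunnel figure quoted above; the function name is ours, and real throughput depends on traffic mix and on-premises equipment.

```python
import math

TUNNELS_PER_VPN = 2      # each site-to-site VPN connection has two tunnels
GBPS_PER_TUNNEL = 1.25   # per-tunnel bandwidth quoted in the text


def vpn_connections_needed(target_gbps: float) -> int:
    """Return how many VPN connections reach the target with ECMP."""
    gbps_per_connection = TUNNELS_PER_VPN * GBPS_PER_TUNNEL
    return math.ceil(target_gbps / gbps_per_connection)


print(vpn_connections_needed(10))  # 4
```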
Building on this progress, the customer has decided to deploy another transit gateway and shared services VPC in another AWS Region. They would like both shared services VPCs to be connected.
To accomplish these requirements, the customer needs to:
- Create a transit gateway with the name tgw-B in the new region.
- Create a transit gateway peering connection between tgw-A and tgw-B. Ensure peering requests are accepted.
- Statically add a route to the shared-services-tgw-route-table in region A that has the transit-gateway-peering attachment as the next hop for traffic destined to the VPC Classless Inter-Domain Routing (CIDR) range for shared-services-vpc-B. Then, in region B, add a route to the shared-services-tgw-route-table that has the transit-gateway-peering attachment as the next hop for traffic destined to the VPC CIDR range for shared-services-vpc-A.
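The two static routes mirror each other, which the following sketch makes explicit. It only builds the parameters for each route and does not call AWS. The dictionary keys follow the parameter names of the EC2 CreateTransitGatewayRoute API as we understand them, so check them against the current API reference before use. All IDs and CIDR ranges are hypothetical placeholders.

```python
import ipaddress


def peering_static_route(route_table_id: str, peering_attachment_id: str,
                         remote_vpc_cidr: str) -> dict:
    """Build the parameters for one static transit gateway route."""
    # Raises ValueError if the CIDR is malformed or has host bits set.
    ipaddress.ip_network(remote_vpc_cidr)
    return {
        "DestinationCidrBlock": remote_vpc_cidr,
        "TransitGatewayRouteTableId": route_table_id,
        "TransitGatewayAttachmentId": peering_attachment_id,
    }


# Hypothetical values: shared-services-vpc-A is 10.10.0.0/16,
# shared-services-vpc-B is 10.20.0.0/16.
route_in_region_a = peering_static_route(
    "tgw-rtb-shared-services-a", "tgw-attach-peering-a", "10.20.0.0/16")
route_in_region_b = peering_static_route(
    "tgw-rtb-shared-services-b", "tgw-attach-peering-b", "10.10.0.0/16")

print(route_in_region_a)
print(route_in_region_b)
```

Each region's route table points at the other region's shared services CIDR range. Peering attachments do not propagate routes, which is why these entries are added statically.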
Reduce network infrastructure cost
It is important to design your network to eliminate unnecessary complexity and management overhead, and to optimize cost. To achieve this, use centralization. Instead of deploying the network infrastructure that every VPC needs inside each VPC, deploy these resources in a shared services VPC and share them throughout your entire network. This infrastructure is then created only one time, which reduces cost and management overhead.
Some VPC components that can be centralized are network address translation (NAT) gateways, VPC interface endpoints, and AWS Network Firewall. Third-party firewalls can also be centralized.
Let’s take a look at a few use cases that build on the previous use cases.
The customer has made the decision to allow access to AWS Key Management Service (AWS KMS) and AWS Secrets Manager from their VPCs.
The customer should centralize their VPC interface endpoints to limit the cost, management overhead, and complexity that can proliferate when these endpoints are deployed in every VPC.
To centralize these endpoints, the customer should:
- Deploy AWS VPC interface endpoints for AWS KMS and Secrets Manager inside shared-services-vpc-A and shared-services-vpc-B.
  - Disable private DNS on each endpoint.
- Use the AWS default DNS name for AWS KMS and Secrets Manager to create an Amazon Route 53 private hosted zone (PHZ) for each of these services. These are kms.<region>.amazonaws.com and secretsmanager.<region>.amazonaws.com.
- Authorize each spoke VPC to associate with the PHZ in their respective region. This can be done from the AWS Command Line Interface (AWS CLI) by using the command aws route53 create-vpc-association-authorization --hosted-zone-id <hosted-zone-id> --vpc VPCRegion=<region>,VPCId=<vpc-id> --region <AWS-REGION>.
- Create an A record for each PHZ. In the creation process, for the Route to option, select the VPC Endpoint Alias. Add the respective VPC interface endpoint DNS hostname that is not Availability Zone specific (for example, vpce-0073b71485b9ad255-mu7cd69m.ssm.ap-south-1.vpce.amazonaws.com).
- Associate each spoke VPC with the available PHZs. Use the CLI command aws route53 associate-vpc-with-hosted-zone --hosted-zone-id <hosted-zone-id> --vpc VPCRegion=<region>,VPCId=<vpc-id> --region <AWS-REGION>.
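With many spoke VPCs, the authorize and associate commands above are repeated once per VPC and per PHZ. A small helper can generate both command lines for a given pairing. This is a sketch that only formats strings. The hosted zone ID and VPC ID below are hypothetical placeholders, and the function name is ours.

```python
import shlex


def phz_association_commands(hosted_zone_id: str, vpc_region: str,
                             vpc_id: str, cli_region: str) -> tuple:
    """Return the (authorize, associate) AWS CLI command lines for one VPC."""
    vpc_arg = f"VPCRegion={vpc_region},VPCId={vpc_id}"
    common = ["--hosted-zone-id", hosted_zone_id,
              "--vpc", vpc_arg,
              "--region", cli_region]
    authorize = ["aws", "route53",
                 "create-vpc-association-authorization", *common]
    associate = ["aws", "route53",
                 "associate-vpc-with-hosted-zone", *common]
    # shlex.join quotes any argument that would be unsafe in a shell.
    return shlex.join(authorize), shlex.join(associate)


# Hypothetical identifiers, for illustration only.
authorize_cmd, associate_cmd = phz_association_commands(
    "Z0000000EXAMPLE", "us-east-1", "vpc-0123456789abcdef0", "us-east-1")
print(authorize_cmd)
print(associate_cmd)
```

In a cross-account setup, the authorize command is run with credentials for the account that owns the PHZ, and the associate command with credentials for the account that owns the spoke VPC.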
This concludes the configuration for centralized VPC interface endpoints for AWS KMS and Secrets Manager. You can learn more about cross-account PHZ association configuration.
After successfully implementing centralized VPC interface endpoints, the customer has decided to centralize:
- Internet access.
- Packet inspection for East-West and North-South internet traffic using a pair of firewalls that support the Geneve protocol.
To accomplish these centralization requirements, the customer should create:
- A VPC with the name security-egress VPC.
- A Gateway Load Balancer (GWLB) and an Auto Scaling group with at least two instances of the customer’s firewall, evenly distributed across multiple private subnets in different Availability Zones.
- A target group for use with the GWLB. Associate the Auto Scaling group with this target group.
- An AWS endpoint service using the GWLB as the entry point. Then create Gateway Load Balancer endpoints for this endpoint service inside the same set of private subnets, or create a set of /28 subnets for these endpoints.
- Two AWS NAT gateways spread across two public subnets in multiple Availability Zones.
- A transit gateway attachment request from the security-egress VPC. Ensure that:
  - Transit gateway appliance mode is enabled for this attachment, so that the transit gateway forwards both directions of a flow through the same Availability Zone for the lifetime of that flow, which stateful firewalls require.
  - Transit gateway-specific subnets are used to host the attachment interfaces.
- In the security-egress VPC, configure the route tables accordingly.
  - Private subnet route table.
    - Add a default route to the NAT gateway.
    - Add summarized routes with a next hop of the transit gateway for all networks connected to the transit gateway that you intend to route to.
  - Public subnet route table.
    - Add a default route to the internet gateway.
    - Add summarized routes with a next hop of the GWLB endpoints for all private networks you intend to route to.
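The two route tables rely on longest-prefix matching: the summarized routes are more specific than the default route, so they win for private destinations. The sketch below illustrates this with hypothetical values. The 10.0.0.0/8 summary and the target labels are assumptions, and the VPC's own local route is omitted for brevity.

```python
import ipaddress


def next_hop(route_table: dict, destination: str) -> str:
    """Pick the target of the most specific route containing destination."""
    address = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matching:
        raise LookupError(f"no route to {destination}")
    # Longest prefix wins, as in VPC route tables.
    _, target = max(matching, key=lambda item: item[0].prefixlen)
    return target


# Hypothetical route tables for the security-egress VPC.
private_subnet_routes = {
    "0.0.0.0/0": "nat-gateway",
    "10.0.0.0/8": "transit-gateway",
}
public_subnet_routes = {
    "0.0.0.0/0": "internet-gateway",
    "10.0.0.0/8": "gwlb-endpoint",
}

print(next_hop(private_subnet_routes, "10.1.2.3"))    # transit-gateway
print(next_hop(private_subnet_routes, "192.0.2.10"))  # nat-gateway
print(next_hop(public_subnet_routes, "10.1.2.3"))     # gwlb-endpoint
```

Return traffic arriving in the public subnets for a private address is sent to the GWLB endpoints, so the firewalls see both directions of a flow.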
Transit Gateway configuration
- Create a new transit gateway route table with the name transit-gateway-egress-route-table.
- Propagate the routes of all spoke and shared services VPCs to it.
- Associate the security-egress VPC with this route table.
- Add a default route to the spoke-tgw-route-table and shared-services-tgw-route-table that points to the security-egress VPC attachment, and remove all VPC attachment routes respectively from both route tables.
In this blog post, we went on a network architecture journey that started with a use case of routing domain isolation. This is a scenario most customers confront when getting started with Transit Gateway. Gradually, we built upon this use case and increased its complexity by exploring other real-world scenarios that customers confront when designing multi-Region networks across multiple AWS accounts.
Regardless of the complexity, these use cases were accompanied by guidance that helps customers achieve a reduction in cost and complexity throughout their entire network on AWS.
When designing your networks, design for scale. Use AWS services that let you achieve scale without the complexity of managing the underlying infrastructure.
Also, simplify your network by centralizing repeatable resources. If more than one VPC requires access to the same resource, find ways to centralize access to it, which reduces the proliferation of these resources. DNS, packet inspection, and VPC interface endpoints are good examples of things that should be centralized.
Thank you for reading. Hopefully you found this blog post useful.