AWS Official Blog

Use Cloud Foundations to holistically plan and one-click deploy two network sharing models in multi-account organizations on the cloud

The Chinese version [1] of this blog post was originally published on February 6, 2023. We updated the network definitions based on the latest specifications when translating and republishing it in English.

Amazon Web Services (AWS) provides a rich set of network services and components, allowing you to flexibly define various network topologies to meet different business needs. Major network services and components include Amazon Virtual Private Cloud (VPC) and its flow logs, subnets, route tables, endpoints, network address translation (NAT) gateways, Internet gateways (IGW), AWS Transit Gateway (TGW) with its route tables, attachments, associations, and propagations, AWS Network Firewall (NFW), Gateway Load Balancer (GWLB), Amazon Route 53 hosted zones, AWS Direct Connect (DX) connections and gateways, and more. Meanwhile, whether based on AWS Organizations or a self-managed virtual organization, the multi-account operating environment has become the de facto choice for more and more users' cloud journeys. This "two manys" situation (many network-related services and components, and many member accounts to provision) leads to a steep learning curve for building a network environment on the cloud, an inefficient deployment process, and difficult testing and troubleshooting. How to succinctly define the overall network topology of an organization, and then efficiently deploy the defined network resources to the relevant accounts, is the problem this article attempts to solve.

Thanks to the Cloud Foundations solution, you can focus on network topology planning and design for a multi-account organization's cloud environment, while entrusting the remaining, complicated network resource provisioning and configuration work to Cloud Foundations via infrastructure as code (IaC) and automation.

Sharing network resources

Building a cloud network environment in a multi-account organization can be categorized into two sharing models according to the types of shared network resources:

  • TGW-sharing: create a TGW in the Network Account and share it with member accounts via AWS Resource Access Manager (RAM); create VPCs in the member accounts and attach them to the shared TGW; then configure the associations and propagations in the TGW route tables in the Network Account based on the VPC attachments. Sharing a TGW does not depend on AWS Organizations, so this model suits both AWS Organizations and virtual organizations;
  • VPC-sharing: create the TGW, all VPCs, and related resources in the Network Account; configure the VPC attachments to the TGW along with the TGW route table associations and propagations; and share the subnets with the corresponding member accounts via RAM. Subnets can only be shared within an AWS Organizations organization, so this model is not suitable for virtual organizations. By default, an account can have 5 VPCs per region; if that is not enough, you can request a service quota increase.

We use the following table to summarize and compare the above two models:

Sharing model               TGW-sharing                                         VPC-sharing
Shared resources            TGW                                                 Subnets
Applicable scenarios        AWS Organizations or virtual organizations         AWS Organizations only
Topology                    Decentralized VPCs                                  Centralized VPCs
Features                    Member accounts have full control over their VPCs   Different member accounts can share the same subnet
Operation and maintenance   Relatively difficult                                Relatively easy

Neither sharing model is absolutely better; make the choice according to your actual business. If you do not use AWS Organizations to manage your organization, TGW-sharing is your only option. Otherwise, using VPC-sharing to centrally manage all VPCs tends to make operation and maintenance easier and more efficient. With VPC-sharing, if one subnet is shared with multiple accounts, pay attention to the problem of overlapping IP addresses.
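Whichever model you choose, overlapping CIDRs are easy to introduce by accident during planning. A minimal sketch of a pre-deployment check, assuming Python's standard ipaddress module (the helper name is illustrative, not part of the solution):

```python
import ipaddress
from itertools import combinations

def find_overlaps(cidrs):
    """Return every pair of CIDR blocks in the plan that overlap."""
    nets = {c: ipaddress.ip_network(c) for c in cidrs}
    return [
        (a, b)
        for a, b in combinations(cidrs, 2)
        if nets[a].overlaps(nets[b])
    ]

# Two blocks that collide and one that does not
print(find_overlaps(["10.0.0.0/20", "10.0.8.0/21", "10.0.32.0/20"]))
# → [('10.0.0.0/20', '10.0.8.0/21')]
```

Running a check like this against all planned VPC CIDRs before launching any pipeline catches collisions far earlier than a failed attachment would.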

Cloud Foundations supports multi-account management both with AWS Organizations and in self-managed virtual organizations where AWS Organizations is not enabled. It also supports both of these network sharing models to meet your specific business scenarios and workload requirements.

Limitations

This solution does not currently address the following:

  • Internet Protocol version 6 (IPv6) addressing;
  • Hybrid sharing, that is, a network topology that shares both the TGW and subnets at the same time. However, you can independently define and deploy two networks, one sharing the TGW and the other sharing subnets;
  • Cloud-to-on-premises connectivity, whether through a DX connection, a virtual private network (VPN), or other third-party solutions.

Solution Overview

Planning and designing a well-structured multi-account cloud network environment can be overwhelming. You need to estimate short- to medium-term growth based on actual business requirements and plan moderately ahead of schedule. Starting from the allocation of CIDRs for VPCs and subnets, routes are designed table by table, the various gateways are connected, and VPCs are connected or isolated through the TGW route tables. Furthermore, to reduce cost and increase efficiency, the centralized management of interface endpoints and NAT gateways also needs to be considered. Once planned and designed, creating and configuring all resources one by one is a lengthy, time-consuming process without automation tools.

We simplify both planning and deployment. For planning, JSON is used to define a succinct structure that clearly specifies all the TGWs, VPCs, their associated resources, and their connectivity relationships. For deployment, using Cloud Foundations' Pipeline Factory, you can one-click deploy all network resources defined in the JSON, share resources, and send the flow logs of the VPCs and TGW to the Logs Account. For a network of around 10 VPCs and 100 subnets, deployment takes minutes, while manual deployment could take hours.

For the overall architecture diagram and design ideas of Cloud Foundations, please refer to “Fast build a multi-account operating environment on the cloud with best practices and well-architected”.

VPC-sharing network

Let's first walk through a "VPC-sharing network" example and then explain the similarities and differences with the "TGW-sharing network". The following figure shows a VPC-sharing network topology with 5 VPCs, 1 TGW, and 2 TGW route tables. The development subnets are shared with development account 1 via RAM, the production subnets are shared with production accounts 1 and 2, and the log subnets are shared with the Logs Account. The development, production, and log VPCs can access the endpoint and hub VPCs, and vice versa; however, the development, production, and log VPCs cannot access each other. According to your actual business requirements, you can refer to the TGW documentation to plan and design connectivity and isolation controls between VPCs, and implement them through the TGW route tables.

We built a set of Terraform modules for the various AWS network resources, such as VPC, TGW, NFW, and GWLB. These modules create and configure networks in a simple, standard way suitable for most common scenarios. We succinctly define each subnet's classless inter-domain routing (CIDR) block based on Terraform's cidrsubnet function. With this simplification, the analogous relationships between subnet CIDRs in different VPCs become easy to see, which makes comparison and troubleshooting easier. Try comparing the following two sets of subnet CIDRs:

VPC          10.30.16.0/20                     10.30.32.0/20
Subnet       Subnet set 1                      Subnet set 2
             AZ a             AZ b             AZ a             AZ b
Public       10.30.16.0/23    10.30.18.0/23    10.30.32.0/23    10.30.34.0/23
Private 1    10.30.20.0/23    10.30.22.0/23    10.30.36.0/23    10.30.38.0/23
Private 2    10.30.24.0/23    10.30.26.0/23    10.30.40.0/23    10.30.42.0/23
Intra        10.30.28.32/28   10.30.28.48/28   10.30.44.32/28   10.30.44.48/28

Looking at the two sets of CIDRs at first glance, you can discern some pattern and intrinsic connection; for example, the third octet increases by a fixed step in every row but the last. Still, the connection between the two sets of subnet CIDRs remains quite vague, and if the prefix length were odd, the pattern would be even less obvious. Instead, calculate each subnet's offset relative to the CIDR of the VPC that contains it, and compare:

VPC          10.30.16.0/20            10.30.32.0/20
Subnet       Subnet set 1             Subnet set 2
             AZ a        AZ b         AZ a        AZ b
Public       (3, 0)      (3, 1)       (3, 0)      (3, 1)
Private 1    (3, 2)      (3, 3)       (3, 2)      (3, 3)
Private 2    (3, 4)      (3, 5)       (3, 4)      (3, 5)
Intra        (8, 194)    (8, 195)     (8, 194)    (8, 195)

It can be seen that the offsets of the two sets of subnets are exactly the same. This clear offset correlation enables more efficient network planning and makes inconsistencies between corresponding subnets easy to detect.
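The (newbits, netnum) pairs above follow the semantics of Terraform's cidrsubnet(prefix, newbits, netnum) function. As a sketch, the same calculation can be reproduced with Python's standard ipaddress module, which lets you verify a planned offset table outside Terraform:

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mimic Terraform's cidrsubnet(): take slice number `netnum`
    among the 2**newbits equal subdivisions of `prefix`."""
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

# Reproduce two entries from the tables above
print(cidrsubnet("10.30.16.0/20", 3, 1))    # → 10.30.18.0/23 (public, AZ b)
print(cidrsubnet("10.30.32.0/20", 8, 194))  # → 10.30.44.32/28 (intra, AZ a)
```

Because the function only depends on the VPC CIDR and the offset pair, identical offsets in two VPCs always yield structurally identical subnet layouts.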

Define network structure

Define the network connectivity in JSON according to the following rules. For array types, keep each element's position in the array stable to prevent Terraform from destroying and re-provisioning resources.

The first-level attributes are:

Property   Type   Note
vpcs       map    VPC set definition
resolver   map    Route 53 resolver definition
tgw        map    TGW definition
dx         map    DX gateway definition

VPC set definition is the mapping from VPC name to properties:

Property      Type       Default value   Note
cidr          string     N/A             VPC CIDR
az_count      int        2               AZ count, 2 or 3
is_endpoint   bool       false           Whether to set as the endpoint VPC
is_hub        bool       false           Whether to set as the hub VPC
enable_igw    bool       false           Whether to create and attach an IGW to the VPC
enable_nat    bool       false           Whether to create a NAT gateway in each AZ
nfw.enabled   bool       false           Whether to create and configure NFW
gwlb          map        {}              Gateway Load Balancer properties
accounts      string[]   []              Accounts to share with; may refer to the core accounts
endpoints     string[]   []              Interface endpoints, such as ssm, kms, etc.
gw_endpoints  string[]   []              Gateway endpoints, such as s3 or dynamodb
groups        map        {}              Security group properties
subnets       int[][][]  N/A             Subnet CIDR offsets, see below

The subnet CIDRs array contains the intra, public, and private subnets, in that order:

  1. Intra subnets: Mainly to connect TGW;
  2. Public subnets: Mainly to place NAT and associate IGW;
  3. Private subnets 1: All private subnets share one route table per zone;
  4. Private subnets 2;
  5. Private subnets n;

The lengths of the second- and third-dimension arrays must be az_count and 2, respectively. For the subnet CIDRs multidimensional array:

  • Set an item to an empty array if you do not want to create that subnet tier;
  • The intra subnets must be created to connect the transit gateway;
  • No NAT gateway is created for the intra subnets;
  • The CIDR range of the intra subnets can be small, for example, /28.
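To illustrate the array layout, the following sketch (Python; the helper name and tier labels are illustrative, not part of the solution) expands a subnets definition into concrete CIDRs, one row per tier and one column per AZ. The input is the endpoint VPC from the definition example later in this article:

```python
import ipaddress

def expand_subnets(vpc_cidr, subnets):
    """Expand the [tier][az] = [newbits, netnum] array into CIDRs.
    Row 0 is intra, row 1 is public, rows 2+ are private tiers;
    an empty row means that tier is not created."""
    vpc = ipaddress.ip_network(vpc_cidr)
    tiers = ["intra", "public"] + [f"private-{i}" for i in range(1, len(subnets) - 1)]
    return {
        tier: [str(list(vpc.subnets(prefixlen_diff=nb))[nn]) for nb, nn in row]
        for tier, row in zip(tiers, subnets)
    }

# Endpoint VPC: intra subnets plus one private tier, no public subnets
print(expand_subnets("10.0.0.0/20", [[[8, 128], [8, 129]], [], [[4, 2], [4, 3]]]))
```

This makes it easy to eyeball the definition before deployment: the empty second row shows that no public subnets (and therefore no NAT gateways) are created in that VPC.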

There is only one transit gateway per region per definition:

Property   Type     Default value   Note
asn        int      64512           Amazon side Autonomous System Number (ASN)
cidr       string   N/A             Common CIDR; should cover all spoke VPC CIDRs
enabled    bool     false           Whether to create and configure the TGW
peer       bool     false           Whether to create a TGW peering connection to the main region
tables     map      {}              TGW route table set definition

TGW route table set definition is the mapping from route table name to properties:

Property       Type          Note
routes         map(string)   Static route mapping; the value can be a VPC name (its attachment), blackhole, or peer (the TGW peering attachment)
associations   string[]      Association array; VPC names come from the VPC set definition above. Use peer to associate the peering attachment
propagations   string[]      Propagation array; VPC names come from the VPC set definition above
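Under the hood, each entry in associations, propagations, and routes maps naturally onto one Terraform resource from the hashicorp/aws provider. A minimal sketch of how one route table definition could translate (resource names and references are illustrative, not the solution's actual module code):

```hcl
resource "aws_ec2_transit_gateway_route_table" "pre" {
  transit_gateway_id = aws_ec2_transit_gateway.this.id
}

# associations = ["logs"]
resource "aws_ec2_transit_gateway_route_table_association" "logs" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.logs.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.pre.id
}

# propagations = ["hub"]
resource "aws_ec2_transit_gateway_route_table_propagation" "hub" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.hub.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.pre.id
}

# routes = { "0.0.0.0/0" = "hub" }: a static default route to the hub attachment
resource "aws_ec2_transit_gateway_route" "default" {
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.hub.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.pre.id
}
```

Keeping the JSON definition declarative means only this thin mapping layer needs to know the provider resource types.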

Endpoint VPC

The endpoint VPC is designed to centrally manage interface endpoints, which saves network costs in most cases. For each interface endpoint, the endpoint VPC creates a Route 53 private hosted zone (PHZ) in the Network Account, adds records pointing to the interface endpoint, and associates all spoke VPCs. To properly define an interface endpoint VPC:

  1. Set is_endpoint to true and name the VPC as you wish;
  2. The VPC must include private subnets and define the array of interface endpoints;
  3. Other VPCs do not need any interface endpoints, but gateway endpoints (S3, DynamoDB) are still configured in each VPC as needed.

Hub VPC

The hub VPC is designed to centrally manage NAT gateways, which reduces network costs in most cases. Besides centralized access to NAT, this VPC can also be used for centralized ingress and egress inspection with NFW, or with GWLB and third-party appliances. This VPC provides centrally managed NAT gateways rather than IGWs for the spokes, so you still need to set enable_igw to true for any VPC that requires direct Internet access. To properly define a hub VPC:

  1. Set is_hub to true and name the VPC as you wish;
  2. The VPC should include intra subnets, public subnets for egress to the Internet, and a first set of private subnets for the appliances;
  3. The VPC CIDR should differ from all other spoke VPCs, for example 100.64.0.0/16;
  4. Enable selected appliances based on functional requirements;
  5. The TGW CIDR must cover the CIDRs of all spoke VPCs except this one; however, it cannot be 0.0.0.0/0.

Definition examples

The following JSON content is an example definition of the architecture diagram above. In less than 1,400 bytes (excluding whitespace), it holistically plans and succinctly defines an interconnected network topology with 2 functional VPCs, 3 workload VPCs, 1 TGW, and 2 TGW route tables.

{
  "vpcs": {
    "endpoints": {
      "is_endpoint": true,
      "cidr": "10.0.0.0/20",
      "endpoints": ["ec2", "ssm", "ssmmessages", "ec2messages"],
      "subnets": [
        [[8, 128], [8, 129]],
        [],
        [[4, 2], [4, 3]]
      ]
    },
    "hub": {
      "is_hub": true,
      "cidr": "10.0.16.0/20",
      "enable_igw": true,
      "enable_nat": true,
      "subnets": [
        [[8, 128], [8, 129]],
        [[4, 0],   [4, 1]],
        [[4, 2],   [4, 3]]
      ]
    },
    "logs": {
      "cidr": "10.0.32.0/20",
      "accounts": ["$.account.logs"],
      "gw_endpoints": ["s3"],
      "subnets": [
        [[8, 128], [8, 129]],
        [[4, 0],   [4, 1]],
        [[4, 2],   [4, 3]]
      ]
    },
    "dev": {
      "cidr": "10.0.48.0/20",
      "accounts": ["DEV_ACCOUNT"],
      "gw_endpoints": ["s3"],
      "subnets": [
        [[8, 128], [8, 129]],
        [[4, 0],   [4, 1]],
        [[4, 2],   [4, 3]],
        [[4, 4],   [4, 5]]
      ]
    },
    "prod": {
      "cidr": "10.0.64.0/20",
      "accounts": ["PROD_ACCOUNT_1", "PROD_ACCOUNT_2"],
      "gw_endpoints": ["s3", "dynamodb"],
      "subnets": [
        [[8, 128], [8, 129]],
        [[4, 0],   [4, 1]],
        [[4, 2],   [4, 3]],
        [[4, 4],   [4, 5]]
      ]
    }
  },
  "tgw": {
    "enabled": true,
    "cidr": "10.0.0.0/16",
    "tables": {
      "pre": {
        "associations": ["logs", "dev", "prod"],
        "propagations": ["endpoints", "hub"]
      },
      "post": {
        "associations": ["endpoints", "hub"],
        "propagations": ["logs", "dev", "prod"]
      }
    }
  }
}
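Before launching any pipeline, it is worth sanity-checking a definition like the one above. The following sketch (Python; the rule checked, namely that each VPC appears in exactly one route table's associations, is drawn from this example rather than being a requirement of the solution):

```python
import json
from collections import Counter

def check_associations(definition: str):
    """Return (VPCs missing an association, VPCs associated more than once)."""
    doc = json.loads(definition)
    vpcs = set(doc["vpcs"])
    assoc = Counter(
        name
        for table in doc["tgw"]["tables"].values()
        for name in table["associations"]
    )
    missing = vpcs - set(assoc)
    duplicated = {name for name, n in assoc.items() if n > 1}
    return missing, duplicated

# A trimmed-down version of the definition above
definition = """
{
  "vpcs": {"endpoints": {}, "hub": {}, "logs": {}, "dev": {}, "prod": {}},
  "tgw": {"tables": {
    "pre":  {"associations": ["logs", "dev", "prod"]},
    "post": {"associations": ["endpoints", "hub"]}
  }}
}
"""
print(check_associations(definition))  # → (set(), set())
```

A VPC attachment can be associated with only one TGW route table, so catching a duplicate or missing association in the JSON avoids a failed Terraform apply later.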

Deploy and destroy

With Cloud Foundations' automated operations capabilities, you can one-click deploy and share all network resources in the JSON definition described above, and send the flow logs of the VPCs and TGW to the Logs Account. Assuming the product-manager role in the Infrastructure Account, the main steps for deploying and sharing network resources are as follows:

  1. Change profile: configure the network-vpc profile in the cloud-foundations application;
  2. Launch the shared network connectivity pipeline product with the following parameters:
      • Product name: cf-network-vpc;
      • Path: network/vpc;
      • Account mode: One account;
      • Account: Network Account;
      • Regions: leave blank for the main region;
      • Stage: leave blank for the default stage;
      • Variables: leave blank for the default variables;
  3. Launch: wait until the product status becomes available;
  4. Release the pipeline: prefix-pipeline-network-vpc-apply. This pipeline synchronizes the name tags of networking resources; they are usually all synced after at most two successful runs.

The procedure for deleting VPC-sharing network resources reverses the above steps; the destruction pipeline is suffixed with destroy.

TGW-sharing network

To transform the above VPC-sharing network into a TGW-sharing network, share the TGW with member accounts through RAM. The functional VPCs (the endpoint and hub VPCs) are still managed centrally by the Network Account, while the workload VPCs are created and attached to the TGW in the corresponding member accounts. Since VPCs communicating through the TGW cannot have overlapping CIDRs, production account 2 needs a different CIDR for its VPC (not shown in the image below). One account can create multiple VPCs and attach them to the TGW, which connects all VPCs from the Network Account and the member accounts. The TGW route tables connect and isolate the network through associations, propagations, and static routes.

Define network structure

The network topology definition uses almost the same specification as the VPC-sharing network connectivity product, except for the following:

  1. Dedicated VPCs (the endpoint and hub VPCs) must be provisioned in the Network Account, i.e., leave their accounts unset;
  2. The accounts property of a VPC must contain only one account, as overlapping VPC CIDRs are not allowed; if this property is unset, the VPC is created in the Network Account;
  3. To create multiple VPCs in a single account, set that account as the only value in the accounts property of each corresponding VPC in the definition.

For the definition example above, if you want to adapt it for TGW-sharing network connectivity, the production VPC can designate only one production account. If needed, you can define another VPC with a non-overlapping CIDR and designate it to the other production account.

Deploy and destroy

Similar to the VPC-sharing network, Cloud Foundations can one-click deploy all resources in the JSON definition of a TGW-sharing network and send flow logs to the Logs Account. Deploying a TGW-sharing network takes one more step than a VPC-sharing network, because it is a second-order pipeline product: the pipelines must first be generated, then released again, before resources are actually deployed or destroyed. Refer to "Use Cloud Foundations to automatically manage Terraform Infrastructure as Code with Continuous Integration and Continuous Deployment" for details. Assuming the product-manager role in the Infrastructure Account, the main steps for deployment are as follows:

  1. Same as the first step for the VPC-sharing network;
  2. Same as the second step for the VPC-sharing network;
  3. Launch the TGW sharing network connectivity pipeline product with the following parameters:
      • Product name: cf-network-tgw-pipeline;
      • Path: network/tgw/pipeline;
      • Account mode: One account;
      • Account: Network Account;
      • Regions: leave blank for a single-region deployment, or enter all regions including the main region for a multi-region deployment;
      • Stage: leave blank for the default stage;
      • Variables: leave blank for the default variables;
  4. Launch: wait until the product status becomes available;
  5. Release the pipelines: a) prefix-pipeline-network-tgw-pipeline-apply; b) prefix-pipeline-network-tgw-apply-fresh.

The destruction process reverses the above steps: release all destroy pipelines in reverse order, then terminate the product.

Other topics

Major costs

The main cost difference between the VPC-sharing and TGW-sharing networks comes from the shared resources: the former shares subnets, the latter a TGW. There is no additional charge for using RAM. The TGW charges per VPC attachment per hour, plus data processing fees. Within VPCs, NAT gateways incur hourly and data processing charges. Additionally, flow logs stored in Amazon S3 incur storage charges. Finally, there are data transfer charges for the workloads running on instances.

Flow logs

Regardless of the model with which the cloud network environment is built, Cloud Foundations automatically configures flow logs for all VPCs and the TGW, records all traffic types in Parquet format, and stores them centrally in the Logs Account's network bucket.

Future Work

It is not easy to meet ever-changing network connectivity planning and deployment requirements simply and efficiently with one set of automated components. Currently, the network component's planning and deployment capabilities cover most basic and typical network-building requirements on the cloud. In the future, we will continue to enhance and innovate in tagging adaptation, multi-region deployment, additional TGW attachment types, and cloud-to-on-premises connectivity to meet a wider range of business scenarios and workload requirements.

Conclusion

This article explains how to use Cloud Foundations' network components to holistically plan and succinctly define a network structure based on VPC-sharing or TGW-sharing, and how to one-click deploy the defined JSON content, including network resources, shared resources, and the flow logs of the VPCs and TGW. Together, these capabilities save significant time and effort in planning and deploying a multi-account organization's cloud network environment, so you can focus on the network planning itself, which closely relates to business needs and medium- to long-term development, without worrying about deploying and configuring the underlying AWS services and components. This greatly improves work efficiency and accelerates cloud network construction. You can learn more by visiting the Cloud Foundations solution page or by contacting AWS.

References

  1. Blog post: 借助 Cloud Foundations 实现多账户组织云上网络环境两种共享模式的整体规划与一键部署, 2023-02

About the authors

Clement Yuan

Clement Yuan is a consultant in the AWS Professional Services team. He previously worked for several years at Amazon's Seattle headquarters on the Amazon Relational Database Service (RDS) development team, and has rich experience in software development and cloud operations. He now focuses on architecture consulting, solution design, and project delivery covering business continuity and scalable operations, enterprise application and database migration to the cloud, cloud disaster recovery management, and the well-architected framework.

刘育新 (Liu Yuxin)

Liu Yuxin is a senior consultant in the AWS ProServe team, with long-term experience designing cloud adoption solutions for enterprise customers and delivering implementation projects.