Implementing Serverless Transit Network Orchestrator (STNO) in AWS Control Tower
Many of the customers we work with use advanced network architectures in AWS that span multiple Amazon Virtual Private Clouds (VPCs) and multiple accounts. Placing workloads into separate VPCs has several advantages, chief among them isolating sensitive workloads and allowing teams to innovate without fear of impacting other systems. Many companies take workload isolation further by adopting multi-account strategies that provide limited network connectivity between VPCs for dependent services. Today, the most common method to deploy multi-account landing zones is through AWS Control Tower. AWS Control Tower is the easiest way to set up and govern a secure, multi-account AWS environment based on best practices established through AWS’ experience working with thousands of enterprises as they move to the cloud.
In this blog, we demonstrate how to provide inter-VPC network connectivity across accounts. We show how an AWS Control Tower customer that uses several accounts for workload isolation can connect on-premises systems to workloads running in AWS. This example uses a “Shared Services” account, an “Applications” account, and a “Network” account. It uses AWS Transit Gateway and the AWS Serverless Transit Network Orchestrator (STNO) solution to manage network connectivity.
This blog is intended for AWS Control Tower administrators, or those responsible for managing networks within their AWS environment. In practice, we expect a network administrator to manage the transit gateway. Once STNO is deployed, end users have a self-service way to connect their environments, within the guardrails that the network administrator has configured through STNO.
The scenario we demonstrate is a common request from AWS customers. A company wants to provide connectivity between an on-premises Microsoft Active Directory server and AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) in a “Shared Services” account. Additionally, Active Directory-aware systems in a separate “Applications” account must be able to reach AWS Managed Microsoft AD, but must not reach the on-premises network. To accomplish this, we build a hub-and-spoke network with AWS Transit Gateway as the hub in a “Network” account, managed through simple VPC tagging with the help of the Serverless Transit Network Orchestrator solution.
The default AWS Control Tower implementation deploys “Log Archive” and “Audit” accounts under a core organizational unit (Core OU). Customers commonly create their other accounts under different organizational units based on their governance requirements. A common multi-account structure, aligned to AWS best practices, is used in this scenario and depicted below:
OUs are not used by the STNO solution, but they are an important AWS Control Tower concept for separating responsibilities and governance.
Now, let’s discuss what the Serverless Transit Network Orchestrator solution is and how it works. STNO adds automation to AWS Transit Gateway, providing the tools needed to set up and manage transit networks in AWS environments that use multiple VPCs. STNO supports both AWS Organizations and standalone AWS accounts. It also creates a web interface to help control, audit, and approve transit network changes.
To deploy the STNO solution, the AWS Control Tower administrator deploys two CloudFormation templates: a hub template in the account that acts as the network hub, and a spoke template in each spoke account. These templates create AWS serverless resources that automate transit gateway setup and management. You can see the list of the components created by the solution here.
We use AWS CloudFormation StackSets to deploy both CloudFormation templates from your AWS Organizations management account (formerly known as the master account) into the appropriate accounts. CloudFormation StackSets allow you to automatically deploy a template across existing and future accounts in your organization.
Here’s an image of the STNO architecture after it’s fully deployed:
How STNO works
For an in-depth review of the STNO architecture, please refer to the STNO documentation.
To manage VPCs in the spoke accounts, you tag the VPCs and the selected subnets within those VPCs. Each tag change is sent to the hub account through an Amazon EventBridge event bus. When the event arrives in the hub account, it triggers an AWS Lambda function that starts the STNO workflow.
An AWS Step Functions state machine (the STNO state machine) and Lambda functions process networking change requests from the spoke accounts, and the event details are stored in an Amazon DynamoDB table. You can choose whether requests are approved automatically or manually. With automatic approval, the VPC is attached to the transit gateway and route propagation is configured right away. With manual approval, Amazon SNS sends an email notification that the request is waiting for approval in the STNO management web interface; after the request is approved, the STNO state machine applies the network changes. If the request is rejected, the DynamoDB table and the tags on the spoke resources are updated with the rejected status.
When a request is approved, the solution also adds a route that targets the transit gateway to the tagged subnet’s route table in the spoke account, so that traffic can flow in both directions. The routes it adds follow the default route setting defined in the hub template.
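The approval rules described above can be summarized in a few lines. The following is a minimal Python sketch, not STNO’s actual code: the `TaggingRequest` type, the `decide` function, and the status strings are hypothetical names chosen for illustration, and the per-route-table flags stand in for the ApprovalRequired tag that step 3 sets.

```python
from dataclasses import dataclass


@dataclass
class TaggingRequest:
    """Hypothetical model of one change request raised by a VPC tag."""
    vpc_id: str
    route_table: str  # route table named in an Associate-with or Propagate-to tag
    action: str       # "associate", "propagate", or "remove-propagation"


def decide(request: TaggingRequest, approval_required: dict) -> str:
    """Return the (hypothetical) status a request would get under the rules above."""
    # Removing a propagation never waits for approval, even on a protected table.
    if request.action == "remove-propagation":
        return "auto-approved"
    # Tables flagged for approval hold the request until an administrator acts;
    # in STNO, Amazon SNS emails the administrator at this point.
    if approval_required.get(request.route_table, False):
        return "pending-approval"
    # Otherwise the state machine applies the change straight away.
    return "auto-approved"


flags = {"Flat": False, "Infrastructure": False, "On-premises": True}
print(decide(TaggingRequest("vpc-shared", "On-premises", "propagate"), flags))
```

Keeping the decision in one pure function makes the policy easy to read: only the flag on the target route table and the kind of change matter.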
To view the current STNO network configuration and tagging event history, users can open the Transit Network Management web interface in read-only mode. Administrators can also approve or deny requests through the web interface. The web interface is deployed as part of the STNO solution and hosted in your hub account, in a private Amazon S3 bucket fronted by Amazon CloudFront. See this for more details.
What we are going to do
In this blog, we demonstrate a hybrid connectivity scenario using three AWS accounts plus one on-premises network. An AWS customer wants to extend their Active Directory forest into AWS by launching AWS Managed Microsoft AD in their Shared Services account. Only the Shared Services account is allowed to send traffic to the on-premises network. Instances in the Application account are allowed to communicate with each other and with AWS Managed Microsoft AD in the Shared Services account.
The transit gateway is placed in the Network account, which acts as the “hub” of the network architecture. All other network connections to the transit gateway should be considered “spokes,” and the transit gateway route tables determine allowed traffic between the spokes.
After deploying the STNO solution in all accounts, we connect the on-premises network to the transit gateway with a Site-to-Site VPN and associate the VPN attachment with the transit gateway On-premises route table. We then use tags on VPCs and subnets in the spoke accounts to automate the rest of the transit gateway configuration.
The full series of implementation steps is as follows:
- Deploy the STNO hub CloudFormation template in the Network account.
- Deploy the STNO spoke CloudFormation template in the Application and Shared Services accounts.
- Require manual approval for the “On-premises” transit gateway route table.
- Set up the VPN connection from the on-premises network to AWS and associate the VPN with the transit gateway in the Network account.
- Tag the VPC and subnets in the Shared Services account to allow network traffic to flow between Shared Services and on-premises, as well as between Shared Services and the Application account.
- Tag the VPC and subnets in the Application account to allow network traffic to flow between the Application account and the Shared Services account.
- Log in to the STNO Management web interface and approve all pending routing configuration requests.
- Test that connectivity works as expected.
Define the networking strategy
Before we implement the STNO solution and begin connecting our VPCs, we must first decide which network traffic patterns to allow. In AWS Transit Gateway terms, we do this by creating VPC attachments to a transit gateway, defining transit gateway route tables, associating the attachments with those route tables, and enabling route propagation so that return traffic is routed back to its source. For a detailed tutorial of the STNO solution and route table configuration, please watch this video from re:Invent 2019: AWS Transit Gateway reference architectures for many VPCs (NET406-R1).
As part of the STNO solution default deployment, the transit gateway is set up with the following route tables: Flat, Isolated, On-premises, and Infrastructure. These are placeholder route tables to get you started with no specific routes. The STNO solution modifies these route tables based on the tags you add to your VPCs. You can also manually modify, delete, or add new route tables to meet your specific requirements.
Below is a diagram of what we want to build. For simplicity, we are not showing the Isolated route table that STNO creates by default, as we are not using it in this scenario.
We associate the Shared Services account VPC (which hosts AWS Managed Microsoft AD) with the transit gateway Infrastructure route table, and the Application account VPC with the Flat route table. We then propagate the Application VPC’s routes into the Flat and Infrastructure route tables, and the Shared Services VPC’s routes into the Flat and On-premises route tables. The route to the on-premises network is added only to the Infrastructure route table. As a result, servers in the Application account can communicate with the Shared Services account but not with the on-premises network, and the on-premises network can communicate with the Shared Services account but not with the Application account.
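To check the design before deploying anything, it can help to model the transit gateway’s forwarding decision. The Python sketch below is a simplification under stated assumptions: each attachment is associated with exactly one route table, a route table holds the routes propagated into it plus any static routes, and two attachments can talk only if each one’s associated route table has a route to the other (so return traffic works). The attachment names are hypothetical labels, not real attachment IDs.

```python
# Hypothetical attachment names; the values mirror the design described above.
ASSOCIATIONS = {
    "application": "Flat",
    "shared-services": "Infrastructure",
    "on-prem-vpn": "On-premises",
}
# Route tables that each attachment propagates its CIDR into.
PROPAGATIONS = {
    "application": ["Flat", "Infrastructure"],
    "shared-services": ["Flat", "On-premises"],
}
# Static routes (route table -> attachments), e.g. the on-premises route added by hand.
STATIC_ROUTES = {"Infrastructure": ["on-prem-vpn"]}


def routes_in(table: str) -> set:
    """Attachments reachable from a route table via propagated or static routes."""
    propagated = {att for att, tables in PROPAGATIONS.items() if table in tables}
    return propagated | set(STATIC_ROUTES.get(table, []))


def can_talk(src: str, dst: str) -> bool:
    """True only if traffic can go src -> dst and the reply can come back."""
    forward = dst in routes_in(ASSOCIATIONS[src])
    reverse = src in routes_in(ASSOCIATIONS[dst])
    return forward and reverse


for a, b in [("application", "shared-services"),
             ("shared-services", "on-prem-vpn"),
             ("application", "on-prem-vpn")]:
    print(a, "<->", b, can_talk(a, b))
```

Under these assumptions the first two pairs print True and the last prints False, which matches the intended isolation between the Application account and the on-premises network.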
We plan ahead by building out a network connectivity matrix of our associations and propagations like so:
This connectivity matrix defines the tags we apply to our VPCs so that STNO creates all the associations and propagations in the Network account’s transit gateway. For example, we tag the Application account VPC with the key “Associate-with” and the value “Flat”, and with the key “Propagate-to” and the value “Flat, Infrastructure”.
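Because “Propagate-to” carries several route table names in one comma-separated value, it can be worth checking how a tag value splits before applying it. This minimal Python sketch assumes comma-separated values with optional spaces, as in the example above; `parse_stno_tags` is a hypothetical helper, and STNO’s own parsing may differ in details such as case handling.

```python
def parse_stno_tags(tags: dict) -> dict:
    """Turn VPC tags into the association and propagation plan they request."""
    associate = tags.get("Associate-with", "").strip() or None
    propagate = [name.strip()
                 for name in tags.get("Propagate-to", "").split(",")
                 if name.strip()]  # ignore empty entries such as a trailing comma
    return {"associate_with": associate, "propagate_to": propagate}


print(parse_stno_tags({"Associate-with": "Flat",
                       "Propagate-to": "Flat, Infrastructure"}))
```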
We assume that you have already set up AWS Control Tower and created member accounts for Network, Application, and Shared Services (the names may differ in your environment). Read more on how to set up AWS Control Tower here.
We also assume that you created the member accounts without the AWS Control Tower default VPC, and instead created the spoke VPCs deliberately with non-overlapping CIDR ranges to avoid problems connecting VPCs through the transit gateway.
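A quick way to confirm that planned CIDR ranges don’t overlap is Python’s standard `ipaddress` module. In this sketch, the Application and Shared Services ranges are assumptions inferred from the test addresses used later in this post (10.1.0.5 and 10.2.0.10), and 10.199.0.0/16 is the on-premises range used in step 4.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(cidrs: dict) -> list:
    """Return every pair of named networks whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


plan = {
    "application": "10.1.0.0/16",      # assumed range
    "shared-services": "10.2.0.0/16",  # assumed range
    "on-premises": "10.199.0.0/16",
}
print(overlapping_pairs(plan))  # an empty list means no ranges overlap

# Two accounts that both kept the 172.31.0.0/16 default VPC would collide:
print(overlapping_pairs({"account-a": "172.31.0.0/16", "account-b": "172.31.0.0/16"}))
```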
Before deploying the solution, enable resource sharing with AWS Organizations in AWS Resource Access Manager (AWS RAM). This allows the solution to share the transit gateway it creates in the hub account with other accounts in the organization.
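If you prefer to script this, AWS RAM exposes the same setting through its EnableSharingWithAwsOrganization API, available in boto3 as `enable_sharing_with_aws_organization`. The sketch below assumes it runs with credentials for the Organizations management account; the client is passed in so the function can be exercised without AWS access.

```python
def enable_org_sharing(ram_client) -> bool:
    """Enable AWS RAM sharing with AWS Organizations.

    Run with management-account credentials, for example:
        import boto3
        enable_org_sharing(boto3.client("ram"))
    """
    response = ram_client.enable_sharing_with_aws_organization()
    # The API reports success in a boolean "returnValue" field.
    return response["returnValue"]
```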
1. Deploy the STNO “hub” CloudFormation template
The Serverless Transit Network Orchestrator solution is deployed in both the hub and the spoke AWS accounts. You can find the solution’s implementation guide here. We deploy the STNO hub template using CloudFormation StackSets from the AWS Control Tower management account (formerly known as the master account) into the Network account:
a) Log in to your AWS Control Tower management account as a user with permission to create StackSets.
b) Navigate to CloudFormation console > StackSets > Create StackSet. Copy the hub template’s URL from the STNO solution’s page and paste it into the Amazon S3 URL field for the template on the StackSet creation page.
c) When creating the CloudFormation StackSet for the hub account, you must specify configuration parameters. One key consideration is whether to let STNO configure the spoke VPC route tables on your behalf, which is controlled by the Default Route to TGW parameter. The solution uses this parameter to add routes that point to the transit gateway to the associated VPCs’ subnet route tables, so that traffic originating in the subnet and destined for other VPCs or the on-premises network is directed through the transit gateway. In our deployment we select the RFC-1918 (10/8, 172.16/12, 192.168/16) option, so when a subnet is tagged to attach it to a transit gateway, the following routes, each targeting the transit gateway, are added to its route table: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.
Note: The solution adds the same routes to the transit gateway to every subnet route table attached to the transit gateway, to provide bidirectional connectivity. However, STNO won’t overwrite existing routes for the same destinations that have different targets.
Keep this in mind if, for example, your VPC already has a 0.0.0.0/0 route that targets a NAT gateway: deploying the hub template with the Default Route parameter set to 0.0.0.0/0 leaves that route unchanged, so bidirectional traffic through the transit gateway won’t be enabled.
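The rule in the two notes above (add the chosen destinations, but never replace a route that already exists) can be sketched as follows. This is a simplified model, not STNO’s code: a route table is represented as a hypothetical dictionary mapping destination CIDR to target ID, and the IDs are placeholders.

```python
RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def add_tgw_routes(route_table: dict, tgw_id: str, destinations=RFC1918) -> dict:
    """Return a copy of route_table with routes to the transit gateway added.

    Existing routes for the same destination are left untouched, even when
    they point to a different target (for example a NAT gateway).
    """
    updated = dict(route_table)
    for destination in destinations:
        updated.setdefault(destination, tgw_id)
    return updated


subnet_routes = {"10.1.0.0/16": "local", "0.0.0.0/0": "nat-0example"}
print(add_tgw_routes(subnet_routes, "tgw-0example"))
# With the 0.0.0.0/0 option, the NAT gateway route is kept and nothing changes:
print(add_tgw_routes(subnet_routes, "tgw-0example", ["0.0.0.0/0"]))
```

The 10.0.0.0/8 route does not disturb traffic inside the VPC, because VPC routing prefers the most specific match, so the local 10.1.0.0/16 route still wins for in-VPC addresses.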
For this blog demonstration, we provide the parameters as displayed below and keep the default values for the rest of parameters. Proceed to the next step.
Note: Because this is the hub template, we only need the template deployed in the Network account once, so we won’t use the automatic deployment feature of StackSet here.
d) In the Configure StackSet options step, select Self-service permissions. For the IAM admin role ARN option, select IAM role name and choose AWSControlTowerStackSetRole. For IAM execution role name, replace the value with AWSControlTowerExecution.
Note: “AWSControlTowerStackSetRole” and “AWSControlTowerExecution” are existing roles that AWS Control Tower uses for launching StackSets, and provide full administrator permissions. If you want to scope down permissions for deploying this StackSet, see the documentation here.
e) In the Set deployment options step, enter the account ID for your Network account in the Account numbers field. Select the Region in which you want the solution to deploy the transit gateway. This is typically the home Region for your AWS Control Tower setup. Proceed to the next step and submit the StackSet creation after reviewing your inputs.
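Steps a) through e) can also be scripted with the CloudFormation API. The sketch below uses the boto3 `create_stack_set` and `create_stack_instances` calls with the self-managed permission model and the AWS Control Tower roles named above. The StackSet name, template URL, administration role ARN, and parameter dictionary are placeholders you supply; the parameter keys must match the hub template, which this sketch does not assume. The requested IAM capabilities are an assumption, based on the template creating IAM roles for its Lambda functions.

```python
def deploy_hub_stackset(cfn, template_url: str, admin_role_arn: str,
                        network_account_id: str, region: str,
                        parameters: dict, name: str = "stno-hub") -> str:
    """Create a self-managed StackSet for the hub template and deploy it once.

    `cfn` is a CloudFormation client for the management account, for example
    boto3.client("cloudformation"). `name` is a hypothetical StackSet name.
    """
    cfn.create_stack_set(
        StackSetName=name,
        TemplateURL=template_url,
        Parameters=[{"ParameterKey": key, "ParameterValue": value}
                    for key, value in parameters.items()],
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],  # assumed to be needed
        PermissionModel="SELF_MANAGED",
        AdministrationRoleARN=admin_role_arn,  # ARN of AWSControlTowerStackSetRole
        ExecutionRoleName="AWSControlTowerExecution",
    )
    # Deploy a single stack instance: the Network account, in the chosen Region.
    operation = cfn.create_stack_instances(
        StackSetName=name,
        Accounts=[network_account_id],
        Regions=[region],
    )
    return operation["OperationId"]
```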
2. Deploy the STNO “spoke” CloudFormation template
Now we must deploy the STNO spoke template in all AWS Control Tower accounts. Rather than deploying the spoke template in each spoke account individually, we use the automatic deployment feature of CloudFormation StackSets, so the spoke template is deployed automatically into every existing and future account in your AWS organization.
a) In the AWS Control Tower management account, navigate to CloudFormation console > StackSets > Create StackSet. Copy the spoke template’s URL from the STNO solution’s page and paste it into the Amazon S3 URL field for the template on the StackSet creation page.
b) In the Specify StackSet details step, enter the account ID for your Network account in the Network (Hub) Account parameter.
c) In the Configure StackSet options step, we want to enable automatic deployment of the spoke template across all accounts in our organization. Select Service-managed permissions so CloudFormation StackSets will be able to deploy the template into existing and future accounts once they are added to your AWS organization.
d) In the Set deployment options step, select Deploy to organization as the target, and enable Automatic deployments. Select the home Region (the same as for the hub template). Proceed to the next step and finish StackSet creation after reviewing your inputs.
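The equivalent API calls for the spoke StackSet use the service-managed permission model with automatic deployment. Choosing Deploy to organization in the console targets the organization root, so this sketch passes the root ID (it starts with r-) as the organizational unit target. The StackSet name and `hub_param_key`, the parameter key behind the Network (Hub) Account field, are placeholders; look up the actual key in the spoke template. As with the hub sketch, the IAM capabilities are an assumption.

```python
def deploy_spoke_stackset(cfn, template_url: str, hub_param_key: str,
                          network_account_id: str, root_id: str, region: str,
                          name: str = "stno-spoke") -> str:
    """Create a service-managed StackSet that follows accounts into the organization.

    `cfn` is a CloudFormation client for the management account, for example
    boto3.client("cloudformation"). `name` is a hypothetical StackSet name.
    """
    cfn.create_stack_set(
        StackSetName=name,
        TemplateURL=template_url,
        Parameters=[{"ParameterKey": hub_param_key,
                     "ParameterValue": network_account_id}],
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],  # assumed to be needed
        PermissionModel="SERVICE_MANAGED",
        # Deploy to accounts that join later, and remove stacks when they leave.
        AutoDeployment={"Enabled": True, "RetainStacksOnAccountRemoval": False},
    )
    operation = cfn.create_stack_instances(
        StackSetName=name,
        DeploymentTargets={"OrganizationalUnitIds": [root_id]},
        Regions=[region],  # same home Region as the hub template
    )
    return operation["OperationId"]
```

Service-managed StackSets don’t create stack instances in the management account itself, which is fine here because the management account is not a spoke.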
The on-premises network does not need the STNO spoke template (it is not an AWS account), so we configure connectivity to on-premises manually in step 4.
3. Set Manual Approvals for “On-Premises” route table
The STNO solution provides the flexibility to invoke a manual approval workflow for route table associations. If manual approval is enabled for a transit gateway route table, the solution will not create associations until the network admin approves the changes via the Transit Network Management web interface. Once approved, the solution automatically manages the transit gateway associations.
To demonstrate this feature, we are going to enable manual approval for associations to the “On-premises” route table. This is a common requirement for customers who require extra guardrails between cloud services and their on-premises network. You can optionally do the same for other route tables if desired.
a) Log in to the Network (hub) account.
b) Navigate to the VPC console > Transit Gateway Route Tables. You should see the four route tables created by the STNO solution.
c) Right-click the On-premises route table, choose Add/Edit Tags, and change the value of the ApprovalRequired tag to Yes.
After you update the tag value, future VPC tag changes related to “On-premises” route table will require approval, both for associating the VPC with the route table as well as adding propagations. The administrator must approve or reject the request via the Transit Network Management web interface. We will show this in step #7.
Note: Removing propagations from the VPC tag does not require approval even if the transit gateway route table has ‘ApprovalRequired’ set to ‘Yes’.
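If you manage the hub account from scripts, the same tag change can be made through the EC2 API. This sketch assumes the STNO route tables carry a Name tag matching the names shown in the console (such as On-premises); verify that in your deployment. It pages through `describe_transit_gateway_route_tables` and calls `create_tags`, which overwrites the value of an existing tag with the same key.

```python
def require_approval(ec2, table_name: str = "On-premises") -> str:
    """Set ApprovalRequired=Yes on the transit gateway route table with this Name tag.

    `ec2` is an EC2 client for the Network (hub) account, e.g. boto3.client("ec2").
    """
    kwargs = {}
    while True:
        page = ec2.describe_transit_gateway_route_tables(**kwargs)
        for table in page["TransitGatewayRouteTables"]:
            tags = {t["Key"]: t["Value"] for t in table.get("Tags", [])}
            if tags.get("Name") == table_name:  # assumed naming convention
                table_id = table["TransitGatewayRouteTableId"]
                ec2.create_tags(Resources=[table_id],
                                Tags=[{"Key": "ApprovalRequired", "Value": "Yes"}])
                return table_id
        if not page.get("NextToken"):
            raise LookupError(f"No transit gateway route table named {table_name!r}")
        kwargs = {"NextToken": page["NextToken"]}
```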
4. Set up and configure the VPN connection to on-premises
In this step, we establish an IPsec VPN connection between the transit gateway and our on-premises router (the customer gateway).
a) Log in to the Network account. In the VPC console, go to Transit Gateway Attachments > Create AWS Transit Gateway Attachment and create a new attachment of type VPN. You can read more about AWS Transit Gateway VPN attachments here. Choose an existing customer gateway or choose New, enter the internet-routable IP address of your customer gateway device, and select the routing option. In this example we use static routing for simplicity, although in most scenarios BGP routing is recommended for flexibility and robustness.
b) You can check the new VPN attachment in the Transit Gateway Attachments. Also, review the created VPN connection under Virtual Private Network (VPN) > Site-to-Site VPN Connections.
c) Download the VPN Configuration file from Site-to-Site VPN Connections console and complete the VPN setup by applying the configurations on your customer gateway device. We don’t go into details of setting up the site-to-site VPN connection in this post. This process is documented here.
d) Confirm that the VPN connection is established and IPsec tunnels are UP.
e) Next, we associate the VPN attachment with a transit gateway route table. In VPC console > Transit Gateway Route Tables, select the On-premises route table and open the Associations tab. Choose Create association and select the VPN attachment we created earlier.
Note: This is the only time we are making changes to the transit gateway manually rather than letting STNO solution automatically manage it for us. This is because the spoke network we are connecting to the transit gateway here is an on-premises network and not a VPC.
f) Next, select the Infrastructure route table, open the Routes tab, and choose Create route. For CIDR, enter the CIDR of your on-premises network; in our example, it is 10.199.0.0/16. Select the VPN attachment so that any traffic destined for the on-premises IP range is routed to the VPN attachment.
Note: In this example, we are using static routing, which is why we add the route to the Infrastructure route table manually.
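Steps e) and f) map to two EC2 API calls, `associate_transit_gateway_route_table` and `create_transit_gateway_route`. In this sketch the route table and attachment IDs are placeholders you look up in the console; only the 10.199.0.0/16 range comes from our example.

```python
def connect_on_premises(ec2, onprem_table_id: str, infra_table_id: str,
                        vpn_attachment_id: str,
                        onprem_cidr: str = "10.199.0.0/16") -> None:
    """Associate the VPN attachment and add the static on-premises route.

    `ec2` is an EC2 client for the Network account, e.g. boto3.client("ec2").
    """
    # Step e): the VPN attachment uses the On-premises route table for lookups.
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=onprem_table_id,
        TransitGatewayAttachmentId=vpn_attachment_id,
    )
    # Step f): the Infrastructure route table learns the on-premises range statically.
    ec2.create_transit_gateway_route(
        DestinationCidrBlock=onprem_cidr,
        TransitGatewayRouteTableId=infra_table_id,
        TransitGatewayAttachmentId=vpn_attachment_id,
    )
```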
5. Tag the VPC and Subnet(s) in Shared Services account
This enables connectivity between the Shared Services VPC that hosts an AWS Managed Microsoft AD and on-premises network where the main Microsoft Active Directory server resides.
a) Log in to the Shared Services account.
b) Navigate to the VPC console > Your VPCs. Select the VPC that hosts your AWS Managed Microsoft AD and add these two tags:
"Associate-with" : "Infrastructure"
"Propagate-to" : "Flat, On-premises"
c) In the VPC console > Subnets, select the subnet you want to attach to the transit gateway and add the following tag:
"Attach-to-tgw" : <leave blank>
d) Log in to the Network account. In the VPC console > Transit Gateway Attachments, confirm that there is no new attachment and no change to associations or propagated routes. This is expected: the requested changes involve the “On-premises” route table, and because we set ApprovalRequired to Yes earlier, they require manual approval. We approve these changes in the Transit Network Management web interface later in the process.
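Tagging can also be scripted with the EC2 `create_tags` call, run with credentials for the spoke account. The helper below is hypothetical; it tags the VPC first and the subnets second, because STNO uses the VPC tags when the subnet tag triggers the attachment. The Shared Services values in the usage comment follow the design described earlier (associate with Infrastructure; propagate to Flat and On-premises), and the resource IDs are placeholders.

```python
def tag_for_stno(ec2, vpc_id: str, subnet_ids: list,
                 associate_with: str, propagate_to: list) -> None:
    """Apply the STNO VPC tags, then mark the subnets for attachment.

    `ec2` is an EC2 client for the spoke account, e.g. boto3.client("ec2").
    """
    ec2.create_tags(
        Resources=[vpc_id],
        Tags=[
            {"Key": "Associate-with", "Value": associate_with},
            {"Key": "Propagate-to", "Value": ", ".join(propagate_to)},
        ],
    )
    # The Attach-to-tgw tag is applied with an empty value, as described above.
    ec2.create_tags(
        Resources=list(subnet_ids),
        Tags=[{"Key": "Attach-to-tgw", "Value": ""}],
    )


# Example for the Shared Services account (placeholder IDs):
# import boto3
# tag_for_stno(boto3.client("ec2"), "vpc-0example", ["subnet-0example"],
#              "Infrastructure", ["Flat", "On-premises"])
```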
6. Tag the VPC and Subnets in Application account
This enables connectivity between the Application VPC hosting application servers and the Shared Services VPC so that application servers can communicate with the AWS Managed Microsoft AD server. At the same time, application servers won’t have connectivity to on-premises network as this is the desired behavior.
a) Log in to the Application account.
b) Navigate to the VPC console > Your VPCs. Select the VPC that hosts your application server and add these two tags:
"Associate-with" : "Flat"
"Propagate-to" : "Flat, Infrastructure"
Note: Tagging the VPC doesn’t create a transit gateway attachment or change the transit gateway route tables. It only updates the DynamoDB table with information that is used later, when a subnet is tagged and the actual attachment and route propagations are created.
c) Navigate to the VPC console > Subnets. Select the subnet you want to be attached to the transit gateway and add the following tag:
"Attach-to-tgw" : <leave blank>
Note: Only one subnet per Availability Zone can be attached to the transit gateway. Otherwise, you receive a “DuplicateSubnetsInSameZoneError” error.
Tagging the subnet invokes the Lambda function that creates the transit gateway attachment, association, and propagations based on the tags we added to the VPC in the previous step. This can take a few minutes to complete.
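To avoid the “DuplicateSubnetsInSameZoneError” error, you can check candidate subnets before tagging them. This sketch takes a mapping of subnet ID to Availability Zone (the AvailabilityZone field returned by the EC2 `describe_subnets` call can supply it) and reports any zone that appears more than once; the subnet IDs and zones are placeholders.

```python
from collections import defaultdict


def duplicate_zones(subnet_azs: dict) -> dict:
    """Map each Availability Zone that has several candidate subnets to those subnets."""
    by_zone = defaultdict(list)
    for subnet_id, zone in subnet_azs.items():
        by_zone[zone].append(subnet_id)
    return {zone: ids for zone, ids in by_zone.items() if len(ids) > 1}


candidates = {"subnet-a": "us-east-1a", "subnet-b": "us-east-1a", "subnet-c": "us-east-1b"}
print(duplicate_zones(candidates))  # tag only one of the subnets listed per zone
```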
d) Log in to the Network account. In the VPC console > Transit Gateway Attachments, confirm that there is a new attachment. Also review the Associations and propagated Routes for Flat and Infrastructure route tables.
You can also review tagging requests in the Transit Network Management web interface, explained in the next step.
7. Approve routing configurations in the STNO Management Web Interface
As mentioned earlier in this post, STNO solution deploys a web interface to help control, audit, and approve changes made to the transit gateway.
a) Log in to your Transit Network Management Console web interface. You should have received the URL and login credentials for this when you deployed the STNO solution hub template.
Review the logs for the changes made by the solution. Changes to the Flat route table are auto-approved, so no action is required in the web interface.
b) Next, we approve the changes to the On-premises route table. In the Transit Network Management Console web interface, the Action Items page should show two items waiting for manual approval. Select each item in turn and choose Approve. The solution then makes the actual changes, including transit gateway attachments, associations, and propagations. This takes a few minutes to complete.
You can review the logs on the Dashboard page. Also, you can review the new attachment, association, and propagated routes for the transit gateway in the Amazon VPC console.
8. Confirm connectivity between VPCs and on-premises network
By now, the connectivity is established between the Application VPC and Shared Services VPC, as well as between the Shared Services VPC and on-premises network.
We show that the application server (10.1.0.5) can talk to the Active Directory server in the Shared Services VPC (10.2.0.10):
Note: For this connectivity test we are using ping, so we’ve allowed inbound ICMP from 10.1.0.0/16 in the Shared Services Security Group.
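The security group rule in this note can be added with the EC2 `authorize_security_group_ingress` call. In this sketch the group ID is a placeholder; for ICMP rules, FromPort and ToPort hold the ICMP type and code, and -1 means all of them.

```python
def allow_ping_from(ec2, group_id: str, source_cidr: str = "10.1.0.0/16") -> None:
    """Allow inbound ICMP (all types and codes) from the Application VPC range.

    `ec2` is an EC2 client for the Shared Services account, e.g. boto3.client("ec2").
    """
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[{
            "IpProtocol": "icmp",
            "FromPort": -1,  # ICMP type: -1 = all types
            "ToPort": -1,    # ICMP code: -1 = all codes
            "IpRanges": [{"CidrIp": source_cidr,
                          "Description": "ping test from the Application VPC"}],
        }],
    )
```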
For simplicity, we are simulating an AD server with an Amazon Linux instance. You can see the local IP address in the command prompt between the square brackets.
We show the AWS Managed Microsoft AD server in Shared Services VPC (10.2.0.10) can talk to the on-premises Microsoft Active Directory server (10.199.0.10):
As intended, there is no connectivity between the application server and the on-premises Microsoft Active Directory server.
In this blog, we showed how to use the Serverless Transit Network Orchestrator Solution to configure a hybrid networking environment while maintaining separation between the AWS application environment and the on-premises network. You might be wondering, “How does this automation work for new accounts created in the AWS Control Tower Account Factory?”
Because we deployed the STNO spoke template through CloudFormation StackSets with service-managed permissions and automatic deployment enabled, the spoke template is deployed to every new account once it is added to the organization. You can disable automatic deployment if desired by following the steps in the documentation here.
By default, AWS Control Tower’s Account Factory creates each new account with a single VPC that uses 172.31.0.0/16 as its CIDR range, with a single private subnet. Because every account created this way receives the same range, those VPCs overlap, and AWS Transit Gateway does not support routing between VPCs with overlapping CIDRs. We recommend creating accounts without a VPC. This lets you create VPCs later with deliberately non-overlapping CIDR ranges and avoid problems connecting VPCs through the transit gateway.
If you want to fully prescribe network connectivity options while still allowing self-service deployment of VPC resources, AWS Control Tower administrators can use AWS Service Catalog. With AWS Service Catalog, end users deploy predefined CloudFormation templates that fully specify VPCs, subnets, and tags, and IAM permissions prevent them from modifying networking and tags on their VPCs.