Automating the admission of virtual private clouds to AWS Cloud WAN networks

In this blog post, we present an augmented approach to managing AWS Cloud WAN segments in a secure, scalable, and on-demand way. As your organization increases the number of AWS accounts and AWS Regions in use, the operational and security complexities of admitting new user-created virtual private clouds (Amazon VPCs) to the network also increase, from associating each VPC with the correct segment to continually ensuring adherence to the global IP addressing plan. This solution enforces the correct network segment membership, including IP address plan adherence, and allows central infrastructure teams to override the solution’s behavior through tagging when needed. We cover the main architectural considerations and requirements to fully automate the VPC admission and attachment management process, and provide a deployable example written in AWS CloudFormation.

Note: This blog post isn’t a primer for AWS Cloud WAN and doesn’t explain in detail how the service works. For that information, see the AWS Cloud WAN documentation, this re:Invent talk, or Introducing AWS Cloud WAN.

Challenge description

Building and operating a global, secure, and scalable network can be challenging. It has been since the early days of enterprise networking, and remains a relatively complex endeavour in the cloud age. AWS Cloud WAN is a fully managed service that creates a global core network in AWS, abstracting network configuration complexity by using policies that describe intent about traffic isolation and segmentation, routing, and network attachment association. Most organizations keep user VPC provisioning a centralized process while other organizations attempt to move it to the edge in the form of service catalog products that can be self-provisioned at the expense of complex, inflexible, and sometimes obscure automations. When considering manageable operation and automation, architecture simplicity, and the need to follow the principle of least privilege, two ubiquitous challenges exist:

  • How to achieve network isolation for the different applications and types of traffic? (for example, production, testing, pci-dss, and so on).
  • How can we allow users to self-service the admission of their VPCs to the core network on demand, in a user-friendly, deterministic, secure, and fully automated manner without involving central infrastructure teams?

The answer to the first challenge is straightforward: you can use an AWS Cloud WAN segment with the appropriate configuration and routing in place. You can use this solution to address the second challenge. You can share the AWS Cloud WAN network with different spoke accounts, and this solution will prevent user accounts from expressing an opinion, in the form of a tag, as to which network segment they should belong to (for example, preventing a test account from associating with the production segment). Instead, this solution ensures that the tagging of the attachment can only be performed from the network account, using network control plane events in a fully automated way. Additionally, in cases where the use of Amazon VPC IP Address Manager (IPAM) isn’t being enforced and the user can specify almost any addressing range to create a VPC, this solution blocks incorrectly addressed VPCs, which mitigates the risk of introducing routing black holes and duplicated addressing in the global network.

Ideally, administrators should be freed from centralized VPC provisioning and admission to the core network (saving engineering hours and increasing provisioning speed), but there is also a need to ensure provisioning meets the correct level of compliance, that is, ensuring the correct segment and IP addressing are used. Providing a self-service catalog approach to provision spoke VPCs is a tempting idea, but it comes with its own considerations:

  • Service catalog-based user automations typically consist of pre-defined templates, which might not meet your users’ needs and force inflexibility on them.
  • Some configuration details about the VPC type still need to be provided to the user (for example, development, testing, and production environments).
  • There is a lack of mechanisms to ensure the correct use of IP addressing, particularly when IPAM isn’t in use.

The following solution provides a secure, flexible, and fully automated process that addresses the following items:

  • How to move to an on-demand model, decreasing operational load on infrastructure teams.
  • How to ensure that user-built and -managed VPCs are safe to be admitted to the core network while allowing some freedom, which is difficult to capture in fixed service catalog products.
  • How to ensure that VPCs are admitted to the correct network segments, with no chance of configuration abuse (such as tag manipulation).
  • How to ensure continuous evaluation of admitted VPCs after successful admission.

Solution overview

The proposed solution consists of an event-based architecture, working at the control plane of the core network, through the processing of AWS Network Manager events. This solution provides a scalable, secure and flexible method to provide on-demand VPC admission to the core-network, with the following properties, features and use cases:

  • This solution acts within the boundaries of an AWS network account, processing control plane events without exposing any of its components to user accounts.
  • From the AWS user account perspective, the experience of attaching a VPC to the core network should be like creating the attachment with a previously shared AWS Cloud WAN core network.
  • No AWS account user should be able to use attachment tags to express opinions of network association intent.
  • VPC attachment enrollment should be streamlined and done automatically for any VPC that meets evaluation requirements (that is, attachments are performed against the appropriate segment and IP addressing is coherent).
  • If a VPC attachment doesn’t meet evaluation requirements, even after successful admission to the core network, it should be evicted (for example, if a new Classless Inter-Domain Routing (CIDR) block that overlaps with the addressing of other VPCs is added to a VPC).
  • The ability to override the behavior of the solution under controlled circumstances (such as changes performed by the central infrastructure team).
  • The ability to extend further logic and functionality to the original solution.

The source code for this solution can be found in the GitHub repository.

Prerequisites

To use this solution, the following prerequisites must be in place:

  1. Modify the AWS Cloud WAN core network policy (the segments and association method).
  2. Create a service control policy (SCP) to prevent user accounts from using the segment association tag.
  3. Create an AWS Identity and Access Management (IAM) role in the AWS management account that allows the querying of AWS account tags from the central network account.
  4. User accounts are tagged with the appropriate route segment or domain tag as part of the account vending process.

Let’s go over each of the prerequisites. First, you need to have a running AWS Cloud WAN core network with a few prerequisites configured, including event monitoring. A policy requires one or more segments. In a typical environment, you might have:

  • Your business specific segments, for example, production, staging, testing, WAN, infrastructure, and so on.
  • A last-hop return segment containing all destination routes (every attachment propagation) aggregating your east-west inspection VPCs (let’s call it the fullreturn segment).

Additionally, you need extra segments:

  • quarantine segment: The segment with which any untagged attachment will be associated. This prevents any kind of communication.
  • quarantineroutes segment: The segment used to learn any routes propagated from attachments in the quarantine segment.

The following code snippet shows the declaration of these segments as part of the AWS Cloud WAN network policy document.

{
  "version": "2021.12",
  "segments": [
    {
      "isolate-attachments": true,
      "name": "quarantine",
      "require-attachment-acceptance": false
    },
    {
      "isolate-attachments": true,
      "name": "quarantineroutes",
      "require-attachment-acceptance": false
    },
    {
      "isolate-attachments": false,
      "name": "fullreturn",
      "require-attachment-acceptance": false
    },
    ...
  ],
  "segment-actions": [
    {
      "action": "share",
      "mode": "attachment-route",
      "segment": "quarantine",
      "share-with": [ "quarantineroutes" ]
    },
    {
      "action": "share",
      "mode": "attachment-route",
      "segment": "fullreturn",
      "share-with": {
        "except": [ "quarantine", "quarantineroutes" ]
      }
    },
    ...
  ],
  ...
}

The next prerequisite in the AWS Cloud WAN network policy specifies the association method. This will only contain two attachment policies to manage the admission of attachments to the correct segments:

  • Most preferred policy, where the route-domain tag value will be used to select the segment for the attachment to be associated with.
  • Least preferred policy, where in the absence of the route-domain tag, attachments will be associated with the quarantine segment.

The following code snippet shows the declaration of the association method in the AWS Cloud WAN network policy document.

{
  ...
  "attachment-policies": [
    {
      "action": {
        "association-method": "tag",
        "tag-value-of-key": "route-domain"
      },
      "conditions": [
        { "type": "any" }
      ],
      "rule-number": 10
    },
    {
      "action": {
        "association-method": "constant",
        "segment": "quarantine"
      },
      "conditions": [
        { "type": "any" }
      ],
      "rule-number": 20
    }
  ],
  ...
}
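
To make the interaction between the two rules concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model and not the service's actual evaluation engine: the function name select_segment is hypothetical, conditions are not modeled because both rules use "any", and it assumes that a tag-based rule without a matching tag falls through to the next rule, as the text describes.

```python
from typing import Dict, List, Optional

# Simplified copy of the two attachment policies shown above.
ATTACHMENT_POLICIES = [
    {"rule-number": 20,
     "action": {"association-method": "constant", "segment": "quarantine"}},
    {"rule-number": 10,
     "action": {"association-method": "tag", "tag-value-of-key": "route-domain"}},
]


def select_segment(policies: List[dict], attachment_tags: Dict[str, str]) -> Optional[str]:
    """Return the segment chosen by the first rule that yields one.

    Rules are visited in ascending rule-number order (lowest first).
    """
    for rule in sorted(policies, key=lambda r: r["rule-number"]):
        action = rule["action"]
        if action["association-method"] == "tag":
            # The tag rule only applies when the attachment carries the key.
            segment = attachment_tags.get(action["tag-value-of-key"])
            if segment:
                return segment
        elif action["association-method"] == "constant":
            return action["segment"]
    return None


if __name__ == "__main__":
    print(select_segment(ATTACHMENT_POLICIES, {}))
    print(select_segment(ATTACHMENT_POLICIES, {"route-domain": "production"}))
```

In this model, an attachment created by a user account (which cannot set the route-domain tag) always lands in quarantine until the network account tags it.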

The next prerequisite is to make sure that principals in the AWS user account cannot use the route-domain tag when interacting with the AWS Cloud WAN core network attachment. Users should be prevented from specifying segment membership metadata. The following SCP example can be used to enforce this prerequisite and should be applied from the AWS management account to every organizational unit (OU) that isn’t centrally managed by the infrastructure team.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAttachmentTags1",
      "Effect": "Deny",
      "Action": [
        "networkmanager:TagResource",
        "networkmanager:CreateVpcAttachment"
      ],
      "Resource": "arn:aws:networkmanager:*",
      "Condition": {
        "ForAnyValue:StringEquals": { "aws:TagKeys": [ "route-domain" ] }
      }
    }
  ]
}

Note: Depending on the infrastructure as code (IaC) tool you’re using, your users might need to mark the tag configuration of the network attachments to be ignored (for example, if using Terraform, you will need to use a lifecycle statement) to prevent attempts of tag overwrite and API errors because of the SCP.

The next prerequisite is to have an IAM role deployed in the AWS management account that your network account can assume. This role will only allow you to query AWS account tags in the AWS Organizations API for the solution to discover the target segment for VPCs created within the account.

Here’s an example of the trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::$NETWORK_ACCOUNT_ID:root"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}

And here are the permissions for the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "organizations:Describe*",
        "organizations:List*"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "DescribeOrgAccounts"
    }
  ]
}

Finally, you need to ensure that your user accounts are tagged with the appropriate route-domain tag as part of the account vending process. For example, production accounts would have a route-domain: production tag, matching an existing production segment.
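
The account tag lookup that this role enables could look like the following Python sketch. It is an illustration under stated assumptions, not the code shipped with the solution: the function names and the session name are hypothetical, and it assumes boto3 is available with a default Region configured (as is the case inside Lambda). The AWS calls used are the STS assume_role operation and the Organizations list_tags_for_resource operation.

```python
from typing import Any, Optional

ROUTE_DOMAIN_TAG = "route-domain"


def route_domain_for_account(org_client: Any, account_id: str) -> Optional[str]:
    """Return the route-domain tag value of an account, or None if untagged."""
    kwargs = {"ResourceId": account_id}
    while True:
        page = org_client.list_tags_for_resource(**kwargs)
        for tag in page.get("Tags", []):
            if tag["Key"] == ROUTE_DOMAIN_TAG:
                return tag["Value"]
        token = page.get("NextToken")
        if not token:
            return None
        kwargs["NextToken"] = token  # fetch the next page of tags


def organizations_client(reader_role_arn: str) -> Any:
    """Assume the reader role and build an AWS Organizations client."""
    import boto3  # imported lazily so the lookup above can be tested offline

    creds = boto3.client("sts").assume_role(
        RoleArn=reader_role_arn,
        RoleSessionName="attachment-manager",  # hypothetical session name
    )["Credentials"]
    return boto3.client(
        "organizations",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

Passing the client in as a parameter keeps the lookup easy to test with a stub and keeps the cross-account credential handling in one place.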

AWS Cloud WAN attachment manager

We are now ready to dive deep into the solution. Figure 1 shows an end-to-end high-level view of the attachment manager solution for AWS Cloud WAN.

Figure 1: High Level view of the attachment manager for AWS Cloud WAN

These are the main components:

  • On the left side in Figure 1, you can see a Tenant or User AWS account with a spoke VPC being attached to the previously shared core network and an SCP to be applied across non-centrally managed OUs, preventing the use of the route-domain tag key on AWS Cloud WAN core-network attachments.
  • On the right side of Figure 1, you can see a representation of the central AWS network account with a shared AWS Cloud WAN core network, several facilities to process AWS Network Manager events, and an AWS Lambda function that will perform the admission of the spoke VPC attachment.
  • Also on the right side of Figure 1, you can see an IAM role deployed into the AWS management account, which allows the Lambda function to read the AWS account tags to determine what kind of account it is (such as test, production, and so on).

With this setup in place, let’s look at what happens when a user attaches a spoke VPC to the core network:

  1. A user creates a spoke VPC and attaches it to the shared core network (monitored by Network Manager as part of the solution).
  2. Because it lacks a route-domain tag, the attachment is admitted to the quarantine segment.
  3. As soon as the attachment is created, it propagates its CIDRs to the quarantineroutes segment, initiating a new route topology change event.
  4. The event invokes a Lambda function, which publishes to an Amazon Simple Notification Service (Amazon SNS) topic with message attributes to allow subscription filtering.
  5. An arbitrary number of Amazon Simple Queue Service (Amazon SQS) queues will subscribe to the SNS topic with the appropriate filtering to receive the relevant events and act as a buffer for the routing control Lambda function.
  6. The routing control Lambda function reads from the queue, gets the attachment ID and account number, and queries the AWS Organizations API for the account’s route-domain tag.
  7. Optionally, the Lambda function verifies that the propagated VPC addresses are appropriate for the intended segment (as defined by the VPC segment map, explained below), and only then tags the attachment appropriately.
  8. If the new route event relates to a CIDR prefix that isn’t expected (as defined by the VPC segment map), or if the account isn’t tagged, the Lambda function will delete the attachment.
  9. If an overlapping address already exists in the fullreturn or quarantineroutes segments, the Lambda function will delete the attachment.
  10. If processing reaches this point, the routing control Lambda function will tag the attachment with the correct target segment.
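
The decision made in steps 6 to 10 can be summarized in a short Python sketch. This is a hedged illustration of the flow described above, not the solution's actual code: decide_action and overlaps_existing are hypothetical names, addressing_ok stands for the outcome of the optional segment map check, and other_cidrs is assumed to contain only routes learned from other attachments in the fullreturn and quarantineroutes segments.

```python
import ipaddress
from typing import Iterable, Optional, Tuple


def overlaps_existing(new_cidrs: Iterable[str], other_cidrs: Iterable[str]) -> bool:
    """True if any newly propagated CIDR overlaps a route from another attachment."""
    others = [ipaddress.ip_network(cidr) for cidr in other_cidrs]
    return any(
        ipaddress.ip_network(new).overlaps(net)
        for new in new_cidrs
        for net in others
    )


def decide_action(
    route_domain: Optional[str],
    addressing_ok: bool,
    new_cidrs: Iterable[str],
    other_cidrs: Iterable[str],
) -> Tuple[str, str]:
    """Return ("tag", segment) or ("delete", reason) for an attachment."""
    if not route_domain:
        return ("delete", "account has no route-domain tag")
    if not addressing_ok:
        return ("delete", "CIDR is not expected for the segment and Region")
    if overlaps_existing(new_cidrs, other_cidrs):
        return ("delete", "CIDR overlaps an existing route")
    return ("tag", route_domain)


if __name__ == "__main__":
    print(decide_action("production", True, ["10.1.0.0/16"], ["10.2.0.0/16"]))
    print(decide_action("production", True, ["10.1.0.0/24"], ["10.1.0.0/16"]))
```

Ordering the checks from cheapest to most expensive means the route comparison only runs for accounts that are tagged and correctly addressed.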

Note: AWS Cloud WAN segment assignment can be overridden by tagging the attachment from the network account, which bypasses the AWS Organizations tag lookup.

Optionally, we can also enforce IP address planning coherence for specific Regions and segments. The following code snippet shows a YAML example for the definition of the VPC segment network map (that is, the file vpc_segment_address_map.yml to be moved into the Lambda folder) describing a list of accepted IP address ranges defined per segment and Region. This configuration will be in a file that’s packaged with the Lambda functions that will perform the segment admission controls. To completely disable the checks, you can keep the file empty with {} (empty dictionary syntax in YAML).

infrastructure:
  eu-central-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."
  ap-southeast-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."

dev:
  eu-central-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."
  ap-southeast-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."

staging:
  eu-central-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."
  ap-southeast-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."

prod:
  eu-central-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."
  ap-southeast-1:
    - "IP ADDRESS SUMMARIES ACCEPTABLE FOR THE SEGMENT AND REGION..."
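
As an illustration of how such a map can be evaluated, the following Python sketch checks a propagated CIDR against the summaries for a segment and Region. It is a sketch under assumptions rather than the solution's implementation: cidr_allowed is a hypothetical name, the address ranges are made-up examples, and the map is written as a Python dictionary with the same shape as the YAML file (in practice the file would be loaded with a YAML parser). It also assumes that a segment or Region missing from a non-empty map is rejected.

```python
import ipaddress
from typing import Dict, List

# Example values only; same shape as vpc_segment_address_map.yml.
SEGMENT_MAP: Dict[str, Dict[str, List[str]]] = {
    "prod": {
        "eu-central-1": ["10.0.0.0/12"],
        "ap-southeast-1": ["10.16.0.0/12"],
    },
    "dev": {
        "eu-central-1": ["10.64.0.0/12"],
    },
}


def cidr_allowed(segment_map: dict, segment: str, region: str, cidr: str) -> bool:
    """True if cidr falls inside an accepted summary for the segment and Region."""
    if not segment_map:
        return True  # an empty map ({}) disables the check
    network = ipaddress.ip_network(cidr)
    for summary in segment_map.get(segment, {}).get(region, []):
        allowed = ipaddress.ip_network(summary)
        # subnet_of raises TypeError across IP versions, so compare versions first.
        if network.version == allowed.version and network.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    print(cidr_allowed(SEGMENT_MAP, "prod", "eu-central-1", "10.1.0.0/16"))
    print(cidr_allowed(SEGMENT_MAP, "prod", "eu-central-1", "10.16.0.0/16"))
```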

With the prerequisites in place, we’re ready to deploy the two CloudFormation stacks that implement the solution.

In our example, the deployment of the CloudFormation stacks uses the AWS Serverless Application Model (AWS SAM) to package and push the Lambda functions. One way to install the AWS SAM CLI is through Homebrew (macOS and Linux).

Network Manager event processor setup

To set up the Network Manager event processor required for this solution, run the following commands.

cd src/network-manager-events/cloudformation/

# Build the lambda and deploy the cloudformation stack
sam build && sam deploy \
  --resolve-s3 \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
  --stack-name network-manager-events \
  --region us-west-2 \
  --parameter-overrides \
      ParameterKey=Name,ParameterValue="network-manager-events"

# Let's grab the outputs
NETWORK_MANAGER_STACK_OUTPUT=$(aws cloudformation describe-stacks \
  --stack-name network-manager-events \
  --region us-west-2 \
  --query 'Stacks[0].Outputs')
SNS_TOPIC_ARN=$(echo "$NETWORK_MANAGER_STACK_OUTPUT" | jq -r '.[] | select(.OutputKey=="SnsNetworkEventsArn") | .OutputValue')

Attachment manager setup

To set up the attachment manager as part of the solution, run the following commands.

# Variables to set (replace the placeholders with your own values)
GLOBAL_NETWORK_ID="<<global network id>>"
ATTACHMENT_MANAGER_NAME="cloudwan-attachment-manager"
AWS_ACCOUNT_READER_ROLE_ARN="<<ARN of AWS Account Reader Role>>"
CORE_NETWORK_ARN="<<ARN of Core network>>"
FULL_RETURN_TABLE="fullreturn"


cd src/network-manager-events/attachment-manager/

aws cloudformation validate-template \
  --template-body file://template.yml \
  --region eu-west-1

# Copy the vpc segment address map to be packaged with the lambda
cp vpc_segment_address_map.yml ../lambda/attachment_manager

# Build the lambda and deploy the cloudformation stack
sam build && sam deploy \
  --resolve-s3 \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
  --stack-name cloudwan-attachment-manager \
  --region eu-west-1 \
  --parameter-overrides \
      ParameterKey=Name,ParameterValue=${ATTACHMENT_MANAGER_NAME} \
      ParameterKey=AwsAccountReaderRoleArn,ParameterValue=${AWS_ACCOUNT_READER_ROLE_ARN} \
      ParameterKey=NetworkEventsSnsTopicArn,ParameterValue=${SNS_TOPIC_ARN} \
      ParameterKey=GlobalNetworkId,ParameterValue=${GLOBAL_NETWORK_ID} \
      ParameterKey=CoreNetworkArn,ParameterValue=${CORE_NETWORK_ARN} \
      ParameterKey=FullReturnTable,ParameterValue=${FULL_RETURN_TABLE}

It’s also recommended to deploy a second instance of the attachment manager in a secondary Region with a delay on event processing (set through the SqsEventsDelaySeconds parameter, 15 seconds in the following example). The code behavior is idempotent, so the secondary deployment helps mitigate possible service outages in the first Region.

# Build the lambda and deploy the cloudformation stack
sam build && sam deploy \
  --resolve-s3 \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
  --stack-name cloudwan-attachment-manager-secondary \
  --region us-east-1 \
  --parameter-overrides \
      ParameterKey=Name,ParameterValue=${ATTACHMENT_MANAGER_NAME} \
      ParameterKey=AwsAccountReaderRoleArn,ParameterValue=${AWS_ACCOUNT_READER_ROLE_ARN} \
      ParameterKey=NetworkEventsSnsTopicArn,ParameterValue=${SNS_TOPIC_ARN} \
      ParameterKey=GlobalNetworkId,ParameterValue=${GLOBAL_NETWORK_ID} \
      ParameterKey=CoreNetworkArn,ParameterValue=${CORE_NETWORK_ARN} \
      ParameterKey=FullReturnTable,ParameterValue=${FULL_RETURN_TABLE} \
      ParameterKey=SqsEventsDelaySeconds,ParameterValue=15
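
Running two instances is only safe if processing the same event twice leaves the attachment in the same state. The following Python sketch shows one way such an idempotent tagging step might look. It is an assumption for illustration, not the code in the repository: ensure_route_domain_tag is a hypothetical name, and the caller is assumed to supply the attachment ARN and its current tags. The only AWS call used is the Network Manager tag_resource operation.

```python
from typing import Any, Dict

ROUTE_DOMAIN_TAG = "route-domain"


def ensure_route_domain_tag(
    nm_client: Any,
    attachment_arn: str,
    current_tags: Dict[str, str],
    segment: str,
) -> bool:
    """Tag the attachment only if it has no route-domain tag yet.

    Returns True when a tagging call was made. An existing tag is left
    untouched, so a repeated event, or a tag set deliberately from the
    network account, does not trigger another change.
    """
    if ROUTE_DOMAIN_TAG in current_tags:
        return False
    nm_client.tag_resource(
        ResourceArn=attachment_arn,
        Tags=[{"Key": ROUTE_DOMAIN_TAG, "Value": segment}],
    )
    return True
```

With this shape, the delayed secondary instance finds the tag already in place when the primary Region is healthy and does nothing.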

Cleanup

To remove the solution, run the following AWS SAM CLI commands:

# Delete Attachment Manager Stack in the main region
sam delete --no-prompts \
  --stack-name cloudwan-attachment-manager \
  --region eu-west-1

# Delete Attachment Manager Stack in the secondary region
sam delete --no-prompts \
  --stack-name cloudwan-attachment-manager-secondary \
  --region us-east-1

# Delete the Network Manager Events Stack in us-west-2
sam delete --no-prompts \
  --stack-name network-manager-events \
  --region us-west-2

To remove the AWS SAM bootstrap, go to the AWS Management Console for each of the Regions, empty all the relevant buckets from the Amazon Simple Storage Service (Amazon S3) console, and finally delete the CloudFormation stacks with name aws-sam-cli-managed-default.

Conclusion

This solution provides a scalable, secure, and flexible way of performing automatic admission, and continual validation, of VPCs to an existing AWS Cloud WAN network. The solution is based on a serverless, event-driven architecture that processes AWS Network Manager events related to topological changes in the network. By removing user intervention from the VPC admission process, the solution improves the security posture of the entire network by preventing configuration abuse, and infrastructure automation can be streamlined for both centrally managed and decentralized on-demand spoke VPC provisioning.

For more information, see AWS Cloud WAN documentation.

Joao Rodrigues

João is a Senior Cloud Infrastructure Architect focused on the design, build, and deployment of automated IT infrastructure, with the objective of maximizing the stability, performance, scalability, and security of modern software. He brings 15+ years of experience in solving Cloud, Automation, Infrastructure as Code, Scripting, Networking, and IT infrastructure problems.

Srivalsan Mannoor Sudhagar

Srivalsan is a Cloud Infrastructure Architect at Amazon Web Services (AWS) Professional Services who brings expertise in Cloud Infrastructure and MLOps platforms. He is passionate about networking and container technologies, and loves to innovate to help solve customer problems. He enjoys architecting solutions and providing technical guidance to help customers and partners achieve their technical and business objectives.