Automating a Global Meraki Deployment in Multiple AWS Regions
By Josh Leatham, Partner Solutions Architect – AWS
By Simarbir Singh, Technical Marketing Engineer – Cisco
Cisco Meraki customers commonly ask how they can extend their SD-WAN environment into an existing Amazon Web Services (AWS) footprint. They want high availability and a way to automate reachability from their cloud resources down to their local branches.
In this post, we introduce a new AWS Quick Start to help automate the highly available deployment of Meraki vMXs in multiple AWS Regions along with route propagation. Multiple Meraki branch MXs can connect through AutoVPN to AWS regional vMX hubs configured in an active/active pair.
All routes learned through AutoVPN are propagated into AWS for seamless connectivity from your branch locations to your AWS workloads in any region. This is accomplished through the newly launched AWS Cloud WAN service, Meraki APIs, and a serverless mechanism for distributing routes without BGP.
Overview and Key Concepts
Cisco Meraki creates intuitive technologies to optimize IT experiences, secure locations, and seamlessly connect people, places, and things. Founded in 2006 and acquired by Cisco in 2012, Meraki has grown to become an IT industry leader, with over 600,000 customers and 9 million network devices online around the world.
The cloud-based platform brings together data-powered products, including wireless, switching, security and SD-WAN, smart cameras, and sensors, along with open APIs, a broad partner ecosystem, and cloud-first operations.
AWS Cloud WAN
AWS Cloud WAN is a managed wide area networking (WAN) service that makes it easy to build, manage, and monitor a global network that connects resources running across your cloud and on-premises environments. Fundamentally, Cloud WAN provides built-in automated route propagation and segmentation for your AWS network, including:
- Automation: Routes learned in one region can be auto-propagated to any other region with a Core Network Edge (CNE).
- Segmentation: Network segments span multiple regions and are isolated by default unless explicitly shared.
Figure 1 – Meraki + AWS Cloud WAN Quick Start architecture.
To learn more about AWS Cloud WAN, please read the blog post.
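To make the two concepts above more concrete, here is a minimal sketch of a core network policy document built in Python. The field names follow the Cloud WAN policy document format as we understand it, and the segment names, regions, and ASN range are placeholders chosen for illustration, so check the result against the Cloud WAN documentation before using it.

```python
import json


def build_core_network_policy(edge_regions, segment_names, shared_from=None,
                              shared_with=None, asn_range="64512-65534"):
    """Return a core network policy document as a plain dict.

    Each region in edge_regions gets a Core Network Edge (CNE). Segments
    span those regions and stay isolated unless a "share" action is added.
    """
    policy = {
        "version": "2021.12",
        "core-network-configuration": {
            "asn-ranges": [asn_range],
            "edge-locations": [{"location": region} for region in edge_regions],
        },
        "segments": [{"name": name} for name in segment_names],
        "segment-actions": [],
    }
    if shared_from and shared_with:
        # Explicitly share routes from one segment with others.
        policy["segment-actions"].append({
            "action": "share",
            "mode": "attachment-route",
            "segment": shared_from,
            "share-with": list(shared_with),
        })
    return policy


# Placeholder names: an "sdwan" segment shared with a "workloads" segment.
policy = build_core_network_policy(
    edge_regions=["us-east-1", "eu-west-1"],
    segment_names=["sdwan", "workloads"],
    shared_from="sdwan",
    shared_with=["workloads"],
)
print(json.dumps(policy, indent=2))
```

Without the share action, attachments in the two segments would not see each other's routes, which is the isolation-by-default behavior described above.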
Differences Between AWS Cloud WAN and AWS Transit Gateway Architecture
Before AWS Cloud WAN was available, customers could use AWS Transit Gateway to interconnect workloads across AWS Regions. For example, users could peer a transit gateway in each region and deploy their own custom automation to update the individual transit gateway route tables.
However, Cloud WAN is considered a managed WAN because it provides this route propagation out of the box. To see the original vMX solution built out with Transit Gateway, read this Quick Start.
This Quick Start is written in AWS CloudFormation. CloudFormation helps customers speed up cloud provisioning with infrastructure as code. Templates are written in YAML or JSON and make it possible to scale your infrastructure worldwide and manage resources across all AWS accounts and regions through a single operation.
As shown in Figure 2, the Quick Start is divided into the following two types of AWS CloudFormation templates:
- A base region template, to be deployed in the desired region. The region should be one that is commonly used in your organization. The template spins up the Cloud WAN global resources, an event bus, event rules, and the various state machines needed to communicate with Cloud WAN. It also spins up the virtual private cloud (VPC) and vMX resources described below in the additional regions template.
- An additional regions template, to be deployed in every region that will host an additional HA vMX pair. It deploys a VPC, the vMX devices in two Availability Zones (AZs), and the polling Lambda functions that contact your Meraki dashboard and pull all VPN routes to the various MX branches. Once the routes are discovered, any that are not already distributed into Cloud WAN are sent to the base region’s Amazon EventBridge bus to be processed.
Figure 2 – Quick Start architecture.
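The last step of the additional regions template, finding routes that are not yet distributed and sending them to the event bus, can be sketched as follows. This is an illustration rather than the Quick Start's actual code: the bus name, event source, detail type, and detail fields are all hypothetical. The entries are shaped like the items that the EventBridge PutEvents API accepts.

```python
import json


def diff_routes(discovered, distributed):
    """Compare routes learned from the Meraki dashboard with routes
    already present in Cloud WAN. Returns (to_add, to_remove)."""
    discovered, distributed = set(discovered), set(distributed)
    return sorted(discovered - distributed), sorted(distributed - discovered)


def build_event_entries(to_add, to_remove, region, bus_name="meraki-route-bus"):
    """Build PutEvents-style entries. Names here are hypothetical."""
    entries = []
    for action, cidrs in (("add", to_add), ("remove", to_remove)):
        if not cidrs:
            continue  # nothing to report for this action
        entries.append({
            "Source": "meraki.vmx.poller",      # hypothetical source name
            "DetailType": "route-update",       # hypothetical detail type
            "EventBusName": bus_name,           # hypothetical bus name
            "Detail": json.dumps(
                {"action": action, "region": region, "cidrs": cidrs}
            ),
        })
    return entries


to_add, to_remove = diff_routes(
    discovered=["192.168.10.0/24", "192.168.20.0/24"],
    distributed=["192.168.10.0/24", "192.168.30.0/24"],
)
entries = build_event_entries(to_add, to_remove, region="eu-west-1")
print(json.dumps(entries, indent=2))
```

In a deployed function, a list like `entries` would be passed to the EventBridge client's `put_events` call. When nothing has changed, the list is empty and no event needs to be sent.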
Serverless Route Propagation
With AWS Cloud WAN, it is now possible to use BGP to create an underlay where routes are synced end-to-end from local branches to remote AWS Regions. However, where BGP is not an option, it is still possible to sync routes end-to-end using AWS serverless components. The diagram in Figure 3 shows an example network: two customer branch networks with MX appliances connect to a transit VPC in the us-east-1 Region, with AWS workloads spread across two separate regions.
Figure 3 – Example network.
As shown in Figure 3 above:
- AutoVPN provides routes from branch MXs to the vMX. As more branches are added, AutoVPN shares their routes with the vMX and the other MXs.
- The Quick Start provides automated route propagation into the Cloud WAN SD-WAN segment across regions.
- The Quick Start provides automated return paths from Cloud WAN to the appropriate vMX through its VPC attachment.
- The customer is responsible for programming the vMX with the appropriate CIDR block(s) to summarize the available AWS resources in various regions.
- The vMX will pass along this route to the connected branch MXs via AutoVPN.
- The customer is responsible for the transit VPC routes that point to the VPC attachment (a single CIDR summary for AWS resources).
The Quick Start solution enables dynamic route propagation for all branch routes that are added or removed from your Meraki SD-WAN network. In Figure 4, these routes are colored green. This includes all the branch routes being provisioned into Cloud WAN, as well as the VPC. The remaining static routes are configured manually, but only once, at creation.
Figure 4 – Example network route tables.
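For the static routes that the customer maintains, one way to derive a single summary CIDR from several VPC ranges is to widen a candidate network until it covers all of them. The sketch below uses Python's standard `ipaddress` module. The example ranges are made up and IPv4 is assumed.

```python
import ipaddress


def summarize_to_single_cidr(cidrs):
    """Return the smallest single network that contains every CIDR given."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    if not networks:
        raise ValueError("at least one CIDR is required")
    candidate = networks[0]
    # Step up one prefix bit at a time until every network fits inside.
    while not all(net.subnet_of(candidate) for net in networks):
        candidate = candidate.supernet()
    return str(candidate)


# Made-up VPC ranges for workloads in two regions.
summary = summarize_to_single_cidr(["10.0.0.0/16", "10.1.0.0/16"])
print(summary)
```

A single summary can cover more address space than is actually in use. For example, summarizing 10.0.0.0/16 and 10.2.0.0/16 yields 10.0.0.0/14, which also includes 10.1.0.0/16 and 10.3.0.0/16. It is worth confirming that the wider range does not overlap any branch subnets.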
Serverless Components Used
The following AWS services enable end-to-end route automation and orchestration without the need to deploy any servers:
- Amazon EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale using events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and AWS services. EventBridge is used in this Quick Start to decouple the various serverless components. As routes need to be added or removed, events are written to the EventBridge bus, which forwards them to the appropriate state machine.
- AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. Lambda is used in this Quick Start to run the algorithms needed for each state machine. A periodic polling mechanism also runs an update Lambda function every one to 10 minutes. This function uses the Meraki dashboard APIs to check for newly learned routes at each branch and decides whether routing table updates are needed.
- AWS Step Functions is a low-code, visual workflow service that developers use to build distributed applications, automate IT and business processes, and build data and machine learning pipelines using AWS services. Step Functions is used in this Quick Start to orchestrate routing table updates and updates to the Cloud WAN Core Network Policy. As routes propagate, the state machine checks for potential race conditions and ensures the policy is executed and the routes are available throughout the various regions.
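The polling interval mentioned above maps naturally onto an EventBridge rate expression. The helper below is a small sketch, not code from the Quick Start. The function name is hypothetical, and the one to 10 minute bounds come from the range described in this post.

```python
def schedule_expression(minutes):
    """Build an EventBridge rate expression for the polling interval.

    Rate expressions use a singular unit for a value of 1 and a plural
    unit otherwise, for example rate(1 minute) and rate(10 minutes).
    """
    if not 1 <= minutes <= 10:
        raise ValueError("polling interval must be between 1 and 10 minutes")
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


print(schedule_expression(10))
print(schedule_expression(1))
```

The singular and plural forms matter because a mismatched unit is rejected when the rule is created.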
Why Step Functions Instead of Lambdas?
Because Cloud WAN is a global service, it can take additional minutes to propagate and sync the routes across regions. Lambda is charged by the millisecond for as long as a function runs, including any time it spends waiting, so it is more economical to create a state machine that calls the API and then polls periodically for completion.
Another benefit of Step Functions is that it allows for additional logic to check if any other Core Network Policy (CNP) versions are currently being executed and handle race conditions where two separate regions request route updates at the same time.
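The shape of such a state machine can be sketched in Amazon States Language, built here as a Python dict. This is an assumption about how the workflow could be laid out, not the Quick Start's definition: the state names, the Lambda ARNs, and the `policyInProgress` and `status` fields are hypothetical. The point is that the waiting happens in Wait states rather than inside a running function.

```python
import json

# Hypothetical ARN prefix; substitute the functions in your own account.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def task(function_name, next_state):
    """A Task state that invokes a (hypothetical) Lambda function."""
    return {"Type": "Task", "Resource": ARN_PREFIX + function_name,
            "Next": next_state}


def build_definition(poll_seconds=30):
    """Return a state machine definition that guards against concurrent
    policy executions, then submits, executes, and polls for completion."""
    wait = {"Type": "Wait", "Seconds": poll_seconds}
    return {
        "Comment": "Sketch: update the core network policy and wait",
        "StartAt": "CheckForRunningPolicy",
        "States": {
            "CheckForRunningPolicy": task("check-policy", "IsPolicyRunning"),
            "IsPolicyRunning": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.policyInProgress",
                             "BooleanEquals": True,
                             "Next": "WaitForOtherPolicy"}],
                "Default": "SubmitPolicy",
            },
            "WaitForOtherPolicy": dict(wait, Next="CheckForRunningPolicy"),
            "SubmitPolicy": task("submit-policy", "ExecutePolicy"),
            "ExecutePolicy": task("execute-policy", "WaitForPropagation"),
            "WaitForPropagation": dict(wait, Next="CheckExecution"),
            "CheckExecution": task("check-execution", "IsExecutionDone"),
            "IsExecutionDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED",
                     "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "FAILED",
                     "Next": "Failed"},
                ],
                "Default": "WaitForPropagation",
            },
            "Done": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "PolicyExecutionFailed"},
        },
    }


def dangling_targets(definition):
    """Return transition targets that do not name an existing state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(state[key] for key in ("Next", "Default") if key in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)


definition = build_definition()
print(json.dumps(definition, indent=2))
```

The first Choice state handles the race condition: if another region's update is still executing, the workflow waits and checks again instead of submitting a competing policy version.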
Route Update Process
Figure 5 walks through the update process followed by both the polling Lambda function and the update state machine. Each process is decoupled and event-driven using EventBridge.
Figure 5 – Route update process.
- An Amazon EventBridge rule schedules the polling Lambda function to run every 10 minutes (configurable down to every minute).
- The polling Lambda function uses Meraki’s dashboard API to check vMX status as well as any newly learned routes from local branches.
  - If a new local branch route is learned from one of the vMX’s AutoVPN spokes, the new route is configured in both the vMX VPC and the Cloud WAN network.
  - If a vMX is not responsive, the return routes to all local branches connected to that vMX are updated to point to the secondary vMX.
- EventBridge receives the event and forwards it to the update Cloud WAN state machine according to the update route event rule.
- The update state machine takes the event data from the polling Lambda function and does the following:
  - Ensure that no other CNP is currently being executed.
  - Submit the policy.
  - Execute the policy.
  - Wait until the policy is successfully completed.
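The failover decision in the polling step above can be sketched as a small pure function. The device identifiers are hypothetical, and the sketch assumes the dashboard reports a status string of "online" for a healthy vMX. It illustrates the decision only and is not the Quick Start's implementation.

```python
def select_next_hop(vmx_status, primary, secondary):
    """Pick the vMX that return routes should point to.

    vmx_status maps a vMX identifier to its reported status string.
    The primary is preferred; the secondary is used only when the
    primary is not online. Returns None if neither is online.
    """
    for candidate in (primary, secondary):
        if vmx_status.get(candidate) == "online":
            return candidate
    return None


def plan_return_routes(branch_cidrs, vmx_status, primary, secondary):
    """Map each branch CIDR to the vMX that should carry return traffic."""
    target = select_next_hop(vmx_status, primary, secondary)
    if target is None:
        return {}  # no healthy vMX; leave the existing routes untouched
    return {cidr: target for cidr in branch_cidrs}


# Hypothetical identifiers: the primary vMX has stopped responding.
status = {"vmx-az1": "offline", "vmx-az2": "online"}
plan = plan_return_routes(
    ["192.168.10.0/24", "192.168.20.0/24"], status, "vmx-az1", "vmx-az2"
)
print(plan)
```

Comparing a plan like this with the routes currently installed tells the polling function whether an update event is needed at all.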
In this post, we discussed how Cisco Meraki customers can use an AWS Quick Start to help automate the highly available deployment of Meraki vMXs in multiple AWS Regions along with route propagation.
AWS Cloud WAN was also introduced as an automated service to interconnect multiple regions. Finally, a unique approach to route distribution was introduced using serverless components, such as AWS Step Functions and Amazon EventBridge.
Cisco – AWS Partner Spotlight
Cisco is an AWS Partner providing a range of products for transporting data, voice, and video within buildings, across campuses, and around the world.