Bose: Building a scalable, more secure global cloud network with AWS Cloud WAN
In this post, we show you how Bose designed and built a global cloud network (GCN) to improve operational efficiency and security. We dive into how we used AWS to harmonize our global network, exploring the hurdles we faced, our guiding principles, and our holistic approach to bridging the gap between cloud and on-premises infrastructure. We also share the benefits we have realized along the way, including cost savings, improved performance, and enhanced security and operational efficiency.
About Bose Corporation
Bose, founded by Amar Bose in 1964, is a global leader in audio technology that blends innovation and high-quality sound to produce high-tech audio products that deliver exceptional listening experiences to customers. From state-of-the-art noise-canceling headphones to immersive home theater systems, Bose relentlessly pursues superior sound to bring the joy of music into millions of lives.
Bose operates a complex and diverse network architecture that serves numerous workloads across our business. As the cloud has fueled our agility and innovation, it naturally expanded and blended more and more with our on-premises network. This blending introduced some new challenges we would have to address.
We’re excited to share the lessons we learned, some practical guidance, and hope that you can use this information to accelerate your own digital transformation.
Pre-solution state and challenges
In our prior setup, we used Cisco SD-WAN for our global network, connecting our on-premises environment, partner workloads, and some cloud resources. While this approach was dependable, it had several challenges.
- The ‘backhauling’ traffic model rerouted traffic through our on-premises firewalls, introducing latency, complexity, and cost.
- Cloud traffic relied heavily on VPC peering – managing many VPC peering connections became cumbersome and some traffic even bypassed our security controls.
- Our cloud network segments functioned as isolated entities, causing inconsistent security policies and compliance measures.
- To meet our resiliency objectives, we had to consider the costly option of adding more physical resources.
- Traffic visibility was hampered as traffic passed through multiple on-premises devices before reaching cloud resources, complicating diagnosis and management and adding security concerns.
- Our manual IP Address Management added unnecessary delays and complexity to operations when integrating new VPCs.
Network Design principles
When re-architecting our global cloud network, we anchored our journey to five core design principles:
- Minimize impact
We added new VPCs in a phased manner and prioritized a smooth migration path for existing VPCs, aiming to avoid VPC recreation and to limit disruptions.
- Simplify operations
We emphasized automation, using CI/CD pipelines with security checks. By using cloud services, we eliminated error-prone, manual operational tasks like IP address allocation and static route management.
- Reduce costs
To gain stakeholder support, we examined each proposed solution for cost-efficiency, aiming for enhanced functionality at optimized cost. We scrutinized every component of a proposed solution and looked for cost-efficient alternatives, enabling us to make informed decisions that balanced enhanced network functionality against cost.
- Increase security
We used a default “deny-all” policy, aiming to control all communication between workloads regardless of their network segment. By using cloud firewalls, we augmented our defenses, enabling us to proactively detect and mitigate potential threats while reducing operational overhead.
- Use Infrastructure as Code
Our commitment to Infrastructure as Code (IaC) rounded out our design principles. IaC enabled us to introduce automation, minimize risks, and have finer control over resource changes.
Following these principles, we built a resilient, efficient, and secure global cloud network, aligning with evolving business needs and modern cloud practices.
Our project employed a two-phased approach: an initial setup, followed by a Cloud WAN revamp. This approach enabled us to adhere to good practices and safely transition to our modern cloud network architecture.
First Phase: Establish Regional Network with AWS Services
We established AWS Transit Gateways (TGWs) in each Region to streamline intra-Region connectivity. Our teams were already familiar with the service, and using TGWs aligned with Cisco’s default Cloud On-Ramp solution.
We used AWS Cloud WAN to enable inter-Region connectivity (between Regional TGWs) using AWS’s backbone, linking Regions even without Cisco SD-WAN vEdge devices. This improved inter-Region connectivity and also paved the way for more network enhancements.
As much as possible, we wanted seamless integration between Cisco SD-WAN and AWS, so we emulated Cisco’s SD-WAN Cloud On-Ramp auto-provisioning. Rather than configuring manually, we employed IaC to establish a “connect” VPC. In this setup, we deployed Cisco vEdge 8000v cloud routers in two of our three Regions. The third Region gave us flexibility in infrastructure deployment: there, we exclusively used AWS Cloud WAN, allowing us to be selective and cost-effective with the use of vEdge 8000v routers, and to evaluate native connectivity to Cloud WAN as an approach to reduce cost in the tertiary Region.
Our network transitioned from a flat structure to a segmented framework with PROD, NONPROD and SHARED segments, ensuring traffic isolation for security. We incorporated segment-based route tables at the Transit Gateway level for inbound and outbound traffic, inter-Region traffic, and internet traffic. On AWS Cloud WAN, we designed segment and Region-specific route tables (9 in total), along with segment-based tables for inter-Region traffic.
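The per-segment Transit Gateway route tables described above can be sketched in Terraform, the IaC tool we use. This is a minimal, hypothetical example (resource names and settings are assumptions, not our production code); it disables the TGW default route table so that every attachment must be explicitly associated with its segment’s table.

```hcl
# Sketch: one Regional TGW with explicit, per-segment route tables.
# Names and the segment list are illustrative assumptions.
resource "aws_ec2_transit_gateway" "regional" {
  description                     = "Regional transit gateway"
  default_route_table_association = "disable" # force explicit per-segment tables
  default_route_table_propagation = "disable"
}

resource "aws_ec2_transit_gateway_route_table" "segment" {
  for_each           = toset(["prod", "nonprod", "shared"])
  transit_gateway_id = aws_ec2_transit_gateway.regional.id
  tags               = { Name = "tgw-rt-${each.key}" }
}
```

Disabling the default association and propagation is what enforces the traffic isolation between PROD, NONPROD, and SHARED: nothing is reachable until a route table association is made deliberately.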
We positioned AWS Network Firewalls in each Region to govern North-South and East-West traffic with granular control, efficient traffic filtering, and isolation across segments like PROD, NONPROD, and SHARED.
We adopted a flexible, streamlined firewall rule model that is vendor-neutral, promoting ease of transition between solutions.
Automation and Management
To further ease route management and operational burden in our complex cross-Region setup and Cloud WAN integration, we customized the open-source Serverless Transit Network Orchestrator (STNO) solution, which automates setting up and managing transit networks on AWS. Automating route adjustments based on VPC attachments and tags reduced operational costs and effort.
We entrusted Amazon VPC IP Address Manager (IPAM) with managing global IP addresses to enhance overall network control and optimize IP allocations. We employed a hierarchical structure that segments global IP pools into continent-specific sub-pools, which are further split into Region-based sub-pools shared across AWS accounts. This structure clarifies IP allocations, allowing precise CIDR block identification for Regions or Availability Zones and improving network adaptability and control.
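The hierarchy can be expressed in Terraform roughly as follows. This is a sketch under assumptions: the continent-level CIDR (10.184.0.0/14) and pool names are hypothetical, chosen only to be consistent with the Regional summarized routes discussed later in this post.

```hcl
# Sketch: hierarchical IPAM pools (global -> continent -> Region).
# CIDRs and names are illustrative assumptions.
resource "aws_vpc_ipam" "global" {
  operating_regions { region_name = "us-east-1" }
  operating_regions { region_name = "eu-central-1" }
}

resource "aws_vpc_ipam_pool" "global" {
  address_family = "ipv4"
  ipam_scope_id  = aws_vpc_ipam.global.private_default_scope_id
}

resource "aws_vpc_ipam_pool_cidr" "global" {
  ipam_pool_id = aws_vpc_ipam_pool.global.id
  cidr         = "10.0.0.0/8"
}

# Continent-level sub-pool carved out of the global pool
resource "aws_vpc_ipam_pool" "emea" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.global.private_default_scope_id
  source_ipam_pool_id = aws_vpc_ipam_pool.global.id
}

resource "aws_vpc_ipam_pool_cidr" "emea" {
  ipam_pool_id = aws_vpc_ipam_pool.emea.id
  cidr         = "10.184.0.0/14" # hypothetical continent allocation
}

# Region-level sub-pool; the locale pins allocations to that Region,
# and the pool can be shared across accounts via AWS RAM.
resource "aws_vpc_ipam_pool" "eu_central_1" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.global.private_default_scope_id
  source_ipam_pool_id = aws_vpc_ipam_pool.emea.id
  locale              = "eu-central-1"
}

resource "aws_vpc_ipam_pool_cidr" "eu_central_1" {
  ipam_pool_id = aws_vpc_ipam_pool.eu_central_1.id
  cidr         = "10.184.0.0/18"
}
```

New VPCs then request a CIDR from their Regional pool instead of relying on manually tracked spreadsheets, which is what removed the manual IP allocation step from our onboarding flow.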
We used Terraform to implement IaC. By taking advantage of Terraform’s modularity, we’ve structured our network into distinct components like IPAM, TGW, Cloud WAN, firewalls, and connect components. This design allows precise changes, minimizing potential disruptions.
We’ve structured our deployment pipeline and centralized all repositories and artifacts in GitLab. Each Terraform module has its own pipeline that uses services like Checkov to handle testing, vulnerability checks, and linting. These pipelines also auto-generate module documentation, including a CHANGELOG. The root module pipeline oversees the deployment of all modules, allowing precise changes. Deployments to TEST environments are done from feature or fix branches while PROD environments are deployed from the main branch. With this in place, our root module’s test pipeline can safely deploy around 700 AWS resources to a test setup in under 30 minutes.
Second Phase: Simplify and Optimize with Core Network (AWS Cloud WAN)
In the second phase, we honed our Core Network (Cloud WAN) to achieve major simplifications and greater cost efficiency, following our guiding principles. Once the appliance mode feature for core network attachments became available, supporting stateful inspection, a Cloud WAN-only approach was feasible. This architecture is shown in the following diagram, figure 2.
Network Architecture and Design
As shown in the phase 2 diagram above, we use AWS Cloud WAN’s core network instead of AWS Transit Gateway. This meant setting up a core network edge in each Region.
VPCs attach to their Regional core network edge, automatically joining the intended segment using tags like “bose:network-segment”. Cloud WAN segments are configured with isolation enabled so that attachment routes do not propagate to the segment’s route table.
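A VPC attachment with the segment tag looks roughly like this in Terraform. This is a hypothetical sketch: the resource names (`main`, `workload`) and subnet references are assumptions, and appliance mode (discussed later in this post) is shown here because it is set on the attachment.

```hcl
# Sketch: attach a VPC to the core network; the tag drives segment assignment.
# Resource names and references are illustrative assumptions.
resource "aws_networkmanager_vpc_attachment" "workload" {
  core_network_id = aws_networkmanager_core_network.main.id
  vpc_arn         = aws_vpc.workload.arn
  subnet_arns     = [for s in aws_subnet.attach : s.arn]

  options {
    # Keeps request and response traffic on the same AZ path,
    # preserving symmetric routing through firewall endpoints.
    appliance_mode_support = true
  }

  tags = {
    "bose:network-segment" = "nonprod" # matched by core network attachment policies
  }
}
```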
Static routes in these segment route tables direct traffic through inspection (east-west) or egress (north-south) as needed. When “require-attachment-acceptance” is enabled for a Cloud WAN segment, any attachment targeting that segment must be approved; otherwise, it remains in a pending state and cannot be used.
To minimize static routes and rely on automatic route advertisement as much as possible, each supported Region has a dedicated set of segments (one for each network segment as well as one inspection network segment). The core network policy document settings guarantee that VPC attachments join the correct Cloud WAN segment in the corresponding Region without the need to explicitly define the Regional network segment.
For example, the “bose:network-segment” tag of a VPC attachment for a VPC in us-east-1 can be set to “nonprod,” and Cloud WAN ensures that this VPC attachment is connected to the “nonproduseast1” segment.
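The tag-to-segment mapping lives in the core network policy. A minimal sketch using the Terraform policy document data source is shown below; only one segment and one attachment policy are included for brevity, and the ASN range and rule number are assumptions.

```hcl
# Sketch: core network policy mapping the "nonprod" tag in us-east-1
# to the "nonproduseast1" segment. Values are illustrative assumptions.
data "aws_networkmanager_core_network_policy_document" "main" {
  core_network_configuration {
    asn_ranges = ["64512-64555"] # assumption
    edge_locations { location = "us-east-1" }
    edge_locations { location = "eu-central-1" }
  }

  segments {
    name                          = "nonproduseast1"
    require_attachment_acceptance = false
    isolate_attachments           = true
  }
  # ...remaining Regional and inspection segments omitted for brevity

  attachment_policies {
    rule_number     = 100
    condition_logic = "and"

    conditions {
      type     = "tag-value"
      key      = "bose:network-segment"
      operator = "equals"
      value    = "nonprod"
    }
    conditions {
      type     = "region"
      operator = "equals"
      value    = "us-east-1"
    }

    action {
      association_method = "constant"
      segment            = "nonproduseast1"
    }
  }
}
```

Because the Region is part of the match condition, the same “nonprod” tag lands an attachment in the correct Regional segment without any per-VPC routing work.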
The integration model supports hybrid on-premises connectivity. One Region uses Cisco vEdge 8000v routers deployed on AWS, while the other Region establishes VPN tunnels from on-premises locations to AWS Cloud WAN using existing Cisco routers on the Bose SD-WAN network. AWS Cloud WAN supports both CONNECT attachments (with Generic Routing Encapsulation) and VPN attachments. It also includes dynamic routing by default, allowing the core network to determine the best path between source and destination. In case of a network failure, traffic is automatically rerouted in real time over an alternate path.
After understanding the implications, we merged the segment-based network firewalls into a shared firewall per AWS Region to realize greater cost savings.
We enabled appliance mode and dynamic routing in Cloud WAN to add flexibility and redundancy. We disabled route propagation and isolated Cloud WAN segments, increasing network security by making it more difficult for potential threats and unauthorized traffic to move freely across the network.
All routes of the Cloud WAN segments within a Region are shared with the inspection network segment in that same Region. When traffic exits inspection, the route table of the inspection network segment lists all CIDRs within that Region. Static routes based on the summarized IP address space of all other Regions are also added to the inspection network segment’s route table, ensuring proper routing when traffic needs to reach another Region’s inspection. Using appliance mode on AWS Cloud WAN’s VPC attachments ensures that both request and response traffic goes through the same availability zones (AZs) to maintain symmetrical routing through firewall endpoints.
The inspection process employs a shared firewall for all segments per Region, applying a deny-all policy by default (with a few exceptions). Regardless of the network segment, every request or response undergoes processing by the same firewall. Each firewall is configured with a minimum of two firewall endpoints located in different Availability Zones to ensure redundancy and high availability. Firewall rules continue to be managed by dedicated segment-based rule groups to isolate and manage production rules separately from non-production rules. Requests between sources and destinations in the same Region are inspected only once, while requests between sources and destinations in different Regions are inspected twice.
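The shared-firewall setup per Region can be sketched as follows. This is an assumption-laden outline, not our production configuration: names, the VPC/subnet references, and the rule groups are hypothetical; the strict-order engine with a drop default reflects the deny-all posture described above.

```hcl
# Sketch: one shared AWS Network Firewall per Region, deny-all by default,
# with segment-based stateful rule groups. Names are illustrative assumptions.
resource "aws_networkfirewall_firewall_policy" "regional" {
  name = "shared-inspection-policy"

  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]

    stateful_engine_options { rule_order = "STRICT_ORDER" }
    stateful_default_actions = ["aws:drop_strict"] # deny-all unless explicitly allowed

    # Segment-based rule groups keep prod and nonprod rules managed separately
    stateful_rule_group_reference {
      priority     = 10
      resource_arn = aws_networkfirewall_rule_group.prod.arn
    }
    stateful_rule_group_reference {
      priority     = 20
      resource_arn = aws_networkfirewall_rule_group.nonprod.arn
    }
  }
}

resource "aws_networkfirewall_firewall" "regional" {
  name                = "shared-regional-firewall"
  firewall_policy_arn = aws_networkfirewall_firewall_policy.regional.arn
  vpc_id              = aws_vpc.inspection.id

  # Two endpoints in different Availability Zones for redundancy
  subnet_mapping { subnet_id = aws_subnet.inspection_a.id }
  subnet_mapping { subnet_id = aws_subnet.inspection_b.id }
}
```

One firewall per Region means one policy to audit, while the per-segment rule groups preserve the separation of production and non-production rules.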
Automation and Management
With this update, STNO is no longer used: AWS Core Network VPC attachments have replaced AWS Transit Gateway attachments, and automation for segment assignment is available out of the box, reducing management overhead.
The preceding diagram (figure 2) illustrates the traffic flow between two cloud resources in different Regions, with VPC A connected to the Core Network Edge in us-east-1 and VPC C connected to the Core Network Edge in eu-central-1.
- The request is sent through the VPC attachment of VPC A into the Core Network Edge (CNE) of Region us-east-1.
- The VPC attachment of VPC A is associated with the “prod” network segment, and the request enters the route table of the “prod” network segment in that Region (e.g., “produseast1”).
- Static routes (e.g., 10.0.0.0/8) for internal traffic route the request into the inspection firewall of Region us-east-1.
- After inspection, the request enters the route table of the inspection network segment in the source Region (e.g., “inspectionuseast1”).
- Static summarized routes (e.g., 10.184.0.0/18) for the Region eu-central-1 route the request into the inspection firewall of Region eu-central-1.
- After inspection, the request enters the route table of the inspection network segment in the destination Region (e.g., “inspectioneucentral1”) and is sent through the VPC attachment of VPC C to the destination.
- The response is sent through the VPC attachment of VPC C into the Core Network Edge (CNE) of Region eu-central-1.
- The VPC attachment of VPC C is associated with the “prod” network segment, and the response enters the route table of the “prod” network segment in that Region (e.g., “prodeucentral1”).
- Static routes (e.g., 10.0.0.0/8) for internal traffic route the response into the inspection firewall of Region eu-central-1.
- After inspection, the response enters the route table of the inspection network segment in the source Region (e.g., “inspectioneucentral1”).
- Static summarized routes (e.g., 10.176.0.0/18) for the Region us-east-1 route the response into the inspection firewall of Region us-east-1.
- After inspection, the response enters the route table of the inspection network segment in the destination Region (e.g., “inspectionuseast1”) and is sent through the VPC attachment of VPC A to the initial requestor.
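The route sharing and static routes behind this flow live in the core network policy’s segment actions. The following is a hypothetical sketch for one Region: segment names match the examples above, but the ASN range and the `firewall_use1` attachment reference are assumptions.

```hcl
# Sketch: segment actions that share prod routes with the Regional inspection
# segment and steer internal traffic into the firewall attachment.
# The firewall attachment reference is an illustrative assumption.
data "aws_networkmanager_core_network_policy_document" "routing" {
  core_network_configuration {
    asn_ranges = ["64512-64555"] # assumption
    edge_locations { location = "us-east-1" }
    edge_locations { location = "eu-central-1" }
  }

  segments { name = "produseast1" }
  segments { name = "inspectionuseast1" }
  # ...remaining segments omitted for brevity

  # Share all prod attachment routes with the Regional inspection segment
  segment_actions {
    action     = "share"
    mode       = "attachment-route"
    segment    = "produseast1"
    share_with = ["inspectionuseast1"]
  }

  # Static route sending internal traffic from prod into inspection
  segment_actions {
    action                  = "create-route"
    segment                 = "produseast1"
    destination_cidr_blocks = ["10.0.0.0/8"]
    destinations            = [aws_networkmanager_vpc_attachment.firewall_use1.id]
  }
}
```

A matching `create-route` in the inspection segment, using the summarized CIDR of the remote Region, is what hands inter-Region traffic to the other Region’s firewall.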
In summary, our comprehensive overhaul represents a strategic shift toward a more streamlined cloud network. We realized many benefits from this change:
- Cost optimization through reduced firewall endpoints
- Reduced complexity in managing inspection for shared services
- Native network segment management
- Elimination of custom tooling requirements
- Centralized policy management
- Global network metrics and dashboards
- Cloud firewall inspection for all traffic (both cloud and on-premises)
- Taking advantage of AWS Cloud WAN feature roadmap
- Built-in automation capabilities
We also accepted some trade-offs:
- Additional but limited development effort (2-3 sprints)
- A single firewall solution shared by all segments
- Lack of post-inspection segmentation
By embracing modern best practices and cloud-native features, we not only addressed current challenges, but have also positioned ourselves for sustained growth and adaptability in the future.
To learn more, review some of the other blog posts that helped answer some of our initial questions during our journey:
- General SD-WAN and AWS Networking content:
- Segmentation and inspection content:
The content and opinions in this post include those of the third-party author and AWS is not responsible for the content or accuracy of this post.