Ataccama: Building our global network with AWS Cloud WAN
Ataccama is a global software company with a unified platform for automating data quality, MDM, and metadata management – Ataccama ONE. We specialize in complex enterprise data governance solutions that provide sustainable, long-term value. At Ataccama, we migrated our global wide area network to AWS Cloud WAN to simplify configuration and management. In this post, we explain the challenges we faced, the reasons we chose AWS Cloud WAN, and how we migrated to a new global network.
As we moved more workloads to the cloud and Ataccama continued to expand globally, managing network connectivity became increasingly complex. With a network footprint spanning several AWS Regions, on-premises locations, and other cloud providers, routing configuration was challenging: it required configuring many networking hubs and centralizing logs and metrics to facilitate operations. Automation helped make the network easier to run, but it was hard to find time to build automation tools while keeping up with operational demands.
Early on, we saw cloud environments as just another node on our network. But over time, our network center of gravity has shifted to the cloud. AWS Regions have become our new data centers, and AWS network infrastructure has become the main underlay of our global network. Except for last mile connectivity, AWS is what connects us.
AWS Cloud WAN has simplified and automated configuration of our global network. It provides a central point for network definition (by using network policies) and visibility. We only need to indicate which AWS Regions we want to use, how we want to segment our network traffic, and the resources to connect (Amazon VPCs or on-premises environments using AWS Site-to-Site VPN, for example), and AWS Cloud WAN takes care of routing configuration. In addition, AWS Cloud WAN is integrated with the wide array of AWS tools for network operations that help us monitor and observe both our AWS resources and on-premises networks.
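To illustrate what such a central network definition looks like, here is a minimal sketch of a Cloud WAN core network policy (2021.12 schema) expressed as a Python dict. The Regions, ASN range, and segment names are illustrative, not our actual configuration:

```python
import json

# Minimal Cloud WAN policy sketch (2021.12 schema). All values below are
# illustrative placeholders, not Ataccama's real configuration.
core_network_policy = {
    "version": "2021.12",
    "core-network-configuration": {
        "asn-ranges": ["64512-64555"],
        "edge-locations": [
            {"location": "us-east-1"},
            {"location": "eu-central-1"},
        ],
    },
    "segments": [
        {"name": "production", "require-attachment-acceptance": False},
        {"name": "it-services", "require-attachment-acceptance": False},
        {"name": "transit", "require-attachment-acceptance": False},
    ],
    "attachment-policies": [
        {
            # Any attachment tagged segment=<name> joins that segment.
            "rule-number": 100,
            "conditions": [{"type": "tag-exists", "key": "segment"}],
            "action": {
                "association-method": "tag",
                "tag-value-of-key": "segment",
            },
        }
    ],
}

print(json.dumps([s["name"] for s in core_network_policy["segments"]]))
```

With a policy like this in place, attaching a VPC and tagging it `segment=production` is enough for Cloud WAN to associate it with the right segment and configure routing.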
In this post, besides network policy configuration, we focus on the AWS Cloud WAN connect attachment type. We use connect attachments to link an AWS Cloud WAN core network edge (CNE) object with third-party virtual appliances running in a VPC. Connect attachments support Generic Routing Encapsulation (GRE) tunnels and Border Gateway Protocol (BGP) for dynamic routing and play a large role in how we integrate the worldwide nodes of our AWS Cloud WAN core network while providing dynamic routing and resiliency for last mile connectivity.
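As a rough sketch, standing up a connect attachment involves two API calls against the NetworkManager service: one to create the attachment over a transport VPC attachment, and one to create a connect peer carrying the GRE tunnel and BGP session. The request shapes below follow the boto3 NetworkManager API; every ID, address, and ASN is a placeholder, not a value from our network:

```python
# Sketch of the two request payloads for a Cloud WAN connect attachment
# and its GRE/BGP peer. All IDs, addresses, and ASNs are placeholders.
create_connect_attachment_request = {
    "CoreNetworkId": "core-network-0123456789abcdef0",
    "EdgeLocation": "eu-central-1",
    # Transport: the VPC attachment of the appliance VPC that carries GRE.
    "TransportAttachmentId": "attachment-0123456789abcdef0",
    "Options": {"Protocol": "GRE"},
}

create_connect_peer_request = {
    "ConnectAttachmentId": "attachment-0fedcba9876543210",
    "PeerAddress": "10.0.1.10",                 # appliance tunnel endpoint
    "BgpOptions": {"PeerAsn": 65010},           # appliance-side ASN
    "InsideCidrBlocks": ["169.254.100.0/29"],   # GRE inside addresses
}

print(create_connect_attachment_request["Options"]["Protocol"])
```

The appliance then establishes a BGP session with the CNE over the GRE tunnel and exchanges routes dynamically.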
As Ataccama was expanding around the world, we faced the perennial challenge of scaling our network. When we started, we were operating out of one server room in Prague. This quickly became a bottleneck and latency to distant offices was very high. We started a project to rebuild our infrastructure worldwide and create a backbone network that would span across all regions where we need a presence. And, all of this was done while maintaining security segmentation with identity firewalling, IPS, IDS and other security features.
We started the project using AWS Transit Gateway, but this proved to be complex and prone to configuration challenges. We saw the release of AWS Cloud WAN in December 2021 and immediately began redesigning our network around it.
Why we chose AWS Cloud WAN
For Ataccama, the biggest benefits of AWS Cloud WAN are the centralization of policy and configuration, native segmentation, and built-in automation of network operations. Moreover, thanks to full BGP and GRE tunnelling support, we can easily place a next generation firewall (NGFW) solution to control traffic flows using the BGP features of those solutions. We simply attach each segment of our network to the NGFW using a separate GRE tunnel (all of the tunnels use the same VPC attachment as transit), and assign that tunnel to a specific security zone. In this way, network traffic remains segmented, but we can use the firewalls across segments when we decide to allow it.
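The per-segment tunnel design can be modelled as a simple mapping: each segment gets its own GRE tunnel to the NGFW, and each tunnel lands in a firewall security zone. The segment, tunnel, and zone names below are illustrative:

```python
# Toy model of the per-segment GRE tunnels described above. Each Cloud WAN
# segment has a dedicated tunnel to the NGFW, assigned to a security zone.
# All names are illustrative placeholders.
SEGMENT_TUNNELS = {
    "production":  {"gre_tunnel": "tun1", "zone": "prod"},
    "it-services": {"gre_tunnel": "tun2", "zone": "it"},
    "development": {"gre_tunnel": "tun3", "zone": "dev"},
}

def firewall_zones(src_segment, dst_segment):
    """Cross-segment traffic enters the NGFW on the source segment's tunnel
    and leaves on the destination segment's tunnel, so the firewall sees a
    zone pair it can apply policy to."""
    if src_segment == dst_segment:
        return None  # intra-segment traffic stays inside Cloud WAN
    return (SEGMENT_TUNNELS[src_segment]["zone"],
            SEGMENT_TUNNELS[dst_segment]["zone"])

print(firewall_zones("production", "it-services"))
```

Because every cross-segment flow maps to a zone pair, standard zone-based firewall policy is enough to decide which inter-segment traffic is allowed.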
The following diagram (figure 1) shows our architecture in two AWS Regions. We are located in several AWS Regions, with the same configuration in all of them. Our security requirements dictate that all traffic crossing our network segments first goes through our NGFWs. We then use identity-based firewalling to control employee network access, make sure that all internet traffic is analyzed, and perform SSL decryption on internet traffic from VPCs with no additional configuration. Once we had the global network in place, it was easy to interconnect everything – not just VPCs, but anything that supports IPsec. We connected all of our offices to the core network, so we could move all our infrastructure tools to AWS. In fact, even our wireless controller lives in AWS, and all our Wi-Fi access points connect to the controller using AWS Cloud WAN.
In addition, thanks to native BGP support, we could build global failover capability. With this in place, if an NGFW fails in one AWS Region, the NGFW in another AWS Region takes over. This lets us reduce the costs related to firewall appliances as we need fewer licenses to achieve high availability in every AWS Region.
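The failover behaviour boils down to ordinary BGP best-path selection: prefer the path with the highest local preference, fall back to the next candidate when a peer goes down. A toy model, with illustrative peers and values:

```python
# Toy model of cross-Region NGFW failover via BGP best-path selection:
# highest local preference wins, then shortest AS path. Values are
# illustrative, not our real BGP configuration.
def best_path(paths):
    candidates = [p for p in paths if p["up"]]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p["local_pref"], -len(p["as_path"])))

paths = [
    {"peer": "ngfw-us-east-1",    "local_pref": 200, "as_path": [64512], "up": True},
    {"peer": "ngfw-eu-central-1", "local_pref": 100, "as_path": [64512, 64513], "up": True},
]
print(best_path(paths)["peer"])   # steady state: closest NGFW preferred
paths[0]["up"] = False            # simulate an NGFW failure in us-east-1
print(best_path(paths)["peer"])   # traffic shifts to the other Region
```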
Migrating to AWS Cloud WAN
The biggest migration challenge we faced was moving everything to the new infrastructure without any disruption to our business. As we grew rapidly, we had not established the tools and processes needed to do a “big bang,” one-step migration.
Luckily, early on we decided to build everything from scratch, even the network hardware in our offices. Thanks to this, we could prepare everything on site, including new Microsoft Active Directory, Wi-Fi, and security solutions. For some time, we had two functioning infrastructures running in parallel: the legacy network, where it was difficult to interconnect anything, and the new network based on AWS Cloud WAN. The only thing needed to interconnect the two was a new AWS Cloud WAN segment, to which we attached a VPC with the old VPN concentrator (based on a simple Amazon Elastic Compute Cloud (Amazon EC2) instance). We updated the AWS Cloud WAN policy with a static route forwarding all traffic for the legacy network to the VPC with the VPN concentrator EC2 instance. These routes were propagated to the entire network using BGP.
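In policy terms, that static route is a `create-route` entry under `segment-actions`, pointing the legacy address space at the VPN-concentrator VPC attachment. The segment name, CIDR, and attachment ID below are placeholders:

```python
# Sketch of the policy change described above: a Cloud WAN segment-actions
# entry creating a static route toward the VPN-concentrator VPC attachment.
# Segment name, CIDR, and attachment ID are illustrative placeholders.
segment_action = {
    "action": "create-route",
    "segment": "legacy",
    "destination-cidr-blocks": ["192.168.0.0/16"],     # assumed legacy range
    "destinations": ["attachment-0123456789abcdef0"],  # VPN concentrator VPC
}
print(segment_action["action"])
```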
Both networks were interconnected in just a few minutes. Right away we could start creating firewall rules to control the traffic that was allowed to flow between the networks. Once this was built, we started migrating users from the old network to the AWS Cloud WAN network one at a time without affecting their ability to work.
We needed to isolate some services, so that they could not work in parallel on both networks. To migrate those services, we simply detached the VPCs from the old network, and then attached them to the AWS Cloud WAN core network.
Example: Traffic flow between Toronto and Frankfurt
To show how this works, let’s look at how a Wi-Fi Access Point (AP) in our Toronto office connects to a wireless controller (WLC) hosted in a VPC in the Frankfurt Region. The following diagram (figure 2) shows this flow and overall architecture (for simplicity, we only show the North Virginia and Frankfurt Regions involved in the connection).
- When the AP wants to connect to the WLC, the communication first goes through the local infrastructure in the Toronto office. Traffic is forwarded to the local NGFW, which determines the best entry point to the AWS network backbone using BGP.
- Under steady state operation, it selects the closest NGFW in AWS, which is the one located in the North Virginia Region (us-east-1).
- The NGFW in us-east-1 forwards the traffic to the core network transit segment, which interconnects all NGFWs in all AWS Regions. Each AWS Region has its own NGFW that handles traffic going to that specific AWS Region.
- In this case, the WLC is hosted in a VPC in the Frankfurt Region (eu-central-1), so AWS Cloud WAN delivers the traffic to the NGFW there.
- Then, the NGFW in eu-central-1 verifies the communication using security policies and sends it back to the core network IT Services segment, which is used for all services for internal IT purposes.
- Then, AWS Cloud WAN delivers the communication to the destination VPC attached to this segment in eu-central-1.
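The steps above can be sketched as a hop-by-hop trace through a static next-hop table. The node labels are illustrative names for the devices and segments in the flow, not real hostnames:

```python
# Toy hop-by-hop model of the AP -> WLC flow described in the steps above.
# Node names are illustrative labels, not real device hostnames.
NEXT_HOP = {
    "toronto-office": "toronto-ngfw",
    "toronto-ngfw": "ngfw-us-east-1",        # closest AWS entry point via BGP
    "ngfw-us-east-1": "transit-segment",
    "transit-segment": "ngfw-eu-central-1",  # Region that owns the destination
    "ngfw-eu-central-1": "it-services-segment",
    "it-services-segment": "wlc-vpc-eu-central-1",
}

def trace(src, dst):
    """Follow next hops from src until dst is reached."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(NEXT_HOP[hops[-1]])
    return hops

print(trace("toronto-office", "wlc-vpc-eu-central-1"))
```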
Return traffic must follow the same path back to the Toronto office (no asymmetric routing) so that the NGFWs do not block the response. We achieve this with BGP metric adjustments that ensure traffic from every segment is forwarded to the closest NGFW. The NGFW then handles the rest using the transit segment or Site-to-Site VPN tunnels.
Results and future improvements
Using AWS Cloud WAN, we could deploy a secure, segmented global network to interconnect all services running in AWS, our offices, our employees around the world, and our other cloud environments.
Thanks to the use of BGP, all routing is automated and highly available. When we attach a new VPC to AWS Cloud WAN, its IP range is propagated everywhere and immediately available. We have not experienced any performance issues since moving to AWS Cloud WAN, and we terminate our user VPNs with full traffic tunnelling on the NGFWs. Managing our AWS Cloud WAN network is intuitive and easy using the AWS Management Console, AWS Command Line Interface (AWS CLI), or Infrastructure-as-Code (IaC). In addition, the Network Manager section of the Console shows the state of the entire network, including the topology and logical segments.
Ensuring symmetric routing, so that traffic traverses the same NGFWs in both directions, was time consuming; stateful firewalls would otherwise block asymmetric flows. Unfortunately, there is no way to manipulate BGP metrics inside AWS Cloud WAN, so we had to do all of this manipulation on our NGFWs. This included advertising only the default route to all segments and creating a transit segment carrying all of the routing information. We set up route filtering using regular expressions and templating on each NGFW to advertise to the transit segment only the routes that originate from locally attached VPCs, not those learned from other AWS Regions. This design added configuration overhead for high availability, because the NGFWs we use are not capable of conditional advertisement: they cannot advertise specific prefixes based on conditions we define. This would be much simpler if BGP metrics could be manipulated inside the CNE.
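The regex-based filtering amounts to keeping only locally originated prefixes when advertising to the transit segment. A toy sketch, assuming a hypothetical address plan where one Region's VPCs live under 10.1.0.0/16:

```python
import re

# Toy sketch of the per-Region route filtering described above: each NGFW
# advertises to the transit segment only prefixes of locally attached VPCs.
# The address plan (10.1.x.0/24 for the local Region) and the regex are
# illustrative assumptions, not our real filters.
LOCAL_PREFIX = re.compile(r"^10\.1\.\d{1,3}\.0/24$")

def advertise_to_transit(learned_prefixes):
    """Drop prefixes learned from other AWS Regions so that return traffic
    comes back through the same NGFW (symmetric routing)."""
    return [p for p in learned_prefixes if LOCAL_PREFIX.match(p)]

print(advertise_to_transit(["10.1.5.0/24", "10.2.7.0/24", "10.1.12.0/24"]))
```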
At Ataccama, AWS Cloud WAN has simplified our global network configuration, allowing us to have multi-Region dynamic communication with the simplicity of a policy definition, while using NGFWs integrated with BGP to fulfill our security requirements. With AWS Cloud WAN, we have expanded our network both inside and outside AWS with far less operational overhead than before. Our AWS Cloud WAN network gives us highly available, scalable, and cost-effective connectivity that links our locations and VPCs around the world.
About the Authors
The content and opinions in this post include those of the third-party author and AWS is not responsible for the content or accuracy of this post.