Networking & Content Delivery
Experian: Centralized internet ingress using AWS Gateway Load Balancer and AWS Transit Gateway
This is a guest post co-written with Mike Mosher, Sr Principal Cloud Platform Network Architect, and Daniel Lee, Lead Cloud Platform Engineer, from Experian.
Experian is a global technology company that offers credit risk, fraud, targeted marketing, and automated decisioning solutions. We are an AWS early adopter and have embraced the cloud to drive digital transformation efforts. Our Experian Cloud Center of Excellence (CCOE) team operates a global AWS Landing Zone, which includes a centralized AWS network infrastructure. We are also an AWS PrivateLink Ready Partner and offer our E-connect solution to allow our B2B customers to connect to a range of products through private, secure, and performant connectivity. In this post, we highlight the evolution of Experian’s inbound internet architecture to secure and meet throughput demands as we migrate remaining core business operations and products to AWS as part of our preferred cloud provider announcement.
Securing inbound traffic is critical to Experian, as we protect all public-facing applications and APIs using web application firewalls and next-generation firewalls that include intrusion prevention capabilities. The Experian IT shared services team centrally deploys and manages cloud networking and security, and prevents business unit accounts from deploying internet gateways (IGWs) in their own AWS accounts.
Expanding the Experian global footprint with AWS
Experian maintains a global AWS footprint across 13 AWS Regions in North America, UK, EMEA, APAC, India, and Brazil. We use AWS Direct Connect for hybrid network connectivity. When Experian first migrated into the cloud, we extended our network to colocation facilities near each AWS Region, which hosted centralized security infrastructure, internet circuits, and Direct Connect links. In this design, inbound internet traffic destined for AWS first traversed a colocation facility, then the Experian corporate security stack, and only then reached AWS. This added latency, cost, manual configuration, and operational complexity. The business units wanted a solution that allowed traffic to stay on the AWS network without traversing the corporate network and security stack in the Experian colocation. An AWS-native solution would provide more performant, scalable, cost-effective, and automated connectivity for the application owners. However, the IT operations teams did not have the capacity to manage a cloud-native solution for firewall and IPS inspection. These teams preferred to continue operating the same security appliances that they were already using on-premises. As the size of Experian's cloud presence grew, we knew it was time to build a centralized inspection platform on AWS that scaled with our ever-growing needs.
Centralized inbound inspection solution overview
Experian deployed a centralized inbound inspection architecture on AWS to provide centralized and secure access to Experian applications hosted by business units in spoke accounts. For every AWS Region in which Experian operates, we deployed virtual security appliances in a centralized Ingress Amazon Virtual Private Cloud (Amazon VPC). We are migrating existing applications to this solution, and all future applications built on AWS will use it. The Region with the largest Experian footprint supported by this solution contains over 220 applications, receives over five million requests per hour, and processes more than 70 GB of traffic per hour.
The architecture has evolved over multiple phases. The initial design used an inbound public AWS Application Load Balancer (ALB) as the frontend for an Auto Scaling group of firewalls. This is often referred to as the Amazon Elastic Load Balancer (Amazon ELB) sandwich pattern. After AWS introduced the AWS Gateway Load Balancer (GWLB), Experian added a GWLB in front of the firewalls to improve scalability and availability of the design. The centralized ingress model also provides the Experian Security Operations team with a smaller and more familiar footprint to manage and offload frontend security from development teams. Experian can manage our global fleet of on-premises and cloud security appliances using a single management portal.
The architecture, shown in the following diagram (figure 1), consists of a centralized Ingress VPC that includes a public ALB, a GWLB, GWLB endpoints (GWLBe), and third-party virtual security appliances running in an Auto Scaling group that adds and removes appliances as needed based on vCPU thresholds. The security appliances are dedicated to inspecting internet ingress traffic flows. The AWS Transit Gateway is configured with production and non-production route tables, and separate production and non-production Ingress VPCs further isolate the two environments.
Ingress traffic flow
- All traffic entering the Experian cloud environments is routed through a third-party, cloud-based web application firewall solution. This provides web application firewall and DDoS protection before the traffic reaches Experian’s infrastructure, as shown in the preceding diagram (figure 1).
- Public ALBs are used to accept and identify traffic for multiple Experian applications. ALBs use routing rules to differentiate applications based on the domain name in the Host header or other factors, such as the path. The ALBs then route traffic to distinct IP target groups, sending it to backend targets in separate VPCs.
- Traffic from ALB subnets is routed to the GWLBe in the same Availability Zone (AZ), and then through the GWLB, which sends traffic to the security appliance. This makes sure that all traffic is inspected by the security appliances.
- The security appliances are deployed across multiple AZs for high availability and scalability. The GWLB encapsulates the original IP traffic with a GENEVE header, uses 3-tuple or 5-tuple IP packet information to select an appliance for the life of the flow, and forwards the traffic to that appliance over UDP port 6081. The appliances provide stateful packet inspection, HTTPS decryption, and IPS functionality. If the traffic is allowed, the appliance re-encapsulates it with the GENEVE header and returns it to the GWLB.
- After inspection, the GWLB removes the GENEVE header and forwards traffic to the appropriate GWLBe. Then the route table for the GWLBe routes traffic to a Transit Gateway attachment subnet.
- The Transit Gateway routes traffic to the private IPs of the Amazon Elastic Compute Cloud (Amazon EC2) instance or AWS Network Load Balancer (NLB) for the target application. This could be an application hosted in a VPC, a private Amazon API Gateway, an on-premises application hosted in an Experian data center, or other locations on the Experian network.
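The flow stickiness described above can be modeled in a few lines. This is an illustrative sketch only: GWLB's internal hashing algorithm is not public, but any deterministic hash of the flow tuple exhibits the key property the post relies on, namely that every packet of a flow reaches the same appliance. The appliance names are hypothetical.

```python
import hashlib

GENEVE_PORT = 6081  # GWLB forwards GENEVE-encapsulated traffic to appliances on UDP 6081

def pick_appliance(src_ip, src_port, dst_ip, dst_port, protocol, appliances):
    """Select an appliance for a flow via a stable hash of the 5-tuple.

    Illustrative model of GWLB flow stickiness: because the hash is
    deterministic, every packet of a flow maps to the same appliance.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return appliances[digest % len(appliances)]

appliances = ["fw-a", "fw-b", "fw-c"]  # hypothetical Auto Scaling group members
first = pick_appliance("203.0.113.10", 40001, "10.0.1.5", 443, "tcp", appliances)
again = pick_appliance("203.0.113.10", 40001, "10.0.1.5", 443, "tcp", appliances)
assert first == again  # same 5-tuple, same appliance for the life of the flow
```

If an appliance is added or removed by the Auto Scaling group, only flows hashed to the changed slots move; GWLB itself additionally keeps existing flows pinned to their appliance.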
Experian leverages host-based routing on the public ALB to route incoming traffic based on the domain name specified in the host header. As shown in the following diagram (figure 2), requests to app1.experian.com are routed to the target IP address of the private resource (e.g., Amazon EC2, API Gateway, NLB) in the business unit VPC where the application is hosted. This design allows Experian to onboard up to one hundred Experian applications to the platform using a single ALB. Scaling out is achieved by deploying multiple ALBs, which separates applications, minimizes scope of impact, and contains operational tasks within manageable quotas. Experian can onboard more applications by deploying more ALBs to the platform.
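Host-based routing on the ALB can be sketched as an ordered rule evaluation. The rule shape and target group names below are hypothetical simplifications (real ALB listener rules also support header, method, query-string, and source-IP conditions); `app1.experian.com` is the example hostname from the post.

```python
def route_request(host, path, rules, default_target):
    """Evaluate simplified ALB-style listener rules in priority order.

    Each rule is (host_pattern, path_prefix, target_group); the first
    rule whose host and path conditions match wins, otherwise the
    listener's default action applies.
    """
    for host_pattern, path_prefix, target_group in rules:
        if host == host_pattern and path.startswith(path_prefix):
            return target_group
    return default_target

rules = [
    ("app1.experian.com", "/", "tg-app1-vpc-a"),    # IP target group in business unit VPC A
    ("app2.experian.com", "/api", "tg-app2-vpc-b"), # hypothetical second application
]
print(route_request("app1.experian.com", "/login", rules, "tg-default"))
# -> tg-app1-vpc-a
```

Because each rule maps to an IP target group, the ALB can send matched traffic across the Transit Gateway to private IPs in whichever spoke VPC hosts the application.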
Egress traffic flow
Traffic must egress from the same GWLBe from which the source traffic originated. Otherwise, the GWLB drops the traffic. In this architecture, the return traffic may come back to the Ingress VPC from a different AZ than it originated (see AWS Transit Gateway traffic flow and asymmetric routing). To avoid asymmetric traffic flows as recommended by AWS, Experian configured the Transit Gateway attachment subnet route tables to route the return traffic back through the same GWLBe that the source traffic used, as shown in the following diagram (figure 3).
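The symmetric-return requirement comes down to route table entries: each Transit Gateway attachment subnet's route table points every ALB subnet CIDR at the GWLBe in that ALB subnet's own AZ. The sketch below models that lookup with hypothetical CIDRs and endpoint names, assuming one /27 ALB subnet per AZ.

```python
import ipaddress

# Hypothetical per-AZ routes in a Transit Gateway attachment subnet route table.
# Return traffic destined for an ALB node is steered to the GWLB endpoint in
# that ALB subnet's AZ, so the flow exits through the same GWLBe it entered,
# even if the Transit Gateway delivered the return packet into a different AZ.
tgw_attachment_routes = {
    ipaddress.ip_network("10.0.1.0/27"): "gwlbe-az-a",  # ALB subnet, AZ a
    ipaddress.ip_network("10.0.2.0/27"): "gwlbe-az-b",  # ALB subnet, AZ b
    ipaddress.ip_network("10.0.3.0/27"): "gwlbe-az-c",  # ALB subnet, AZ c
}

def next_hop(dst_ip):
    """Longest-prefix-match lookup over the attachment subnet route table."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in tgw_attachment_routes.items() if dst in net]
    if not matches:
        return "local"
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Return traffic addressed to an ALB node in AZ a goes back via AZ a's GWLBe.
print(next_hop("10.0.1.9"))  # -> gwlbe-az-a
```

Without these routes (or with Transit Gateway appliance mode, discussed below), the return leg could reach a different GWLBe than the inbound leg, and the GWLB would drop the flow.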
Solution benefits
This centralized ingress solution has allowed Experian to minimize end-to-end latency, reduce costs associated with traversing a centralized network and security stack in Experian colocations, and simplify our hybrid-networking operations. The design allows a centralized cloud networking and security team to manage all internet ingress across Experian. All upgrades are now deployed with Infrastructure-as-Code (IaC) with AWS CloudFormation. Operations and troubleshooting are also simplified by reducing the number of hops and devices in the network path. Experian has automated the application onboarding process, allowing business units to request application onboarding though a request form that, once approved, kicks off automation to configure the resources in the inspection VPC.
Several business units at Experian have benefited from the new ingress architecture. One of the largest Experian business units processes over 150 million transactions per year, offering more than 40 financial services APIs to more than 350 external customers. Experian provides contractual SLAs to meet customer response time and availability requirements, which include penalties for breached SLAs. We have realized a return on investment of over $2M per year. Experian can now deliver service within agreed SLAs more effectively and onboard new clients without impacting existing customer response times.
Experian conducted a detailed analysis for one business unit, which saw a 34% average reduction in response times for its most critical APIs. The following table (figure 4) compares latency before and after the migration to the ingress architecture. This early success has driven Experian to adopt the architecture as the new standard across the organization.
Key considerations and lessons learned
Experian faced several design considerations while developing the solution. The VPC subnets were planned to allow for future growth to support load balancer scaling, virtual firewall scaling, and additional VPC endpoints. The ALB public subnets on the right side of the diagram shown in figure 3 each have a /27 CIDR block, which provides 32 total IPs and 27 usable IPs (AWS reserves 5 IPs per subnet) while meeting the ALB requirement of at least a /27 CIDR and at least eight free IPs per subnet. With public ALB subnets deployed across three AZs, there are a total of 81 available IPs for ALB scaling. Similarly, the virtual firewall subnets use /28 CIDR blocks, which allow 11 usable IPs per subnet. The GWLB and the GWLBe each take one IP per subnet, which leaves 27 usable IPs across the three AZs for the virtual firewalls to scale out. Additional VPC endpoints (such as an endpoint that sends traffic to private API Gateways) are launched in the Transit Gateway attachment subnets on the right side of the diagram shown in figure 3. The Transit Gateway attachments use only one IP per subnet, leaving 30 usable IPs for VPC endpoints.
We considered AWS account quotas for VPCs per AWS Region, IGWs, and elastic IPs. To make sure that we met requirements for future growth, Experian also considered ALB account quotas for the number of load balancers per AWS Region and the number of certificates and target groups per load balancer.
Several architecture options that we reviewed are described in past posts and whitepapers: Ingress Firewall deployment models, Inbound Inspection with firewall appliances and Gateway Load Balancer. The Experian design uses a public ALB placed in front of a GWLB so that ALB routing rules can be used to identify the applications before sending the traffic to the security appliances.
To mitigate asymmetric traffic flows, Experian disabled Transit Gateway appliance mode and configured the Transit Gateway attachment subnet route tables to route the return traffic back through the same GWLBe as the source traffic. Transit Gateway appliance mode was designed for east-west inspection VPC architectures, not ingress/egress VPC architectures.
Next steps
Experian is continually expanding our presence in the cloud, and this platform needs to handle that growth. As we prepare for the future, we must scale past any current limits to the platform. Experian plans to evolve the architecture to include more cloud-native and AWS managed services to remove the undifferentiated heavy lifting of managing, scaling, and operating security appliances.
Security comes first at Experian. Experian invests heavily in cybersecurity and follows rigorous due diligence processes. IPS is a critical component of our defense-in-depth strategy, and we are currently evaluating cloud-native IPS options that can satisfy our InfoSec requirements. We are also looking at load balancing improvements to further scale our centralized architecture, and evaluating open-source load balancing products such as NGINX and HAProxy, which would allow Experian to load balance across a greater number of backend applications. Experian is also building self-service capabilities that allow backend application teams to onboard seamlessly with secure connectivity. All of these improvements will help us continue to provide secure and performant applications for Experian's customers.
Conclusion
In this post, we highlighted a centralized ingress architecture implemented by Experian to reduce latency, costs, and operational complexity. The design included a public ALB that fronts GWLBs and security appliances that act as an inspection proxy within Experian’s multi-VPC, Transit Gateway architecture. The post highlighted the benefits, design considerations, traffic flows, and lessons learned for this architecture.
To learn more, we recommend reading the whitepaper, Building a Scalable and Secure Multi-VPC Network Infrastructure.
The content and opinions in this post include those of the third-party author and AWS is not responsible for the content or accuracy of this post.