Mitigating DDoS with data science using AWS Shield Advanced and AWS WAF
This blog post helps customers in mitigating distributed denial-of-service (DDoS) using AWS Shield Advanced, AWS WAF, and data science. We explore how to use these services along with machine learning (ML) to detect and mitigate DDoS attacks.
Bad actors conduct DDoS attacks using botnets. Through botnets, attackers look for zero-day vulnerabilities—specifically on network devices such as routers—and make these devices a part of their network. Bots speared around the world are waiting for instructions from control servers.
This post examines a real-world use case. Online payment solutions company Razorpay has a business-to-business (B2B) model where merchants invoke APIs for payments. Given the complex nature of B2B payments, the traffic patterns between small-scale merchants and large-scale merchants needs to be distinguished. This business model puts the company at risk for DDoS attacks, and here is how AWS helps them address this ongoing concern.
First, we will introduce an initial DDoS architecture approach and then how it was modified to meet specific Razorpay needs, as is possible for cross-industry client with varying requirements.
Basic DDoS recommendations
Let’s explore some out-of-the-box DDoS business solution recommendations for different resource categories:
- Static websites: Amazon CloudFront is a content delivery network (CDN) service that is used to host static websites. The AWS WAF rate-limiting rule can be used to limit traffic on a CDN.
- Dynamic websites: In this scenario, a standard WAF Rate Limit rule can be used on Application Load Balancer (ALB).
- APIs with customer context: APIs receive these assets on behalf of other merchants (for example, an end-user places a food order from a food-delivery application, completes a payment, and the request is sent to api.razorpay.com. Razorpay understands that the end user is trying to make a payment toward the food app).
Figure 1 represents the difficulty levels and solutions for each resource category.
DDoS response phases
We’ve discussed initial DDoS architectures, so now let’s explore the various DDoS response phases:
- Phase 1 – Automated inventory: Creating an automated inventory system capturing the list of APIs or websites exposed. The APIs are ranked by exposure and risk of an outage to create a list of assets that is ranked on the basis of risk.
- Phase 2 – Bucketing assets into groups: Bucketing APIs and websites (static or dynamic) into groups. This phase also subgroups the assets into unauthenticated or authenticated, users and more. As we will discuss later in this post, most Razorpay APIs have a customer context requiring a multilayered defense mechanism.
- Phase 3 – Testing the solution: Simulating attack traffic to test the solution under different loads, origins, and more.
AWS DDoS solutions
AWS Shield has three important features for DDoS mitigation:
- Alarms: Triggers an alarm when a DDoS attack is suspected, customized based on metrics such as total volume, error rates at the ALB, and response latency
- Visibility: Provides top five important field values in real time, such as requesting client IPs, countries, user agents, referrer headers, and url routes to take action.
- On-call support: Provides on-call support from the Shield Response team (SRT) to understand attack vectors and create AWS WAF rules based on insights.
AWS WAF has two rule types:
- Blocking: Blocks requests matching expected variables, from individuals to a combination of fields such as IP, URL route, body, or country
- Rate limiting: Tracks the rate of requests for each originating IP address, and triggers the rule action on IPs with rates that go over the limit. A scope-down statement can also be added to the rule, to only track and rate limit requests that match the scope-down statement. (We will soon discuss how Razorpay relies heavily on rate limiting it the AWS WAF and other microservice architecture solutions.)
Razorpay microservices protection architecture
Razorpay has two main layers protecting their internal microservices. The request to payment API api.razorpay.com goes from Amazon Route 53 to ALB. AWS WAF and AWS Shield Advanced are deployed on ALB. Requests are then forwarded to microservices fronted by API Gateway as in Figure 2.
Razorpay API Gateway
An API Gateway is a critical piece of infrastructure for microservice architectures. Razorpay is a B2B organization that performs actions on behalf of merchants. To achieve this—and understand context about the merchants and end-users alike—a custom API Gateway:
- Works at a high scale with very low latency
- Understands merchant context to provide authentication and authorization
- Provides security features around malicious traffic
Razorpay uses a self-managed API gateway called Edge. The API Gateway is the first system on the ingress path to understand Razorpay domain context that prior layers cannot. Every request goes through multiple layers of logic at the API Gateway before being forwarded to respective services.
Razorpay Edge and AWS solution insights
As Edge performs multiple computations on each request, Razorpay generates insights for each request to build intelligence and prevent DDoS attacks. These events are fed into a time-series database in near real time and generate insights using machine learning (ML) models. The insights:
- Identify malicious patterns: Generating confidence percentage that helps in defining further action, verifies consumer authenticity, and serves the request. It also blocks malicious requests at the edge.
- Derive pattern-based rate limits: Deriving rate limits based on a larger set of data—including consumer and IP address—by looking at weekly and monthly patterns.
Once this information is available at Razorypay Edge, it can perform various operations to ensure only legit requests reach the service layer. Let’s explore the request flow to understand each stage’s contribution toward DDoS handling, as in Figure 3.
All Razorpay requests flow through API Gateway. Here are some key recommendations:
- Based on feedback data available, the gateway checks if the given request pattern falls in the malicious category. If the confidence percentage is high for a malicious request, block the request. A block rule is applied at the AWS WAF layer. If the confidence percentage is low, use CAPTCHA to check user authenticity.
- Apply rate limit on pre-existing data in the request object using feedback from an ML model.
- Perform authentication/identification and authorization.
- Apply rate limit on data derived from request object, also using feedback from a ML model.
- Request user CAPTCHA verification if the request is identified as malicious with low confidence.
As a best practice, publish generated insights for each request to Apache Kafka out of the request flow to avoid latency impact and reduce blast radius.
In this post, you learned about addressing DDoS attacks with ML models.
Organizations can use the insights data generated by an API Gateway to identify request patterns, categorize the patterns into various buckets, and perform the following depending on category:
- Send feedback to an API Gateway for immediate action
- Update the rate limits for Edge-based historical patterns
- Update WAF rule for blocking/reducing rate limits
As discussed, using AWS WAF and AWS Shield Advanced along with ML helps detect and mitigate DDoS attacks.