AWS Architecture Blog
Field Notes: How OLX Europe Fights Millions of Bots with AWS
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
This post was cowritten with Daniel Loureiro, Lead Security Engineer at OLX Group Global Information Security.
At OLX, infrastructure is constantly being scanned and targeted by advanced malicious botnets. These botnets either scrape the content of their websites to monetize it on other competing websites, threaten the availability of their service with DDoS attacks, or brute force login pages to compromise legitimate accounts. The last attack type is known as Account Takeover.
With compromised accounts, malicious actors can conduct fraudulent activities like selling fake products on OLX marketplaces, which is a major business risk for a company that counts the trust and security of its users among its top priorities. Those risks were previously addressed by a bot protection service that was expensive and lacked flexibility in terms of control and visibility. OLX took the opportunity of their CDN migration to look at alternatives for bot protection, including building their own in-house solution.
Within 3 months, the OLX security team built a highly flexible edge security solution on AWS, including bot management. It protected their infrastructure at scale (240 billion requests per month) across multiple AWS accounts and Regions, and reduced the costs of the CDN and bot protection components by 50%.
In this blog, we will share the details of OLX’s edge security architecture to enhance their bot protection as shown in the following diagram.
OLX’s Edge Security Architecture
Laying the protection foundation with AWS WAF and AWS Firewall Manager
The OLX team started by defining the goals of their new edge security platform. They wanted centralized security governance that maintained their original level of security capabilities. It was also important to keep management processes simple for the security team, for example when updating rules or working out why a request was blocked. Finally, the transition to the new architecture had to be possible with minimal risk of disrupting users, and had to be completed before their existing CDN contract ended.
With that in mind, they decided to enable AWS WAF on their public endpoints to block requests at the application layer. OLX uses AWS Managed Rules as a standard baseline for protection in AWS WAF and complements it with their own custom rules, including IP matching rules.
To simplify creating and updating rules in their AWS WAF WebACLs, OLX created a code repository in a central AWS account to store the rules for all of their websites. Each rule is represented by a file that describes the rule in AWS WAF JSON format. The repository has a hierarchical structure that allows security engineers to apply a rule to a single website, to a group of websites, or globally, by dropping the rule file into the appropriate folder. For example, in the following illustration of OLX’s repository, the allow-ipset-global-whitelist.json, allow-ipset-group_x-whitelist.json, allow-ipset-olx.pt-whitelist.json, and AWSManagedRulesCommonRuleSet.json rules apply to their olx.pt website.
OLX’s repository for AWS WAF rules
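The scoping logic of such a repository can be sketched in a few lines of Python. This is only an illustration: the folder names (`global/`, `groups/<group>/`, `sites/<site>/`), the `group_y` entry, the placement of the managed rule set in `global/`, and the `rules_for_site` helper are assumptions made for the example, not OLX's actual layout or tooling.

```python
from pathlib import PurePosixPath


def rules_for_site(rule_paths, site, group):
    """Return the rule files that apply to `site`, from broadest to narrowest scope."""
    # Hypothetical layout: global rules, then the site's group, then the site itself.
    scopes = [
        PurePosixPath("global"),
        PurePosixPath("groups") / group,
        PurePosixPath("sites") / site,
    ]
    selected = []
    for scope in scopes:
        # Keep only JSON rule files that sit directly in this scope's folder.
        matches = sorted(
            path
            for path in rule_paths
            if path.endswith(".json") and PurePosixPath(path).parent == scope
        )
        selected.extend(matches)
    return selected


# Repository-relative paths (illustrative).
repo = [
    "global/allow-ipset-global-whitelist.json",
    "global/AWSManagedRulesCommonRuleSet.json",
    "groups/group_x/allow-ipset-group_x-whitelist.json",
    "groups/group_y/allow-ipset-group_y-whitelist.json",
    "sites/olx.pt/allow-ipset-olx.pt-whitelist.json",
]

for rule_file in rules_for_site(repo, site="olx.pt", group="group_x"):
    print(rule_file)
```

With this layout, adding a rule for every website means adding one file under `global/`, while a file under `sites/olx.pt/` affects only that website.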
After a rule file is added or updated, a security engineer commits the change to their source repository using git. This action triggers a CI/CD pipeline that consolidates the rule files, generates the CloudFormation templates of the Firewall Manager WAF policies, and finally deploys them in the central AWS account. From that point, Firewall Manager automatically deploys the associated WAF WebACLs on their CloudFront resources in different AWS accounts and Regions. With many AWS accounts used in their production environment, this central security governance allowed OLX to reduce deployment times and the resources required.
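As a rough sketch of what the consolidation step might involve, the Python below merges individual rule documents into one ordered list and assigns each rule a unique priority, since AWS WAF requires distinct priorities within a WebACL. The `consolidate_rules` helper and the placeholder ARN are hypothetical, `VisibilityConfig` is omitted for brevity, and the sketch does not attempt to produce the CloudFormation template that OLX's pipeline generates.

```python
import json


def consolidate_rules(rule_documents, first_priority=0):
    """Merge rule documents (JSON strings) into one list with unique priorities."""
    merged = []
    seen_names = set()
    for offset, document in enumerate(rule_documents):
        rule = json.loads(document)
        name = rule["Name"]
        if name in seen_names:
            raise ValueError(f"duplicate rule name: {name}")
        seen_names.add(name)
        # Assign priorities in file order so that no two rules collide.
        merged.append({**rule, "Priority": first_priority + offset})
    return merged


# Two simplified rule documents (the ARN is a placeholder, not a real resource).
allow_ipset = json.dumps({
    "Name": "allow-ipset-global-whitelist",
    "Action": {"Allow": {}},
    "Statement": {"IPSetReferenceStatement": {"ARN": "<ip-set-arn>"}},
})
managed_rules = json.dumps({
    "Name": "AWSManagedRulesCommonRuleSet",
    "OverrideAction": {"None": {}},
    "Statement": {
        "ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            "Name": "AWSManagedRulesCommonRuleSet",
        }
    },
})

rules = consolidate_rules([allow_ipset, managed_rules])
print(json.dumps(rules, indent=2))
```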
Using AWS WAF logs for visibility and support engineers
For OLX, enabling logs in AWS WAF was essential both for their customer support operations and for enhancing their protections by analyzing suspicious user behaviors. OLX leverages AWS Firewall Manager to automatically enable logging in their AWS WAF WebACLs across AWS accounts and Regions.
OLX ships their AWS WAF logs through Amazon Kinesis Data Firehose to an Amazon S3 bucket in their central account, for archival and occasional querying with Amazon Athena. However, with 400 GB of logs shipped daily to Amazon S3, this solution is expensive and slow for the access patterns required by their support engineers. So, in addition to shipping full logs to Amazon S3, they also ship logs to an Amazon Elasticsearch Service cluster, which gives their support engineers faster queries. Only log records of requests that were counted or blocked by AWS WAF are shipped to this cluster; log records of allowed requests are filtered out using the data transformation capability of Amazon Kinesis Data Firehose.
Logs of blocked requests allowed them to analyze suspicious traffic. Additionally, logs of counted requests help them understand the impact of a rule before switching it to block mode in production.
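The filtering of allowed requests described above can be sketched as a Kinesis Data Firehose transformation Lambda function. This is a minimal illustration, not OLX's code. It assumes the AWS WAF log fields `action`, `nonTerminatingMatchingRules`, and `ruleGroupList`, and it treats any record with a non-terminating rule match as a "counted" request, which is an interpretation rather than something the post specifies.

```python
import base64
import json


def is_blocked_or_counted(log):
    """True if the request was blocked or matched at least one non-terminating rule."""
    if log.get("action") == "BLOCK":
        return True
    if log.get("nonTerminatingMatchingRules"):
        return True
    # Non-terminating matches inside rule groups are reported per group.
    return any(
        group.get("nonTerminatingMatchingRules")
        for group in log.get("ruleGroupList") or []
    )


def lambda_handler(event, context):
    """Firehose transformation: keep blocked/counted records, drop the rest."""
    output = []
    for record in event["records"]:
        try:
            log = json.loads(base64.b64decode(record["data"]))
            result = "Ok" if is_blocked_or_counted(log) else "Dropped"
        except ValueError:
            # Covers invalid base64 as well as invalid JSON.
            result = "ProcessingFailed"
        output.append({
            "recordId": record["recordId"],
            "result": result,
            "data": record["data"],  # payload is passed through unchanged
        })
    return {"records": output}
```

Firehose delivers only the records marked `Ok` to the destination, so in this sketch allowed requests with no rule matches never reach the search cluster.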
The team then built a chat bot that customer support engineers can use in Slack to query specific requests and understand why they were blocked. Requests are identified by a unique ID (request-id) that Amazon CloudFront generates for each request it receives; this ID is sent back in the HTTP response and recorded in AWS WAF logs. This helps their support engineers respond faster to customer tickets on a daily basis.
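A lookup of this kind could be approximated as follows. The sketch builds a search query for a given request ID and turns a matching AWS WAF log record into a one-line explanation. The helper names are hypothetical, the field names follow the AWS WAF log format (`httpRequest.requestId`, `terminatingRuleId`, and so on), and the term query assumes the request ID field is indexed as an exact-match keyword; none of this is taken from OLX's chat bot.

```python
def request_id_query(request_id):
    """Build a search body that matches a single request by its ID."""
    # Assumes httpRequest.requestId is mapped as a keyword (exact match) field.
    return {"query": {"term": {"httpRequest.requestId": request_id}}}


def explain(log):
    """Summarize an AWS WAF log record for a support engineer."""
    request = log.get("httpRequest", {})
    return (
        f"Request {request.get('requestId', 'unknown')} "
        f"from {request.get('clientIp', 'unknown')} "
        f"to {request.get('uri', 'unknown')}: "
        f"{log.get('action', 'UNKNOWN')} by rule "
        f"{log.get('terminatingRuleId', 'unknown')} "
        f"({log.get('terminatingRuleType', 'unknown')})"
    )


# Illustrative log record with made-up values.
sample = {
    "action": "BLOCK",
    "terminatingRuleId": "block-ipset-bots",
    "terminatingRuleType": "REGULAR",
    "httpRequest": {
        "requestId": "example-request-id",
        "clientIp": "192.0.2.44",
        "uri": "/login",
    },
}

print(request_id_query("example-request-id"))
print(explain(sample))
```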
Enhancing bot protection
The final piece in their edge security was bot mitigation. They were already implementing rules that blocked malicious bot traffic, such as rate limiting and the Amazon managed IP reputation list, but that was not enough to fight sophisticated bot attacks. To provide more in-depth protection, OLX built a solution inspired by AWS security automations. This solution analyzes threat events from multiple sources and automates decisions to block suspicious bot IPs. Their logic runs on an EC2 instance and regularly consumes threat data from three sources:
- IP lists blocked by the native AWS WAF rate limiting rule. With a rate limiting rule, AWS WAF temporarily blocks an IP if it sends more than X requests in the last 5 minutes (with a minimum threshold of 100), but allows it again once its request rate falls back below the limit. The OLX security team decided to block those IPs for a longer duration to reduce the risk of bots adapting to the limit. They fetch the list of IPs blocked by rate limiting from the AWS WAF API operations in different accounts.
- Amazon Cognito, which is used for user authentication in their websites. When Advanced Security is enabled in Amazon Cognito, Amazon Cognito examines a number of attributes of a user sign-in request to evaluate its risk. For example, it checks whether the user has used the same device before, or has signed in from the same location or IP address. The risk metadata can be consumed in Amazon Cognito Events using a Lambda function. OLX sends high risk events to an Amazon Elasticsearch Cluster.
- New Relic, which is used to monitor their website traffic on the client side. By querying their New Relic logs, OLX is able to detect suspicious behaviors by analyzing aspects such as how visitors navigate their websites.
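For the first source in the list above, the idea of keeping rate-limited IPs blocked for longer can be sketched as follows. The function names, the 24-hour duration, and the in-memory dictionary are assumptions for illustration. The only AWS call used is the AWS WAF `GetRateBasedStatementManagedKeys` operation (`get_rate_based_statement_managed_keys` in boto3), and the client is passed in as a parameter so the logic can be exercised without AWS access.

```python
def fetch_rate_limited_ips(wafv2_client, scope, web_acl_name, web_acl_id, rule_name):
    """Return the addresses currently blocked by a rate-based rule."""
    response = wafv2_client.get_rate_based_statement_managed_keys(
        Scope=scope,
        WebACLName=web_acl_name,
        WebACLId=web_acl_id,
        RuleName=rule_name,
    )
    ips = set()
    for key in ("ManagedKeysIPV4", "ManagedKeysIPV6"):
        ips.update(response.get(key, {}).get("Addresses", []))
    return ips


def refresh_blocklist(blocklist, rate_limited_ips, now, block_seconds):
    """Drop expired entries, then (re)start the block window for current offenders."""
    active = {ip: expiry for ip, expiry in blocklist.items() if expiry > now}
    for ip in rate_limited_ips:
        active[ip] = now + block_seconds
    return active


# Illustrative run with made-up addresses and a hypothetical 24-hour block.
BLOCK_SECONDS = 24 * 60 * 60
blocklist = {"198.51.100.7/32": 1_000}  # already expired at now=2_000
blocklist = refresh_blocklist(blocklist, {"192.0.2.44/32"}, now=2_000, block_seconds=BLOCK_SECONDS)
print(blocklist)
```

In a real deployment the client would be something like `boto3.client("wafv2", region_name="us-east-1")` with `Scope="CLOUDFRONT"` for WebACLs attached to CloudFront, called once per account.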
When a bot IP is identified, and a decision is made to block it using AWS WAF, their solution automatically commits a change to their rules repository. This in turn triggers their CI/CD pipeline to deploy the change within minutes.
To reduce the false positive rate, the OLX security team introduced the notion of a working mode. Depending on the current threat level, the team sets a working mode, which adapts the automated response to requests from identified suspicious IPs. Currently they operate within three working modes:
- HIGH RISK: Block all identified malicious IPs without any further validation.
- MEDIUM RISK: Block an identified malicious IP if it is present in a bad reputation IP list, or when its source country is not from their main markets.
- LOW RISK: Block an identified malicious IP when its source country is not from their main markets.
Automated responses in different Working Modes
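The three working modes translate into a small decision function. The sketch below follows the rules exactly as listed. The names `WorkingMode` and `should_block`, and the example set of main-market country codes, are illustrative assumptions.

```python
from enum import Enum


class WorkingMode(Enum):
    HIGH_RISK = "high"
    MEDIUM_RISK = "medium"
    LOW_RISK = "low"


def should_block(mode, in_bad_reputation_list, country, main_markets):
    """Decide whether to block an IP that was already identified as malicious."""
    outside_main_markets = country not in main_markets
    if mode is WorkingMode.HIGH_RISK:
        # Block without any further validation.
        return True
    if mode is WorkingMode.MEDIUM_RISK:
        return in_bad_reputation_list or outside_main_markets
    if mode is WorkingMode.LOW_RISK:
        return outside_main_markets
    raise ValueError(f"unknown working mode: {mode!r}")


# Hypothetical set of main-market country codes.
MAIN_MARKETS = {"PT", "PL"}

print(should_block(WorkingMode.MEDIUM_RISK, True, "PT", MAIN_MARKETS))
print(should_block(WorkingMode.LOW_RISK, True, "PT", MAIN_MARKETS))
```

In this sketch, an IP from a main market that appears on a bad reputation list is blocked in medium-risk mode but not in low-risk mode, which is how a lower working mode reduces false positives.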
This architecture is adaptive, as more threat data sources can be added to the system with time to improve protections and adapt to business needs.
One week after OLX moved their infrastructure to AWS, they were targeted by a major DDoS attack (around 700K requests per second) on some of their public websites. They were able to respond and block attacker IPs using the bot mitigation system they built. In a short timeframe, OLX was able to iterate on their edge security system until it reached a production-ready level. OLX will continue improving their system, and plans to release their tooling as an open-source solution once it is polished and reviewed.
In their own words: “If you decide to embark on a journey like this, ensure enough time is reserved to do this work, and work closely with AWS Engineers when you can, we had great support. AWS simplifies the work so much that you can build a solution to fit your organization without much effort if properly planned.” – Daniel Loureiro
OLX Global, a digital native company owned by the South African group Naspers, operates the fastest-growing network of trading platforms globally, serving more than 300 million people every month. Present in 32+ countries, with leading positions in 22 of them, OLX helps its customers buy and sell cars, find housing, get jobs, buy and sell household goods, and much more.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.