AWS WAF governance at scale

Overview

Large organizations, with multiple teams developing and operating their own web applications or APIs, use tools and processes to drive consistent security controls across teams and to avoid exposing endpoints with weak or no protection. AWS Firewall Manager is a tool that organizations can use to govern AWS WAF and AWS Shield Advanced deployments at scale.

AWS Firewall Manager

Firewall Manager allows you to define AWS WAF or Shield Advanced policies that are automatically deployed across your publicly exposed resources, such as Amazon CloudFront distributions, Application Load Balancers (ALBs), or Amazon API Gateway APIs. A policy consists of:

A scope, defining where the policy applies: which resource types (CloudFront, ALB, and so on), whether to include or exclude resources with specific tags, and which accounts or organizational units to include or exclude.
Rules, defining which WAF rule groups to apply, whether to enable centralized logging, and whether to add Shield Advanced protections.
An action, defining what to do when a resource is found within the scope of the policy. For example, you can automatically enforce the policy rules, or only report non-compliant resources. In an initial Firewall Manager deployment, it's recommended to start without auto remediation, so that you can identify resources requiring manual handling with minimal impact on existing applications. Once you have gained confidence, you can switch Firewall Manager to auto remediation.
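To make the three parts concrete, here is a minimal sketch of a policy written as the Python dictionary you might pass to the Firewall Manager `put_policy` API through boto3 (`boto3.client("fms").put_policy(Policy=policy)`). The policy name, tag, organizational unit ID, and chosen managed rule group are hypothetical, and the `ManagedServiceData` string follows the WAFV2 policy format as we understand it; check every field against the Firewall Manager API reference before use.

```python
import json

# Rules: which rule groups Firewall Manager places in the deployed WebACL.
# Assumed WAFV2 ManagedServiceData format; the rule group choice is illustrative.
managed_service_data = {
    "type": "WAFV2",
    "preProcessRuleGroups": [  # the "first" rule groups
        {
            "ruleGroupType": "ManagedRuleGroup",
            "managedRuleGroupIdentifier": {
                "vendorName": "AWS",
                "managedRuleGroupName": "AWSManagedRulesAmazonIpReputationList",
                "version": None,
            },
            "overrideAction": {"type": "NONE"},
            "excludeRules": [],
        }
    ],
    "postProcessRuleGroups": [],  # the "last" rule groups
    "defaultAction": {"type": "ALLOW"},
    "overrideCustomerWebACLAssociation": False,
}

policy = {
    "PolicyName": "central-waf-baseline-alb",  # hypothetical name
    # Scope: resource type, tags, and organizational units
    "ResourceType": "AWS::ElasticLoadBalancingV2::LoadBalancer",
    "ResourceTags": [{"Key": "waf-exempt", "Value": "true"}],  # hypothetical tag
    "ExcludeResourceTags": True,  # True: resources carrying these tags are out of scope
    "IncludeMap": {"ORG_UNIT": ["ou-abcd-11111111"]},  # hypothetical OU ID
    # Rules, serialized as a JSON string
    "SecurityServicePolicyData": {
        "Type": "WAFV2",
        "ManagedServiceData": json.dumps(managed_service_data),
    },
    # Action: report only, as recommended for an initial deployment
    "RemediationEnabled": False,
}
```

Setting `RemediationEnabled` to `True` in a later update is what switches such a policy from reporting to auto remediation.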

Before using Firewall Manager, review its prerequisites. Note that AWS Config is one of them. To optimize the cost of AWS Config, if you enable it only for Firewall Manager, limit the "Resource types to record" setting to the resources relevant to your scenario (e.g. WAF web ACLs, CloudFront distributions, load balancers). Also note that policies are regional constructs: you need a global policy for CloudFront, and a regional policy in each region where you have regional resources such as ALBs and API Gateways. Consider this AWS Solution to facilitate the deployment of Firewall Manager policies across AWS Organizations.
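The regional rule can be sketched as a small planning helper, next to a `recordingGroup` you might pass to AWS Config's `put_configuration_recorder` so that only the relevant resource types are recorded. The resource type list and the inventory format (`(resource_type, region)` tuples) are assumptions for illustration; verify the types Firewall Manager actually needs against its prerequisites.

```python
# Assumed list of resource types to record; verify against the Firewall Manager prerequisites.
RECORDING_GROUP = {
    "allSupported": False,
    "resourceTypes": [
        "AWS::WAFv2::WebACL",
        "AWS::CloudFront::Distribution",
        "AWS::ElasticLoadBalancingV2::LoadBalancer",
        "AWS::ApiGateway::Stage",
    ],
}

CLOUDFRONT = "AWS::CloudFront::Distribution"
GLOBAL = "global"  # CloudFront is covered by a single global policy

def plan_policies(resources):
    """Map each policy location to the resource types it must cover.

    resources: iterable of (resource_type, region) tuples (hypothetical inventory).
    CloudFront distributions go to the global policy; everything else needs a
    regional policy in the region where the resource lives."""
    plan = {}
    for resource_type, region in resources:
        location = GLOBAL if resource_type == CLOUDFRONT else region
        plan.setdefault(location, set()).add(resource_type)
    return {location: sorted(types) for location, types in plan.items()}
```

For example, an inventory with one distribution and ALBs in two regions yields three policy locations: `global`, plus one per region.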

AWS WAF deployments at scale

When you create a WAF policy, Firewall Manager deploys a WAF WebACL containing the policy's WAF rules in each AWS account within the policy scope. In a WAF policy, you can define two types of rule groups that Firewall Manager adds to the deployed WebACL:

A first rule group, which is evaluated before any other rule.
A last rule group, which is evaluated after all other rules.

This lets a central security team manage common rule groups across your organization, while application teams can add custom rules relevant to their application between the first and last common rule groups. Because AWS WAF evaluates rules in order, the first common rule group is evaluated before any other rule, followed by the rules created by the application team, and finally the last common rule group.
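The ordering can be illustrated with a simplified, hypothetical model of a WebACL: rules are plain Python objects rather than WAF JSON, and real AWS WAF orders rules by numeric priority and handles more actions than shown. The point it shows is that the first terminating match wins, so the position Firewall Manager gives each layer decides who gets the final word.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    action: str  # "ALLOW" and "BLOCK" terminate evaluation; "COUNT" does not

def assemble_web_acl(first, app, last):
    """Lay out rules as described: first rule group, application rules, last rule group."""
    return [*first, *app, *last]

def evaluate(rules, request, default_action="ALLOW"):
    """Return (action, rule_name) for the first matching terminating rule."""
    for rule in rules:
        if rule.action != "COUNT" and rule.matches(request):
            return rule.action, rule.name
    return default_action, None
```

In this model a central block in the first group always wins over an application team's allow rule, while an application allow rule wins over a block in the last group.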

You can build a CI/CD pipeline that updates the common WAF rule groups in the AWS WAF policy in the Firewall Manager administrator account. Firewall Manager then deploys the change across your organization within minutes. This blog describes how OLX deployed a central WAF policy using a CI/CD pipeline, with a central logging system.
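One step of such a pipeline might look like the following sketch. It assumes the pipeline reads the current policy with boto3's `fms.get_policy(PolicyId=...)["Policy"]` and writes the result back with `fms.put_policy(Policy=updated)`; the function itself only transforms dictionaries, so it can be unit tested in the pipeline without AWS access. It keeps the `PolicyId` and `PolicyUpdateToken` of the current version, which Firewall Manager uses to treat the request as an update, and replaces only the first rule groups.

```python
import copy
import json

def build_policy_update(current_policy, new_first_rule_groups):
    """Return a copy of current_policy whose first (pre-process) rule groups are replaced.

    current_policy: the "Policy" object from fms get_policy, which carries
    PolicyId and PolicyUpdateToken. The last rule groups and scope are left untouched."""
    updated = copy.deepcopy(current_policy)  # never mutate the fetched policy
    service_data = updated["SecurityServicePolicyData"]
    data = json.loads(service_data["ManagedServiceData"])
    data["preProcessRuleGroups"] = new_first_rule_groups
    service_data["ManagedServiceData"] = json.dumps(data)
    return updated
```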

Common WAF governance models

Firewall Manager is a flexible tool that lets you establish various security governance strategies, depending on the requirements of your organization. In any centralized security governance model, you need to trade off how strictly you enforce rules centrally to raise protection levels against how many false positives caused by centrally deployed common rules you are willing to handle.

Single policy for mitigating critical threats

If you have highly autonomous application teams and want to avoid managing false positives, create a single central WAF policy that addresses critical threats. For example, you can create rate-based WAF rules with high thresholds, combined with high-confidence malicious IP reputation lists and geo-blocking rules for embargoed countries. You can also enable Shield Advanced and activate automatic application layer DDoS mitigation. These rules tend to produce very few false positives while remaining effective against HTTP floods. In addition, when critical, high-impact zero-day vulnerabilities are uncovered, you can apply mitigations centrally using the deployed common WAF rule group.
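As a sketch, the geo-blocking and rate-limiting parts could live in a central custom rule group, expressed in the WAFv2 rule JSON shape accepted by `CreateRuleGroup`/`UpdateRuleGroup` (for example boto3 `wafv2.update_rule_group(..., Rules=rules)`). The threshold and the country codes are parameters because they are organization-specific; the values in any example call are placeholders. The managed IP reputation list (`AWSManagedRulesAmazonIpReputationList`) would be added to the policy as its own managed rule group, since a WAF rule group cannot contain another rule group.

```python
def visibility(metric_name):
    """CloudWatch metrics and request sampling settings required on every rule."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }

def critical_threat_rules(rate_limit, embargoed_countries):
    """Rules for a central rule group that targets only critical, low false-positive threats.

    rate_limit: requests per client IP over the rule's evaluation window
    (5 minutes by default); keep it high to avoid catching legitimate users.
    embargoed_countries: ISO 3166 alpha-2 codes supplied by your compliance team."""
    return [
        {
            "Name": "geo-block-embargoed",
            "Priority": 0,
            "Statement": {"GeoMatchStatement": {"CountryCodes": sorted(embargoed_countries)}},
            "Action": {"Block": {}},
            "VisibilityConfig": visibility("geo-block-embargoed"),
        },
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,
            "Statement": {"RateBasedStatement": {"Limit": rate_limit, "AggregateKeyType": "IP"}},
            "Action": {"Block": {}},
            "VisibilityConfig": visibility("rate-limit-per-ip"),
        },
    ]
```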

It's recommended to create an internal wiki for your application teams, with best-practice guidance on adding custom WAF rules relevant to their application to their WebACL. For example, guide them to add protections against SQLi and XSS attacks if their application is vulnerable to such attacks.
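A wiki entry could include a snippet like this hypothetical one, which generates two WebACL rules in WAFv2 JSON that block SQLi and XSS patterns in the request body. The name prefix and priorities are illustrative; application teams would pick priorities that place the rules between the central first and last rule groups, and might prefer the AWS managed SQLi rule group over hand-written rules.

```python
# Decode the body before inspection so that encoded payloads are still detected.
TEXT_TRANSFORMATIONS = [
    {"Priority": 0, "Type": "URL_DECODE"},
    {"Priority": 1, "Type": "HTML_ENTITY_DECODE"},
]

def injection_rules(first_priority, prefix):
    """Return WebACL rules blocking SQLi and XSS patterns in the request body."""
    checks = [("sqli-body", "SqliMatchStatement"), ("xss-body", "XssMatchStatement")]
    rules = []
    for offset, (suffix, statement_type) in enumerate(checks):
        name = f"{prefix}-{suffix}"
        rules.append({
            "Name": name,
            "Priority": first_priority + offset,
            "Statement": {statement_type: {
                # CONTINUE: inspect the part of an oversized body that WAF can see
                "FieldToMatch": {"Body": {"OversizeHandling": "CONTINUE"}},
                "TextTransformations": TEXT_TRANSFORMATIONS,
            }},
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": name,
            },
        })
    return rules
```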

Single policy for mitigating a wide array of threats

If you want your central security coverage to address a wider array of threats, harden your central common rule groups, but let application teams manage false positives autonomously. To implement this governance model, put your common rules in the first rule group of the WAF policy in count mode. These rules then only emit labels, which you can match in the last rule group of the WAF policy to block the labeled requests.
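A minimal sketch of this layout, assuming the Firewall Manager WAFV2 `ManagedServiceData` format and the label that the AWS managed SQLi rule group emits for body matches (confirm the exact label key in the AWS Managed Rules documentation). The rule group ARN is hypothetical; it would point to a central custom rule group containing `block_on_label_rule`.

```python
import json

# Assumed label key for the SQLi_BODY rule of AWSManagedRulesSQLiRuleSet.
SQLI_BODY_LABEL = "awswaf:managed:aws:sql-database:SQLi_Body"

managed_service_data = {
    "type": "WAFV2",
    "preProcessRuleGroups": [{
        "ruleGroupType": "ManagedRuleGroup",
        "managedRuleGroupIdentifier": {
            "vendorName": "AWS",
            "managedRuleGroupName": "AWSManagedRulesSQLiRuleSet",
            "version": None,
        },
        # COUNT: matching rules add their labels but never block in the first group
        "overrideAction": {"type": "COUNT"},
        "excludeRules": [],
    }],
    "postProcessRuleGroups": [{
        "ruleGroupType": "RuleGroup",
        # Hypothetical ARN of the central "last" rule group
        "ruleGroupArn": "arn:aws:wafv2:us-east-1:111122223333:global/rulegroup/central-last/EXAMPLE-ID",
        "overrideAction": {"type": "NONE"},
        "excludeRules": [],
    }],
    "defaultAction": {"type": "ALLOW"},
    "overrideCustomerWebACLAssociation": False,
}

# Rule inside the central "last" rule group: block requests that carry the label.
block_on_label_rule = {
    "Name": "block-sqli-body",
    "Priority": 0,
    "Statement": {"LabelMatchStatement": {"Scope": "LABEL", "Key": SQLI_BODY_LABEL}},
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "block-sqli-body",
    },
}

SERVICE_DATA_JSON = json.dumps(managed_service_data)
```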

If your application teams encounter false positives, they can create exclusion rules using the labels emitted by your rules. For example, consider a scenario where the AWS Managed Rules (AMR) rule group protecting against SQLi is added in count mode to the first rule group. In the last rule group, a rule blocks requests carrying the label SQLi_BODY emitted by that AMR rule group. If the AMR rule group causes a false positive for an application on a specific URL (url="/form1"), the application team can create an exclusion rule in its WebACL that mitigates this false positive (e.g. IF url="/form1" AND label_matched="SQLi_BODY" THEN ALLOW). The Allow rule action is terminating, which means that AWS WAF stops evaluating subsequent rules, including the blocking rule in the last rule group.
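The exclusion rule from this example can be sketched in WAFv2 rule JSON, together with a toy simulation of the three layers. The label key is an assumption to verify against the AWS Managed Rules documentation, and the rule name and priority are illustrative; note that boto3 expects `SearchString` as bytes, whereas the JSON rule editor takes a string.

```python
SQLI_BODY_LABEL = "awswaf:managed:aws:sql-database:SQLi_Body"  # assumed label key

def exclusion_rule(path, label_key, priority):
    """Application team's WebACL rule: IF uri == path AND label present THEN ALLOW."""
    return {
        "Name": "allow-known-sqli-false-positive",
        "Priority": priority,
        "Statement": {"AndStatement": {"Statements": [
            {"ByteMatchStatement": {
                "SearchString": path,  # pass path.encode() when calling boto3
                "FieldToMatch": {"UriPath": {}},
                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                "PositionalConstraint": "EXACTLY",
            }},
            {"LabelMatchStatement": {"Scope": "LABEL", "Key": label_key}},
        ]}},
        "Action": {"Allow": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "allow-known-sqli-false-positive",
        },
    }

def simulate(path, sqli_detected, excluded_path):
    """Toy model: count-mode first group, app exclusion rule, last-group block."""
    labels = {SQLI_BODY_LABEL} if sqli_detected else set()  # first group only labels
    if path == excluded_path and SQLI_BODY_LABEL in labels:
        return "ALLOW"  # terminating: the last rule group is never evaluated
    if SQLI_BODY_LABEL in labels:
        return "BLOCK"  # last rule group blocks on the label
    return "ALLOW"  # WebACL default action
```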

To roll out changes to this policy without impacting existing applications, consider creating a replica of the policy for application teams to use in their staging environments. The two policies need mutually exclusive scopes. For example, the production policy applies to all CloudFront distributions except those with the staging tag, and the staging policy applies to all CloudFront distributions with the staging tag. Roll out most updates to the staging policy first, and notify all application teams using an Amazon SNS topic. Once notified of a change, application teams test the new policy version in their staging environment (this testing can be automated) and manage false positives if needed. After an agreed delay, the central team propagates the change to the production policy. Critical updates that can't wait a week, such as protections against the Log4j CVE, can be applied immediately at the expense of some temporary false positives, until the application teams handle the exceptions.
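The mutually exclusive scopes can be derived from a single base policy, as in this sketch. It relies on the Firewall Manager `ResourceTags` and `ExcludeResourceTags` policy fields; the tag, the `-prod`/`-staging` name suffixes, and the simplified one-tag scope check are assumptions for illustration. Change notifications could then be sent with boto3 `sns.publish(TopicArn=..., Subject=..., Message=...)`.

```python
import copy

def staged_policies(base_policy, staging_tag):
    """Derive production and staging policies with mutually exclusive tag scopes.

    staging_tag: one {"Key": ..., "Value": ...} dict, e.g. {"Key": "env", "Value": "staging"}."""
    prod = copy.deepcopy(base_policy)
    prod["PolicyName"] = base_policy["PolicyName"] + "-prod"
    prod["ResourceTags"] = [staging_tag]
    prod["ExcludeResourceTags"] = True  # everything except staging-tagged resources

    staging = copy.deepcopy(base_policy)
    staging["PolicyName"] = base_policy["PolicyName"] + "-staging"
    staging["ResourceTags"] = [staging_tag]
    staging["ExcludeResourceTags"] = False  # only staging-tagged resources
    return prod, staging

def in_scope(policy, resource_tags):
    """Tag part of the scope check, for policies with a single scope tag."""
    tag = policy["ResourceTags"][0]
    tagged = resource_tags.get(tag["Key"]) == tag["Value"]
    return not tagged if policy["ExcludeResourceTags"] else tagged
```

Checking that `in_scope` differs between the two policies for every tag combination you use is a cheap pipeline test that no resource ends up in both scopes, or in neither.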

If you are looking to enforce a consistent security baseline while still allowing some customization by account administrators, this article outlines the steps to design and implement a centrally managed security baseline policy. It also details best practices for testing and deploying the policy.

Multiple policies for different application types

If you have the same requirements as in the previous model, but want to reduce the cognitive load that hardening application security places on application teams, consider creating a catalog of policies for the different application types in your organization. For example, you can have a catalog with two policies:

First policy: recommended to protect WordPress-based applications.
Second policy: recommended to protect PHP applications with a SQL database. You can create two versions of this policy with different blocking sensitivity levels, so that application teams can choose the one that matches their security requirements (paranoia level) and their willingness to manage false positives.

The scope of each policy is defined by a specific tag (e.g. wordpress for the first policy, and LAMP_HIGH or LAMP_LOW for the two versions of the second policy). Application teams consult the catalog of available policies and apply the tag of the desired policy to their resources. Firewall Manager then automatically associates the corresponding WAF WebACL with their resources.
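A catalog lookup could be sketched as follows, treating the catalog tags as tag keys. The catalog contents mirror the example above, but the descriptions and the rule that a resource may opt into at most one catalog policy are assumptions added for illustration; such a check could run in CI or in a periodic compliance job.

```python
# Hypothetical catalog: tag key -> description of the policy it selects.
CATALOG = {
    "wordpress": "WordPress-based applications",
    "LAMP_HIGH": "PHP with a SQL database, high blocking sensitivity",
    "LAMP_LOW": "PHP with a SQL database, low blocking sensitivity",
}

def selected_policy(resource_tags):
    """Return the catalog tag a resource opted into, or None if it opted into none.

    resource_tags: dict of the resource's tag keys to values.
    Raises ValueError if the resource carries several catalog tags, since it
    would then fall within the scope of several WAF policies."""
    chosen = sorted(tag for tag in CATALOG if tag in resource_tags)
    if len(chosen) > 1:
        raise ValueError(f"resource opted into several policies: {chosen}")
    return chosen[0] if chosen else None
```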

Note that with this approach, you can manage false positives and stage changes in the same way as described in the previous section.

Application level behavioral detection

At the application level, you can use custom signals to identify abnormal behavior, based on what your application expects. For example, you might expect users to navigate your application in a certain order, or you might not expect a user to order certain goods from or to certain countries given their registered address. Using such signals, you can automate your response with AWS WAF, for example by blocking requests, or challenging them with a CAPTCHA, when they come from IPs showing suspicious application-level behavior. To get started with WAF automation based on application signals, consider the examples in this AWS Solution.
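One way to turn application signals into a WAF response is to maintain an IP set that a WebACL rule references with a Block or CAPTCHA action. The sketch below computes the address lists; the event format `(source_ip, signal)` and the threshold are hypothetical. The result could be written with boto3 `wafv2.update_ip_set(Name=..., Scope=..., Id=..., Addresses=..., LockToken=...)`, using the `LockToken` returned by `get_ip_set`; that call replaces the whole address list, and each IP set holds a single IP version, hence the split.

```python
import ipaddress
from collections import Counter

def ips_to_block(events, threshold):
    """Return CIDR lists, keyed by IP version, for IPs with >= threshold suspicious events.

    events: iterable of (source_ip, signal) pairs emitted by the application
    (hypothetical format, e.g. ("203.0.113.5", "checkout-without-cart"))."""
    counts = Counter(ip for ip, _signal in events)
    by_version = {4: [], 6: []}
    for ip, count in counts.items():
        if count >= threshold:
            address = ipaddress.ip_address(ip)
            prefix = 32 if address.version == 4 else 128  # single-host CIDR
            by_version[address.version].append(f"{address}/{prefix}")
    return {version: sorted(cidrs) for version, cidrs in by_version.items()}
```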

Advanced automations include:

Consuming high-risk events emitted by Amazon Cognito during the sign-in and sign-up process.
Consuming high-risk events identified by Amazon Fraud Detector. Fraud Detector uses machine learning (ML) and 20 years of fraud detection expertise from Amazon Web Services (AWS) and Amazon.com to automatically identify potentially fraudulent activity performed by humans and bots in real time. It detects fraud by analyzing application-level user behavior, using your own historical fraud data to train, test, and deploy custom fraud detection ML models tailored to your use case.
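For the Cognito case, a minimal sketch could extract the source IPs of high-risk authentication events. It assumes the shape of the `AuthEvents` list returned by boto3 `cognito-idp` `admin_list_user_auth_events(UserPoolId=..., Username=...)`, which reports risk only when Cognito threat protection is enabled, with `EventRisk.RiskLevel` values `Low`, `Medium`, or `High`; verify this against the Cognito API reference. The Fraud Detector integration is not shown.

```python
RISK_ORDER = {"Low": 1, "Medium": 2, "High": 3}

def high_risk_ips(auth_events, min_level="High"):
    """Return sorted source IPs of auth events at or above min_level risk.

    auth_events: list shaped like Cognito's AuthEvents (assumed shape)."""
    threshold = RISK_ORDER[min_level]
    ips = set()
    for event in auth_events:
        level = event.get("EventRisk", {}).get("RiskLevel")
        ip = event.get("EventContextData", {}).get("IpAddress")
        if ip and RISK_ORDER.get(level, 0) >= threshold:
            ips.add(ip)
    return sorted(ips)
```

The returned IPs could feed the same IP set update used for application-level signals.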

A fully managed policy for each application team

If you prefer to completely offload WAF management from the application teams to your central security team, create a dedicated policy for each application team, with a scope that applies only to their AWS accounts. In this scenario, you need to create processes for the initial setup, and communication channels between the application teams and your central security team for operations such as managing false positives.
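Generating one policy per team from a shared template might look like this sketch, which scopes each policy with the Firewall Manager `IncludeMap` `ACCOUNT` list. The team-to-account mapping, the naming scheme, and the rule that an account belongs to exactly one team are assumptions for illustration.

```python
import copy

def per_team_policies(template, team_accounts):
    """Return one policy per team, scoped to that team's accounts only.

    team_accounts: {team_name: [account_id, ...]} (hypothetical input).
    Raises ValueError if an account is assigned to two teams, which would place
    it in the scope of two policies."""
    owner = {}
    for team, accounts in team_accounts.items():
        for account in accounts:
            if account in owner:
                raise ValueError(f"account {account} assigned to {owner[account]} and {team}")
            owner[account] = team

    policies = []
    for team in sorted(team_accounts):
        policy = copy.deepcopy(template)
        policy["PolicyName"] = f"{template['PolicyName']}-{team}"
        policy["IncludeMap"] = {"ACCOUNT": sorted(team_accounts[team])}
        policy.pop("ExcludeMap", None)  # scope is defined by the include list alone
        policies.append(policy)
    return policies
```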
