Analyzing stale security group rules using serverless architecture
Security is a top priority for AWS and for customers running workloads on AWS. A previous post, Top 10 security items to improve in your AWS account, covered the top security items that AWS customers should pay special attention to if they want to improve their security posture.
High on the list is the need to manage your network security. A fundamental method is to secure network access to a server or service when connecting it to a network. In an on-premises scenario, you would use a firewall or similar technology to restrict network access to only approved IPs, ports, and protocols. Security groups, network access control lists (NACLs), and AWS Network Firewall provide network security functionality in AWS. Stale rules left in security groups can keep your network open to access that is no longer needed, including from the Internet, and increase the attack surface of your applications.
In this post, we’ll look at how you can identify stale rules in security groups and remove them based on usage and duration metrics. We’ll also look at a serverless solution that uses Amazon Simple Storage Service (Amazon S3), AWS Glue, and VPC Flow Logs, and visualizes the results in Amazon QuickSight. This solution parses the VPC Flow Logs and relates each record to a particular security group rule. It also tracks usage over time, so that it’s easy to see which rules are frequently used and to clean up the stale ones.
Best practices for security groups
You should follow these best practices when working with security groups:
- Review rules at least every six months: Stale rules can cause security vulnerabilities. Configure security groups only with the rules that are required for your workloads. Remove unused rules. (PCI DSS 1.1.7)
- Enable only the services, protocols, and procedures required for the system to work. (PCI DSS 2.2.2)
- Monitor the creation or deletion of security groups: This best practice works hand-in-hand with the first two. You should always monitor for the attempted creation, modification, and deletion of security groups. (CIS AWS Foundations 3.10)
- The VPC default security group should prohibit inbound and outbound traffic. For custom security groups, don’t ignore the outbound or egress rules: Limit outbound access to only the destinations that are required. For example, in a three-tier web application, the app layer likely shouldn’t have unrestricted access to the Internet. Therefore, configure the security group to allow access to only those hosts or subnets needed for the correct functioning of the application. (PCI DSS 1.3.4)
- Limit the ingress or inbound port ranges that are accessible: Limit the ports that are open in a security group to only those that are necessary for the application to function correctly. With large port ranges open, you may be exposed to any vulnerabilities or unintended access to services. This is especially important with high-risk applications. (CIS AWS Foundations 4.1, 4.2) (PCI DSS 1.2.1, 1.3.2)
- Limit modification to authorized roles only: Limit the number of roles that have authorization to change security groups. (PCI DSS 7.2.1)
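As an illustration of the port-range practice above, the following minimal sketch flags ingress permissions that are open to 0.0.0.0/0 across a wide port range. It assumes the input has the shape of the `IpPermissions` list returned by the EC2 `describe_security_groups` call. The function name and the `max_port_span` threshold are hypothetical and are not part of the solution's code.

```python
from typing import Dict, List


def find_wide_open_rules(ip_permissions: List[Dict], max_port_span: int = 100) -> List[str]:
    """Return findings for permissions that are open to the world and look too broad."""
    findings = []
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol", "-1")
        # FromPort/ToPort are absent when the protocol is "-1" (all traffic).
        from_port = perm.get("FromPort", 0)
        to_port = perm.get("ToPort", 65535)
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if not open_to_world:
            continue
        if protocol == "-1":
            findings.append("all protocols open to 0.0.0.0/0")
        elif to_port - from_port + 1 > max_port_span:
            findings.append(f"{protocol} {from_port}-{to_port} open to 0.0.0.0/0")
    return findings


# Illustrative data only.
sample = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 1024, "ToPort": 65535,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]
print(find_wide_open_rules(sample))
```

A single open port such as 443 is not flagged here. Whether that is acceptable depends on the workload, so treat the threshold as a starting point only.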
The solution assumes that VPC Flow Logs are enabled and created with Amazon S3 as the destination type. It is based on a serverless architecture that uses AWS Step Functions to run four AWS Glue jobs on a cron schedule. The first Glue job parses all of the security groups present in an account and stores them in an Amazon DynamoDB table. The second Glue job runs Amazon Athena queries to parse the VPC Flow Logs stored in an S3 bucket. The third Glue job computes the usage metrics of the rules in each security group and stores the results in another DynamoDB table. The parsed data is visualized in QuickSight as heat maps that identify which ports and protocols are frequently used.
Finally, the fourth Glue job sends an email notification through Amazon Simple Email Service (Amazon SES) with the usage metrics of the security group rules. You can remove the unused security group rules to meet compliance requirements. This architecture is shown in the following figure.
Two DynamoDB tables are created. The first table (sg-analysis-rules-data) stores information about the existing security groups and their rules (Security Group ID, Name, Port, Protocol). The second table (sg-analysis-rules-usage) stores the usage counts. It contains the security group rule ID, security group ID, protocol, flow direction, last usage, and the number of times that the rule was used.
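As a rough illustration of how a flow log record can be related to a rule, the sketch below parses a record in the default VPC Flow Logs format and checks it against a list of ingress rules. It is not the code from the repository: the `Rule` class, the function names, and the rule IDs are hypothetical, and it assumes that the rules attached to the record's network interface are already known. It also ignores ICMP type and code matching, rules that reference other security groups or prefix lists, and return traffic for connections that the instance itself initiated.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# IANA protocol numbers as they appear in flow logs, mapped to rule protocol names.
PROTOCOLS = {"6": "tcp", "17": "udp", "1": "icmp"}


@dataclass
class Rule:
    rule_id: str
    protocol: str  # "tcp", "udp", "icmp", or "-1" for all protocols
    from_port: int
    to_port: int
    cidr: str


def parse_record(line: str) -> dict:
    """Split a space-separated flow log record into named fields."""
    return dict(zip(FIELDS, line.split()))


def match_ingress_rule(record: dict, rules: List[Rule]) -> Optional[str]:
    """Return the ID of the first ingress rule that would admit this record."""
    if record["action"] != "ACCEPT":
        return None
    proto = PROTOCOLS.get(record["protocol"], record["protocol"])
    src = ipaddress.ip_address(record["srcaddr"])
    port = int(record["dstport"])
    for rule in rules:
        if rule.protocol not in ("-1", proto):
            continue
        if rule.protocol != "-1" and not (rule.from_port <= port <= rule.to_port):
            continue
        if src in ipaddress.ip_network(rule.cidr):
            return rule.rule_id
    return None


record = parse_record(
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
rules = [
    Rule("sgr-example-https", "tcp", 443, 443, "0.0.0.0/0"),
    Rule("sgr-example-ssh", "tcp", 22, 22, "172.31.0.0/16"),
]
print(match_ingress_rule(record, rules))
```

Returning the first matching rule is a simplification. When several rules overlap, a real implementation has to decide how to attribute the traffic among them.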
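To make the relationship between the two tables concrete, here is a minimal sketch of how per-rule usage items could be built and then compared against the full rule list to find stale rules. The attribute names `sg_id`, `flow_direction`, and `used_times` follow the fields used later in the QuickSight steps. The other names (`rule_id`, `last_used`, both functions, and the 30-day default) are assumptions made for illustration and may differ from the actual table schema.

```python
from typing import Dict, Iterable, List, Tuple

# (rule_id, sg_id, protocol, flow_direction, epoch seconds when the traffic was seen)
Match = Tuple[str, str, str, str, int]


def aggregate_usage(matches: Iterable[Match]) -> Dict[str, dict]:
    """Fold individual rule matches into one usage item per rule."""
    items: Dict[str, dict] = {}
    for rule_id, sg_id, protocol, direction, seen_at in matches:
        item = items.setdefault(rule_id, {
            "rule_id": rule_id,
            "sg_id": sg_id,
            "protocol": protocol,
            "flow_direction": direction,
            "used_times": 0,
            "last_used": 0,
        })
        item["used_times"] += 1
        item["last_used"] = max(item["last_used"], seen_at)
    return items


def find_stale_rules(all_rule_ids: Iterable[str], usage: Dict[str, dict],
                     now: int, max_age_days: int = 30) -> List[str]:
    """Return rules that were never used or not used within max_age_days."""
    cutoff = now - max_age_days * 86400
    stale = []
    for rule_id in all_rule_ids:
        item = usage.get(rule_id)
        if item is None or item["last_used"] < cutoff:
            stale.append(rule_id)
    return stale


# Illustrative data only.
matches = [
    ("sgr-a", "sg-1", "tcp", "ingress", 1000),
    ("sgr-a", "sg-1", "tcp", "ingress", 5000),
    ("sgr-b", "sg-1", "tcp", "egress", 2000),
]
usage = aggregate_usage(matches)
print(find_stale_rules(["sgr-a", "sgr-b", "sgr-c"], usage, now=5000 + 10 * 86400))
```

In this example `sgr-c` is reported because it has no usage item at all, which is the case a review of unused rules most needs to catch.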
The following figure shows the visualization of security group rules in QuickSight. In this example, we’ve filtered the flow log data for the duration of a month. The screenshot shows the heat map representation of security group rules based on flow direction or usage count. The color density increases based on usage count. In the following example, we observe the ingress rule ID (sgr-0a2c6a2a1b919ed46) with usage count of 244 has a color density that differentiates it from other security group rules, such as sgr-04759165217dbcc3b (Ingress) and sgr-01e3d80c29348220d (Egress).
Our solution deployment comprises a Step Functions state machine (with a template definition of the four Glue jobs) and a QuickSight visualization that consists of components such as a Dataset, Analyses, and a Dashboard.
The code for the solution deployment can be found in the GitHub repository.
Prerequisites
- Activate QuickSight Standard or Enterprise edition. Once activated, you must get the QuickSight User ARN so that the necessary permissions are granted for the dashboard. To get the QuickSight User ARN, make sure that you have activated the QuickSight user account, and then run the following command in AWS CloudShell after replacing <your account id> with your AWS account ID and <your region> with the Region where the QuickSight user was created.
aws quicksight list-users --aws-account-id <your account id> --namespace default --region <your region>
- Create an Athena–DynamoDB connector by following the steps listed in the DynamoDB Connector section of the AWS Athena workshop. QuickSight uses this connector as its data source to read the rules usage DynamoDB table.
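If the account has several QuickSight users, it can help to pull the ARNs out of the command's JSON output programmatically. The sketch below assumes that the output contains a `UserList` array whose entries carry `UserName`, `Arn`, and `Active` fields. The sample document is made up and trimmed, and the function name is hypothetical.

```python
import json
from typing import Dict


def active_user_arns(list_users_output: str) -> Dict[str, str]:
    """Map each active QuickSight user name to its ARN."""
    response = json.loads(list_users_output)
    return {
        user["UserName"]: user["Arn"]
        for user in response.get("UserList", [])
        if user.get("Active")
    }


# Made-up example of the JSON printed by the list-users command.
sample_output = """
{
  "UserList": [
    {"Arn": "arn:aws:quicksight:us-east-1:111122223333:user/default/analyst",
     "UserName": "analyst", "Role": "READER", "Active": false},
    {"Arn": "arn:aws:quicksight:us-east-1:111122223333:user/default/admin",
     "UserName": "admin", "Role": "ADMIN", "Active": true}
  ],
  "Status": 200
}
"""
print(active_user_arns(sample_output))
```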
Deploying the CloudFormation Stack
- Log in to your AWS account. Make sure that you have the right permissions to execute these steps.
- Create an S3 bucket with a unique name to store the code files for the Glue jobs (BUCKET_NAME). This should have respective folders for scripts and libraries (as shown in the following figure).
- Upload all the scripts from the scripts folder in the repository and the libraries (AWS Data Wrangler and Boto3) to the respective folders in the bucket.
- Navigate to AWS CloudFormation. Launch a CloudFormation stack using the sg_rule_analysis.yaml template from the GitHub repository.
- Provide a unique stack name, for example SecurityGroupsRulesAnalysisStack-01.
- Provide input parameter values for quicksightUserArn (from Step 1 under the Prerequisites section above), librariesLocation, and scriptsLocation.
- Add the tags Name=SecurityGroupsRulesAnalysis-Stack and Purpose=SecurityGroupsRulesAnalysis. Then select Next.
- Review the stack parameters, select the necessary AWS Identity and Access Management (IAM) role, acknowledge that the stack creates resources, and select Create Stack.
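The console steps above can also be expressed as a request for the CloudFormation `create_stack` API. The following sketch only builds the request dictionary and does not call AWS. The parameter keys and tags come from the steps above, while the template URL, the S3 locations, the example ARN, and the `CAPABILITY_NAMED_IAM` capability are assumptions that you would need to check against the template in the repository.

```python
from typing import Dict


def build_create_stack_request(stack_name: str, template_url: str,
                               quicksight_user_arn: str,
                               libraries_location: str,
                               scripts_location: str) -> Dict:
    """Assemble keyword arguments in the shape that create_stack expects."""
    parameters = {
        "quicksightUserArn": quicksight_user_arn,
        "librariesLocation": libraries_location,
        "scriptsLocation": scripts_location,
    }
    tags = {
        "Name": "SecurityGroupsRulesAnalysis-Stack",
        "Purpose": "SecurityGroupsRulesAnalysis",
    }
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        "Tags": [{"Key": key, "Value": value} for key, value in tags.items()],
        # Assumption: the template creates IAM resources that need this acknowledgement.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


# All values below are placeholders.
request = build_create_stack_request(
    stack_name="SecurityGroupsRulesAnalysisStack-01",
    template_url="https://BUCKET_NAME.s3.amazonaws.com/sg_rule_analysis.yaml",
    quicksight_user_arn="arn:aws:quicksight:us-east-1:111122223333:user/default/admin",
    libraries_location="s3://BUCKET_NAME/libraries/",
    scripts_location="s3://BUCKET_NAME/scripts/",
)
print(request["StackName"], len(request["Parameters"]))
```

With Boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("cloudformation").create_stack(**request)`.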
Visualizing in QuickSight
- Access the QuickSight service from the AWS Console.
- Begin by creating a new dataset. Choose Datasets from the navigation pane at left, and then choose New dataset.
- To create a new Athena connection profile, use the following steps:
- In the FROM NEW DATA SOURCES section, choose the Athena data source card.
- For Data source name, enter a descriptive name.
- For Athena workgroup, choose your workgroup (primary).
- Choose Validate connection to test the connection. It should say “Validated” along with the “SSL is enabled” message.
- Choose Create data source.
- Now select the catalog that was created for the DynamoDB connector in the prerequisites steps.
- Select the DynamoDB table (sg-analysis-rules-usage) to create the visualization. This table is created as part of the Step Functions workflow.
- Choose whether to cache the data in QuickSight’s Super-fast, Parallel, In-memory Calculation Engine (SPICE) or to query the data directly. Querying directly lets you see real-time data from DynamoDB, but performance and cost may be sub-optimal because the data isn’t cached. Select Visualize.
- Select the heat map visual type, then select sg_id as the Row dimension, used_times as the Value, and flow_direction as the Column dimension. This creates the visualization with the data from the rules usage table.
- Add filters to visualize rules based on usage, protocol, or flow direction. For example, in the following visualization, two filters have been added to view “ingress” rules that were used fewer than 500 times.
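To show what the heat map and its filters compute, the sketch below reproduces the same grouping in plain Python: rows are grouped by `sg_id` and `flow_direction`, and `used_times` is summed per cell. It assumes a sum aggregation for the Value field, and the sample rows are invented. The function and its parameters are hypothetical and are not part of QuickSight.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def heatmap_cells(rows: Iterable[dict], direction: Optional[str] = None,
                  max_used: Optional[int] = None) -> Dict[Tuple[str, str], int]:
    """Sum used_times per (sg_id, flow_direction) cell after optional filters."""
    cells: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        if direction is not None and row["flow_direction"] != direction:
            continue
        # Keep only rules used fewer than max_used times, as in the example filter.
        if max_used is not None and row["used_times"] >= max_used:
            continue
        cells[(row["sg_id"], row["flow_direction"])] += row["used_times"]
    return dict(cells)


# Invented rows in the shape of the rules usage table.
rows = [
    {"sg_id": "sg-1", "flow_direction": "ingress", "used_times": 244},
    {"sg_id": "sg-1", "flow_direction": "ingress", "used_times": 12},
    {"sg_id": "sg-1", "flow_direction": "egress", "used_times": 30},
    {"sg_id": "sg-2", "flow_direction": "ingress", "used_times": 900},
]
print(heatmap_cells(rows, direction="ingress", max_used=500))
```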
Follow these instructions to clean up the provisioned resources. Leaving resources that you no longer need in your AWS account may incur extra charges. Performing these steps deletes the serverless infrastructure that was created as part of deploying the solution discussed in this post.
aws cloudformation delete-stack --stack-name SecurityGroupsRulesAnalysisStack-01
You can also delete the stack using the CloudFormation Console:
- Open the CloudFormation console.
- On the Stacks page in the CloudFormation console, select the stack (SecurityGroupsRulesAnalysisStack-01) to delete.
- In the stack details pane, choose Delete.
- Select Delete stack when prompted.
In this post, we’ve shown how to identify unused security group rules in an AWS account by using a serverless architecture based on Step Functions and AWS Glue jobs. The solution provides visualizations of the usage metrics in QuickSight and sends automated email notifications about unused security group rules in an account. We removed the undifferentiated heavy lifting of setting up the infrastructure by providing CloudFormation templates.