How to auto-remediate internet-accessible ports with AWS Config and AWS Systems Manager

With the AWS Config service, you can assess, audit, and evaluate the configuration of your Amazon Web Services (AWS) resources. AWS Config continuously monitors and records your AWS resource configuration changes, and enables you to automate the evaluation of those recordings against desired configurations. Not only can AWS Config monitor and detect deviations from desired configurations, but it can also be used together with other services, such as AWS Systems Manager, to automatically remediate such deviations when they are detected. These remediation actions are declared in Systems Manager automation documents, which are invoked by AWS Config when a resource is found to be noncompliant. This turns AWS Config into not only a detection mechanism, but also a near-real-time automated response mechanism.

AWS recommends that you use our Well-Architected best practices for site reliability engineering (SRE) and DevOps. Specifically, we recommend that you implement the principle of least privilege and restrict network access to only the necessary IP addresses and ports. But we’re also pragmatic and understand that there are many use cases where customers need to open additional ports or sources on security groups to troubleshoot issues. This may result in insecure configurations that deviate from your desired or expected configuration. This blog post covers a real-life example of how to document and manage the desired or expected configuration of your AWS resources with tags, and how to use AWS Config to assess the compliance of your configuration against your organization’s defined requirements by using these tags. For example, you can document the desired configuration for your AWS resources within a tag for each resource, and you can use AWS Config to detect and remediate any inconsistencies between the tag documentation and the current configuration.

Solution overview

A best practice is to automate deployment of resources with code. As part of the automated deployment of an Amazon Elastic Compute Cloud (Amazon EC2) instance, you can assign tags to your instance. These tags can be used to indicate which ports should be open at the host level, which in turn defines the expected configuration for that instance. If the open ports are later changed so that they no longer match the expected configuration defined within the tags, the EC2 instance is in a state of noncompliance. As part of this solution, you will develop an AWS Config custom rule to detect ports that aren’t expected to be open in security groups attached to Amazon EC2 instances, and remediate by isolating that security group and removing the noncompliant ports.

There are managed AWS Config rules that accomplish similar tasks, such as vpc-sg-open-only-to-authorized-ports. However, the AWS Config custom rule described in this post is used to:

Demonstrate how you can use custom logic to create your own AWS Config custom rules.
Target EC2 instances, as opposed to security groups, because it’s likely your EC2 instance has more context than an individual security group, which allows for better tagging. For example, if your EC2 instance is running a web application and exposes port 443 to the internet, you can put the corresponding tag to indicate this fact directly on the EC2 instance, and then any security group attached to the instance must follow this expectation.

The remediation action does one of the following, depending on how many security groups are attached to the instance:

If the EC2 instance has more than one security group attached, the remediation will simply detach the security group with the noncompliant rules, while keeping the compliant security groups attached.
If the EC2 instance has a single security group attached, the remediation will quarantine the security group by creating a clone of the security group (prefixed with the string QUARANTINED), and removing any noncompliant rules or ports. The original security group will be detached, and the quarantined security group will be attached.
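The two remediation branches above can be sketched as a small decision function. This is only an illustration of the described behavior, not the code from the repository: the function name `plan_remediation` and the returned dictionary shape are hypothetical, and treating "several groups attached, all of them noncompliant" as a quarantine case is an assumption, because the post does not say how that case is handled.

```python
from typing import Dict, List


def plan_remediation(attached: List[str], noncompliant: List[str]) -> Dict[str, object]:
    """Decide how to remediate an instance, given its attached security group
    IDs and the subset of those IDs that contain noncompliant rules."""
    flagged = set(noncompliant)
    bad = [group for group in attached if group in flagged]
    good = [group for group in attached if group not in flagged]

    if not bad:
        # Nothing to remediate.
        return {"action": "none", "keep": good, "detach": []}
    if good:
        # At least one compliant group remains, so detaching is enough.
        return {"action": "detach", "keep": good, "detach": bad}
    # Assumption: with no compliant group left, fall back to the quarantine
    # path (clone the group without the noncompliant rules, then swap it in).
    return {"action": "quarantine", "keep": [], "detach": bad}


print(plan_remediation(["sg-web", "sg-debug"], ["sg-debug"]))
print(plan_remediation(["sg-debug"], ["sg-debug"]))
```

Keeping the decision separate from the API calls makes it easy to reason about each branch before any security group is actually modified.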

The solution in this post encompasses three out of the four components of the AWS Cloud Adoption Framework (CAF) Security Perspective:

Directive: Managing governance and compliance definitions within the AWS resource configuration by using tagging.
Detective: Identifying noncompliant EC2 instances with ports accessible to the internet that should not be accessible, by using AWS Config.
Responsive: Automatically remediating, through Systems Manager Automation, security groups attached to EC2 instances that are deemed noncompliant by the AWS Config rule.

Create the AWS Config rule

To simplify the process of creating a custom AWS Config rule, it is highly recommended that you use the AWS Config Rule Development Kit (RDK). By using the kit, you can focus solely on the logic of your AWS Config rule, while the created template handles the rest.

In this example, we made the following assumptions when we built the provided code (if you don’t have these items already configured, you can create and configure them as you go through each section of the procedure):

You have EC2 instances running with attached security groups
Tagging is implemented to indicate which ports are allowed to be open to the internet, and the tagging follows a format such as Key: AllowedPorts; Value: 80 or Key: AllowedPorts; Value: 80, 443, 8080
Noncompliant security groups are in place to demonstrate the remediation action
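To make the tag format concrete, here is a minimal sketch of how an AllowedPorts value such as `80, 443, 8080` could be turned into a set of port numbers. The helper name `parse_allowed_ports` is hypothetical, and silently ignoring malformed entries is an assumption made for brevity; the rule in the repository may parse the tag differently.

```python
from typing import Optional, Set


def parse_allowed_ports(tag_value: Optional[str]) -> Set[int]:
    """Parse a comma-separated AllowedPorts tag value into a set of ports."""
    ports: Set[int] = set()
    if not tag_value:
        # A missing or empty tag means no port is allowed to be open.
        return ports
    for piece in tag_value.split(","):
        piece = piece.strip()
        # Assumption: entries that are not valid port numbers are skipped.
        if piece.isdigit() and 0 <= int(piece) <= 65535:
            ports.add(int(piece))
    return ports


print(sorted(parse_allowed_ports("80, 443, 8080")))
```

A stricter variant could raise an error on malformed entries instead, so that a typo in a tag is surfaced rather than treated as "port not allowed".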

The code for this blog post can be found in the following GitHub repository. In particular, the repository contains the code for the custom AWS Config rule, Systems Manager document, and AWS Identity and Access Management (IAM) policy documents that will be used throughout the blog post.

To download the code used for the solution

Clone the repository (or select the Download ZIP option) by using the following command.

```
git clone https://github.com/aws-samples/aws-blog-security-group-ingress-remediation.git
```

Open the repository in the text editor of your choice. The folder will be in the same directory from which you cloned the repository. You won’t need to make any modifications to the files to follow this blog, but the following steps in this section will go over some of the important things that are happening in the code.

The code for the custom AWS Config rule is in the SECURITY_GROUP_INGRESS_REMEDIATION folder. First, open the parameters.json file in the SECURITY_GROUP_INGRESS_REMEDIATION subdirectory:

```
{
  "Version": "1.0",
  "Parameters": {
    "RuleName": "SECURITY_GROUP_INGRESS_REMEDIATION",
    "SourceRuntime": "python3.7",
    "CodeKey": "SECURITY_GROUP_INGRESS_REMEDIATION.zip",
    "InputParameters": "{}",
    "OptionalParameters": "{}",
    "SourceEvents": "AWS::EC2::Instance"
  },
  "Tags": "[]"
}
```

This file contains information that the RDK will use when it deploys the AWS Config rule. In particular, you’ll see the AWS Config rule name, the runtime, the code location, parameters for the AWS Config rule, and the source event that triggers the AWS Config rule (this can be based on a configuration change, or happen on a scheduled basis).
In this case, you’ll notice that SourceEvents is set to AWS::EC2::Instance, which means that this AWS Config rule will run when an EC2 instance in your environment changes. This is important, because you want this rule to act similarly to a preventative control by automatically remediating noncompliant open ports whenever the instance configuration changes. Although you could opt for a periodic trigger, such as every 24 hours, this would mean that noncompliant ports could remain open for up to 24 hours. Depending on your risk appetite, this delay may be acceptable, but for this use case with public internet exposure, you may want to remediate these open ports as quickly as possible.
Next, look at the main code for this AWS Config rule, in the SECURITY_GROUP_INGRESS_REMEDIATION.py file, and navigate to the evaluate_compliance function. You will see the comment Add your custom logic here, which is where the majority of the logic is located for this AWS Config rule.

Almost everything else in the file is provided by the generated RDK template. The RDK template takes care of much of the complexity of developing new AWS Config rules, and you can immediately see the difference between the few lines of code you modify to develop the rule logic and the entirety of the code in the file. Feel free to browse through this function to understand how validation is performed to determine whether a port is open to the internet, and if the open port is acceptable based on the instance’s AllowedPorts tag.
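When browsing the rule, it may help to know roughly what a change-triggered evaluation has to work with. The sketch below pulls the instance ID, tags, and attached security group IDs out of the `invokingEvent` string that AWS Config passes to the rule's Lambda function. The helper name `instance_facts` is hypothetical, and the field names (`configurationItem`, `tags`, `configuration.securityGroups`, `groupId`) reflect the configuration item layout for EC2 instances as I understand it; verify them against a recorded item in your account. Oversized configuration items, which arrive in a different shape, are not handled here.

```python
import json
from typing import Dict, List, Tuple


def instance_facts(invoking_event_json: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Extract (instance ID, tags, security group IDs) from an invoking event."""
    invoking_event = json.loads(invoking_event_json)
    item = invoking_event["configurationItem"]
    tags = item.get("tags") or {}
    # Assumption: attached groups are listed under configuration.securityGroups.
    groups = (item.get("configuration") or {}).get("securityGroups") or []
    group_ids = [group["groupId"] for group in groups]
    return item["resourceId"], tags, group_ids


sample = json.dumps({
    "configurationItem": {
        "resourceType": "AWS::EC2::Instance",
        "resourceId": "i-0123456789abcdef0",
        "tags": {"AllowedPorts": "80, 443"},
        "configuration": {"securityGroups": [{"groupId": "sg-1", "groupName": "web"}]},
    }
})
print(instance_facts(sample))
```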

Next, you’ll deploy this AWS Config rule to your environment with RDK. Again, it’s recommended to take a look at the code to 1) understand how the compliance is being determined and 2) validate that the code is safe to run before deployment – not all code shared on the internet is safe! Be sure to follow the preceding RDK link to properly install and initialize RDK in your environment.
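As a reading aid for that review, the following sketch shows one way the core check could work: given a security group's inbound rules in the `IpPermissions` shape returned by the EC2 `describe_security_groups` API, report the internet-open ports that are not in the allowed set. It is not the repository's implementation. The function names are hypothetical, and two simplifications are assumptions: a port range is flagged unless every port in it is allowed, and ICMP rules are not treated specially.

```python
from typing import Dict, List, Set

INTERNET_CIDRS = {"0.0.0.0/0", "::/0"}


def is_open_to_internet(permission: Dict) -> bool:
    """True if the rule allows traffic from any IPv4 or IPv6 address."""
    v4 = any(r.get("CidrIp") in INTERNET_CIDRS for r in permission.get("IpRanges", []))
    v6 = any(r.get("CidrIpv6") in INTERNET_CIDRS for r in permission.get("Ipv6Ranges", []))
    return v4 or v6


def unexpected_open_ports(permissions: List[Dict], allowed: Set[int]) -> List[str]:
    """Return a label for each internet-open rule not covered by `allowed`."""
    findings: List[str] = []
    for permission in permissions:
        if not is_open_to_internet(permission):
            continue
        if permission.get("IpProtocol") == "-1" or "FromPort" not in permission:
            # Protocol "-1" means all traffic, so every port is exposed.
            findings.append("all")
            continue
        low, high = permission["FromPort"], permission["ToPort"]
        # Assumption: a range is acceptable only if every port in it is allowed.
        if not all(port in allowed for port in range(low, high + 1)):
            findings.append(str(low) if low == high else f"{low}-{high}")
    return findings


rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]
print(unexpected_open_ports(rules, {443}))
```

An instance would then be reported as noncompliant if any attached security group yields a non-empty list.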

To deploy the AWS Config rule

Run the following RDK command.

```
rdk deploy SECURITY_GROUP_INGRESS_REMEDIATION
```

After the code is deployed, sign in to the AWS Management Console and navigate to AWS Config.
Navigate to rules, and you should see the AWS Config rule SECURITY_GROUP_INGRESS_REMEDIATION. After the rule is deployed, it will automatically trigger an evaluation, and you should see evaluation results shortly after deployment.
If you have any noncompliant resources, you should see them appear. If not, you can create a noncompliant security group by adding in an internet-accessible port.
1. Create a new security group and add an inbound rule with the Source as Anywhere or 0.0.0.0/0 with an open port of your choosing.
  
  Figure 1: Configure a noncompliant security group
2. After the security group is created, attach that security group to a running EC2 instance. Within a few minutes, the change you made should trigger the AWS Config rule you deployed and you’ll see the EC2 instance as noncompliant. Alternatively, you can navigate back to the AWS Config rule in the console and choose Re-evaluate manually, which will re-evaluate all your resources (in this case, EC2 instances).
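If you prefer to check the rule from a script instead of the console, the two helpers below are a rough sketch of how that could look with the AWS SDK for Python. The helper names are hypothetical, and the AWS Config client is passed in (for example, `boto3.client("config")`) so that the logic stays easy to test; `start_config_rules_evaluation` and `get_compliance_details_by_config_rule` are the AWS Config API operations being called.

```python
from typing import Any, List

RULE_NAME = "SECURITY_GROUP_INGRESS_REMEDIATION"


def reevaluate(config_client: Any, rule_name: str = RULE_NAME) -> None:
    """Ask AWS Config to run the rule again against its in-scope resources."""
    config_client.start_config_rules_evaluation(ConfigRuleNames=[rule_name])


def noncompliant_instances(config_client: Any, rule_name: str = RULE_NAME) -> List[str]:
    """Collect the resource IDs the rule currently reports as NON_COMPLIANT."""
    resource_ids: List[str] = []
    token = None
    while True:
        kwargs = {"ConfigRuleName": rule_name, "ComplianceTypes": ["NON_COMPLIANT"]}
        if token:
            kwargs["NextToken"] = token
        page = config_client.get_compliance_details_by_config_rule(**kwargs)
        for result in page.get("EvaluationResults", []):
            qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
            resource_ids.append(qualifier["ResourceId"])
        token = page.get("NextToken")
        if not token:
            return resource_ids
```

Evaluation is asynchronous, so results may take a short while to reflect a re-evaluation request.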

That’s it for this section—you’ve successfully deployed a custom AWS Config rule for detecting security groups with unexpected internet-accessible ports. In the next section, you’ll learn how to automatically remediate these noncompliant deviations to add to this level of detection.

Create a remediation action

As previously mentioned, AWS Config remediation actions are declared in Systems Manager automation documents, which are invoked for identified noncompliant resources. This means that in order to create a custom remediation action, you’ll need to create a custom Systems Manager automation document. As in the previous steps, you’ll use Python as the language of choice for the remediation action in this blog post.

In the cloned GitHub repository, you’ll find the SECURITY_GROUP_INGRESS_REMEDIATION.yaml file, which contains the template used for the automation document. Automation documents are written in JSON or YAML; however, the document can easily be generated by using the Systems Manager Automation console to create your remediation action. For the purposes of following along in this post, you’ll simply use this file and the AWS Command Line Interface (CLI) to create your document.

To create the remediation action and accompanying infrastructure

In the directory with the YAML remediation document, run the following command to create the automation document.

```
aws ssm create-document --content file://SECURITY_GROUP_INGRESS_REMEDIATION.yaml --name "security-group-ingress-remediation-quarantine" --document-type "Automation" --document-format YAML
```

After the document is created, you can view it in your Systems Manager Documents in the console, or by running the following CLI command.
```
aws ssm list-documents --filters Key=Owner,Values=Self
```
Before you move on to setting up the AWS Config rule to use this document, you need to give Systems Manager the proper IAM permissions to be able to run the commands specified in the document. You should see the file security-group-ingress-remediation-quarantine-policy.json in the cloned GitHub repository. That policy document contains the necessary permissions. To create a new policy with those permissions, run the following command.
```
aws iam create-policy --policy-name security-group-ingress-remediation-quarantine-policy --policy-document file://security-group-ingress-remediation-quarantine-policy.json
```
Take note of the Arn key in the response, because you’ll need this value when you attach the policy to the role at the end of this procedure.
You also need to create an IAM role and assign the newly created policy to it. You can do that by running the following command (note that the required trust policy document is also provided, named security-group-ingress-remediation-quarantine-trust-policy.json).
```
aws iam create-role --role-name security-group-ingress-remediation-quarantine-role --assume-role-policy-document file://security-group-ingress-remediation-quarantine-trust-policy.json
```
Take note of the Arn key in the response, because you’ll need it in step 5 of the next section.
Lastly, assign the created policy to your newly created role (replace {POLICY_ARN} with the policy Arn value you noted from the create-policy response):
```
aws iam attach-role-policy --role-name 'security-group-ingress-remediation-quarantine-role' --policy-arn {POLICY_ARN}
```
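For readers who script their setup, the three IAM commands above map onto the AWS SDK for Python roughly as follows. This is a sketch under assumptions: the function name `create_remediation_role` is hypothetical, the IAM client is passed in (for example, `boto3.client("iam")`), and the two policy documents are expected to be the already-parsed JSON files from the repository.

```python
import json
from typing import Any, Dict

POLICY_NAME = "security-group-ingress-remediation-quarantine-policy"
ROLE_NAME = "security-group-ingress-remediation-quarantine-role"


def create_remediation_role(iam: Any, policy_doc: Dict, trust_doc: Dict) -> Dict[str, str]:
    """Create the policy and the role, attach one to the other, return both ARNs."""
    policy = iam.create_policy(
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(policy_doc),
    )
    role = iam.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(trust_doc),
    )
    policy_arn = policy["Policy"]["Arn"]
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy_arn)
    # The role ARN is the value needed later for AutomationAssumeRole.
    return {"policy_arn": policy_arn, "role_arn": role["Role"]["Arn"]}
```

Returning both ARNs from one function avoids having to copy them out of separate command responses.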

You’ve set up all the infrastructure you need! Moving on to the next section, you’ll configure your AWS Config rule to auto-remediate by using your newly created Systems Manager automation document.

Configure the AWS Config rule to automatically remediate

You need to associate the automation document with your AWS Config rule and configure auto-remediation. This will cause any noncompliant resources to be automatically remediated after they’re identified as noncompliant.

To configure the rule for auto-remediation

In the AWS Config console, navigate to your AWS Config rules and select the recently created rule.
At the top right, choose Edit, and scroll down to Choose remediation action.
In the Remediation action field, select the recently created Systems Manager automation document. Be sure to turn on Auto remediation.
Set the Resource ID parameter to ResourceId to indicate to the Systems Manager document which parameter is the noncompliant resource ID.
Lastly, set the ConfigRuleName parameter to the name of the AWS Config rule (SECURITY_GROUP_INGRESS_REMEDIATION), and set the AutomationAssumeRole parameter to the role Arn that you noted from the create-role response in the previous section.
Your configuration should look like the one in Figure 2.

Figure 2: AWS Config rule remediation configuration
Save your configuration, and you should be all set!
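The same console settings can also be expressed as a remediation configuration for the AWS Config `put_remediation_configurations` API. The sketch below only builds the request payload. The function name is hypothetical, and the retry values (5 attempts, 60 seconds) are placeholder assumptions rather than values from this post; the rule name, document name, and parameter names are the ones used in the steps above.

```python
from typing import Any, Dict

RULE_NAME = "SECURITY_GROUP_INGRESS_REMEDIATION"
DOCUMENT_NAME = "security-group-ingress-remediation-quarantine"


def remediation_configuration(role_arn: str) -> Dict[str, Any]:
    """Build one entry for the RemediationConfigurations request list."""
    return {
        "ConfigRuleName": RULE_NAME,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": DOCUMENT_NAME,
        "Automatic": True,                 # turn on auto remediation
        "MaximumAutomaticAttempts": 5,     # assumption: placeholder value
        "RetryAttemptSeconds": 60,         # assumption: placeholder value
        "Parameters": {
            # RESOURCE_ID tells AWS Config to pass the noncompliant resource ID.
            "ResourceId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
            "ConfigRuleName": {"StaticValue": {"Values": [RULE_NAME]}},
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
        },
    }


# With a real client (for example, boto3.client("config")), the call would be:
# config_client.put_remediation_configurations(
#     RemediationConfigurations=[remediation_configuration(role_arn)]
# )
```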

Put everything together

Let’s test the use cases our solution attempts to address.

EC2 instance with multiple security groups attached

The first use case involves an EC2 instance with multiple security groups attached.

To test the remediation for the first use case

In the Amazon EC2 console, create an EC2 instance (if you don’t already have one) and attach a compliant security group with no ports open to the internet.
Attach a new security group to the instance. Feel free to use the noncompliant security group you created in the Create the AWS Config rule section.
Wait for the configuration change to trigger the AWS Config rule. After the rule runs, the remediation action should also run. The Action status field for the resource in the AWS Config rule console view will show you when the remediation action has run.

Figure 3: Noncompliant EC2 instance detected
Navigate to the EC2 instance within the Amazon EC2 console to view the instance’s attached security groups. You should notice that the noncompliant security group has been detached from the instance.
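The final check can also be done programmatically. As a small sketch, the helper below lists the security group IDs currently attached to an instance by using the EC2 `describe_instances` API. The helper name is hypothetical, and the EC2 client is passed in (for example, `boto3.client("ec2")`).

```python
from typing import Any, List


def attached_group_ids(ec2: Any, instance_id: str) -> List[str]:
    """Return the IDs of the security groups attached to one instance."""
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    return [group["GroupId"] for group in instance.get("SecurityGroups", [])]
```

After the remediation has run, the ID of the noncompliant security group should no longer appear in the returned list.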

EC2 instance with a single noncompliant security group attached

The second use case involves an EC2 instance with a single noncompliant security group attached.

To test the remediation for the second use case

In the Amazon EC2 console, create an EC2 instance (if you don’t already have one) and attach a noncompliant security group with ports open to the internet. Feel free to use the noncompliant security group you created in the Create the AWS Config rule section. It’s recommended to add multiple rules to the security group (some compliant and some noncompliant) to see the behavior of the quarantine functionality.
Wait for the configuration change to trigger the AWS Config rule. After the rule runs, the remediation action should also run.
Navigate to the EC2 instance within the Amazon EC2 console to view the instance’s attached security groups. You should notice that the EC2 instance has been quarantined by adding a new security group prefixed with the string QUARANTINED. The noncompliant rules within the security group should be removed, and only the allowed rules remain.

Figure 4: EC2 instance remediated with quarantined security group
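To illustrate what "only the allowed rules remain" could mean in practice, here is a sketch that derives the inbound rules for a quarantined clone from the original group's `IpPermissions`. It is an illustration under assumptions, not the automation document's code: the function name is hypothetical, only the `0.0.0.0/0` and `::/0` sources are stripped, rules with non-internet sources are kept, and an internet-open port range is always stripped rather than split into allowed and disallowed ports.

```python
import copy
from typing import Dict, List, Set


def sanitize_permissions(permissions: List[Dict], allowed: Set[int]) -> List[Dict]:
    """Return a copy of the rules with disallowed internet sources removed."""
    kept: List[Dict] = []
    for original in permissions:
        rule = copy.deepcopy(original)  # never mutate the caller's data
        single_port = rule.get("FromPort") is not None and rule.get("FromPort") == rule.get("ToPort")
        port_ok = single_port and rule.get("IpProtocol") != "-1" and rule["FromPort"] in allowed
        if not port_ok:
            # Strip the "anywhere" sources but keep any narrower CIDR ranges.
            rule["IpRanges"] = [r for r in rule.get("IpRanges", []) if r.get("CidrIp") != "0.0.0.0/0"]
            rule["Ipv6Ranges"] = [r for r in rule.get("Ipv6Ranges", []) if r.get("CidrIpv6") != "::/0"]
        has_source = (rule.get("IpRanges") or rule.get("Ipv6Ranges")
                      or rule.get("UserIdGroupPairs") or rule.get("PrefixListIds"))
        if has_source:
            kept.append(rule)
    return kept


rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}, {"CidrIp": "10.0.0.0/8"}]},
]
for rule in sanitize_permissions(rules, {443}):
    print(rule["FromPort"], rule["IpRanges"])
```

Working on a deep copy leaves the original rule list untouched, which mirrors the idea of keeping the original security group intact for later investigation.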

EC2 instance with allowed open ports

The third use case involves an EC2 instance with allowed open ports.

To test the remediation for the third use case

In the Amazon EC2 console, create an EC2 instance (if you don’t already have one) and attach a compliant security group with no ports open to the internet.
On the Tags tab for the instance, create a tag on the instance with the key AllowedPorts and a value of 8443.

Figure 5: A tag attached to the EC2 instance that indicates an exception
Attach a new security group to the instance, with a rule allowing port 8443 to be open to 0.0.0.0/0.
Wait for the configuration change to trigger the AWS Config rule. After the rule runs, you should notice that the instance is marked as compliant, since the port is allowed even though it’s internet-accessible.

Figure 6: The EC2 instance with a defined tag-based exception is appropriately marked as compliant
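The tag from this use case can also be applied from code, which fits the earlier recommendation to deploy resources in an automated way. The sketch below uses the EC2 `create_tags` API. The helper name `allow_ports` is hypothetical, and the EC2 client is passed in (for example, `boto3.client("ec2")`). The value is written in the comma-separated format described at the start of this post.

```python
from typing import Any, Iterable


def allow_ports(ec2: Any, instance_id: str, ports: Iterable[int]) -> str:
    """Set the AllowedPorts tag on an instance and return the value written."""
    value = ", ".join(str(port) for port in sorted(set(ports)))
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": "AllowedPorts", "Value": value}],
    )
    return value
```

For the third use case, calling `allow_ports(ec2, instance_id, [8443])` would write the same tag that the console steps create.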

Summary

Congratulations on getting this far! We hope you’ve now learned about using AWS Config not only as a detective mechanism, but also as a fast responsive one. The solution we demonstrated in this post addresses a real-world problem that affects the security of many organizations. We hope this illustrated how you can combine AWS Config, Systems Manager automation documents, and configuration tags to enforce expected configurations at scale. We also hope you learned a bit more about how you can develop custom AWS Config rules with the RDK and pair them with custom remediation actions. Additionally, you’ve successfully deployed the following:

A custom AWS Config rule, created with the RDK, that identifies unallowed internet-accessible ports in security groups that are attached to EC2 instances.
A remediation action to quarantine noncompliant security groups or Amazon EC2 instances and remove unallowed internet-accessible ports.
The underlying infrastructure that is required to support the preceding items (the Lambda functions, IAM policies, IAM roles, and so on).

Taking this a step further, we recommend that you consider doing the following to more efficiently and effectively deploy your solution at scale:

Deploy the AWS Config rule and associated remediation action as a conformance pack.
Define the infrastructure as code by using the AWS Cloud Development Kit (AWS CDK).
Identify open ports by using the Amazon Inspector Network Reachability package.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the discussion forum for AWS Config or AWS Systems Manager or contact AWS Support.


AWS Security Blog