Compliance as Code for Amazon ECS using Open Policy Agent, Amazon EventBridge, and AWS Lambda

Customers are looking for ways to implement best practices and policies that enforce security and ongoing compliance, including for workloads running on Amazon Elastic Container Service (Amazon ECS). Policies can now be expressed as code and evaluated before workloads are deployed. This lets you enforce best practices consistently and prevent workloads that violate those policies from being deployed.

Open Policy Agent (OPA) is a general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack. It is also a graduated Cloud Native Computing Foundation (CNCF) project.

Container images are stored in registries. Customers want assurance that the images they use come from trusted sources and registries. Benchmarks and guides such as the Center for Internet Security (CIS) Benchmark for Amazon Elastic Kubernetes Service (Amazon EKS) and the Application Container Security Guide from the National Institute of Standards and Technology (NIST Special Publication 800-190) recommend the use of trusted or approved container image registries. Customers often want to configure their environments in accordance with these guides.

This post illustrates how to implement a policy that stops images from unauthorized registries from running within an Amazon ECS cluster, using compliance as code practices. It can serve as a reference architecture for implementing other policies within your organization.

Solution overview

Components and themes:

  • Amazon ECS sends events to Amazon EventBridge for certain event types, such as container instance state change events, task state change events, and service action events. When these resources change, an event is triggered. The solution uses the task state change events generated.
  • OPA provides a declarative language (Rego) for defining policies. Rego will be used to define the policy that requires approved or trusted container image registries.
  • OPA includes a mechanism to integrate with customer workflows. We will be using this as part of the solution.
  • The policy will apply to all containers in the Amazon ECS task.
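
To make the registry rule concrete, here is a minimal Python sketch of the decision such a policy is meant to reach. It is not the Rego policy from the repository. The allow-list pattern, the function names, and the assumption that each entry in the event's `detail.containers` list carries an `image` field are all illustrative.

```python
import re

# Hypothetical allow-list: private Amazon ECR registries only.
APPROVED_REGISTRY = re.compile(
    r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/"
)


def non_compliant_images(event):
    """Return the images in a task state change event that are not approved."""
    containers = event.get("detail", {}).get("containers", [])
    return [
        container.get("image", "")
        for container in containers
        if not APPROVED_REGISTRY.match(container.get("image", ""))
    ]


def is_compliant(event):
    """A task is compliant only if every container image is approved."""
    return not non_compliant_images(event)
```

Because the check walks every container in the task, a single image from an unapproved registry is enough to mark the whole task as non-compliant.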

Figure 1: An event-driven approach to implementing Compliance as Code for Amazon ECS

Figure 1 is an event-driven architecture for implementing Compliance as Code practices for Amazon ECS using OPA:

  1. An Amazon ECS task state change event is triggered, for example when there is a request to run a task.
  2. Amazon EventBridge is configured to receive these events and triggers an AWS Lambda function in response.
  3. The AWS Lambda function loads the policy from the AWS Systems Manager Parameter Store. OPA is then used to evaluate the event payload and arrives at a decision.
  4. If the event payload violates the policy, corrective action can be taken on the Amazon ECS cluster. For instance, the AWS Lambda function can be given an execution role with the permissions needed to stop running tasks whose container image(s) are not sourced from approved registries.
  5. The AWS Lambda function uses Amazon Simple Notification Service (Amazon SNS) to send notifications to a DevSecOps-focused team. The notification can also be sent to a Security Information and Event Management (SIEM) tool such as AWS Security Hub (using its custom integration feature) for further follow-up and action.
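
Steps 3 to 5 can be sketched as a small orchestration function. This is a sketch under stated assumptions, not the Lambda function shipped in the repository. The four hooks and both function names are hypothetical, the `clusterArn` and `taskArn` fields are assumed to be present in the event detail, and the sketch assumes a notification is sent only when a violation is found.

```python
def handle_task_event(event, load_policy, evaluate, stop_task, notify):
    """Run steps 3 to 5 for one ECS task state change event.

    All four callables are hypothetical hooks:
      load_policy()                    -> policy text, e.g. from Parameter Store
      evaluate(policy, event)          -> True when the event complies
      stop_task(cluster_arn, task_arn) -> corrective action
      notify(message)                  -> send the alert, e.g. through SNS
    """
    detail = event.get("detail", {})
    policy = load_policy()
    if evaluate(policy, event):
        return "compliant"
    stop_task(detail.get("clusterArn"), detail.get("taskArn"))
    notify(f"Non-compliant task stopped: {detail.get('taskArn')}")
    return "non-compliant"


def build_aws_hooks(parameter_name, topic_arn):
    """Wire three of the hooks to AWS using boto3 (needs credentials)."""
    import boto3  # imported lazily so the logic above can be tested offline

    ssm = boto3.client("ssm")
    ecs = boto3.client("ecs")
    sns = boto3.client("sns")

    def load_policy():
        response = ssm.get_parameter(Name=parameter_name)
        return response["Parameter"]["Value"]

    def stop_task(cluster_arn, task_arn):
        ecs.stop_task(cluster=cluster_arn, task=task_arn, reason="Policy violation")

    def notify(message):
        sns.publish(TopicArn=topic_arn, Message=message)

    return load_policy, stop_task, notify
```

Passing the hooks in as arguments keeps the decision flow separate from the AWS calls, so the flow can be exercised with simple stand-ins before it is deployed.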

Prerequisites

  1. An AWS account.
  2. AWS CLI configured with the appropriate permissions required for deploying AWS CloudFormation stacks. Follow these instructions to install AWS CLI.
  3. Follow these instructions to install the AWS Serverless Application Model (AWS SAM) CLI.
  4. Two (2) Amazon ECS clusters, each with a service configured for Auto Scaling with a desired count of one. This will help you replicate the steps in the Verification section of this post.
    1. The first cluster should have a service with container image(s) hosted on Amazon ECR. This example can be used as a reference.
    2. The second cluster should have a service with container image(s) hosted outside Amazon ECR, for example on Docker Hub.

Implementation

Step 1: Clone the GitHub repository that supports this blog.

Clone the GitHub repository at amazon-ecs-compliance-as-code-opa. This repository contains the AWS SAM project that will be used to deploy the solution. The following command clones the main branch to your machine.

git clone https://github.com/aws-samples/amazon-ecs-compliance-as-code-opa.git

Step 2: Build the project

In the environment where you can invoke the AWS SAM CLI commands, go to the amazon-ecs-compliance-as-code-opa directory. Run the following command to build the project.

sam build

Figure 2 shows a sample output of running the build command.

Figure 2. Sample output of SAM build command

Step 3: Deploy the project

To deploy the solution as an AWS CloudFormation stack, run the following deploy command.

sam deploy --guided

The command presents a series of prompts. You can accept the defaults shown in brackets by pressing Enter. Note that some of the prompts require you to choose or provide a value to proceed:

  • Stack name: The name of the stack to deploy to CloudFormation. This should be unique to your account and Region, and a good starting point would be something matching your project name.
  • AWS Region: The AWS Region you want to deploy the solution to, e.g. us-east-1
  • Confirm changes before deploy: If set to yes (Y), any change sets will be shown to you before execution for manual review. If set to no (N), the AWS SAM CLI will automatically deploy the changes. Suggested input is N to deploy automatically. If you enter Y, you will need to provide additional input to proceed.
  • Allow SAM CLI IAM role creation: AWS SAM needs permission to be able to create roles to connect to the resources in your template. Please provide permission by choosing ‘Y’ (Yes).
  • Choose the defaults for any remaining prompts.

This will deploy the solution.

Figure 3 shows a sample of the deploy command being run and the prompts presented.

Figure 3: sam deploy command

Verification

An Amazon SNS topic is created as part of the CloudFormation stack. There are various options for subscribing to the SNS topic. Please see the instructions for details on how to subscribe to an Amazon SNS topic. Figure 4 shows a sample subscription to the topic using the email protocol.

Figure 4: SNS topic subscription
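
The same email subscription can also be created programmatically. The following is a minimal sketch using the boto3 `subscribe` call. The function name is hypothetical, and the topic ARN and email address are placeholders you would supply yourself.

```python
def subscribe_email(sns_client, topic_arn, email_address):
    """Subscribe an email address to the topic created by the stack.

    SNS sends a confirmation email; the subscription stays pending
    until the recipient confirms it.
    """
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=email_address,
    )
    return response["SubscriptionArn"]
```

A typical call would pass `boto3.client("sns")` as the first argument, followed by the ARN of the topic created by the stack and the address that should receive the alerts.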

You should have two Amazon ECS clusters, each running a service with a Desired Count value of 1. Figures 5 and 6 show the two clusters with the Desired Count value. The Running Count is the same, confirming that a single task is running in each cluster.

Figure 5: ECS cluster running a service with Amazon ECR as the container image registry

Figure 6: ECS cluster running a service without Amazon ECR as the container image registry

The solution sets up an Amazon EventBridge rule that triggers a Lambda function when a request is made to start a task.

Figure 7 shows the rule and the associated trigger. Please refer to the ‘Task state change events’ section of the documentation for more details on these types of events.

Figure 7: Amazon EventBridge rule to trigger the Lambda function
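
For reference, a rule of this kind can be sketched with boto3 as follows. The event pattern below matches every ECS task state change event. The rule deployed by the stack may filter further (for example on task status), so treat the pattern, the function name, and the target ID as assumptions rather than the stack's actual configuration.

```python
import json

# Hypothetical pattern: all ECS task state change events in the account.
TASK_STATE_CHANGE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
}


def create_rule(events_client, rule_name, lambda_arn):
    """Create an EventBridge rule and point it at a Lambda function."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(TASK_STATE_CHANGE_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "opa-policy-lambda", "Arn": lambda_arn}],
    )
```

When a rule is created this way, the Lambda function also needs a resource-based permission that allows EventBridge to invoke it.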

To trigger the Lambda function, initiate a task state change event on both clusters. Stopping the running task on each cluster causes the Running Count value of the service to drop to 0. Because the Desired Count value is still 1, this causes an auto scaling event, which starts a new task on each cluster.

Figure 8: Container image sourced from ECR, thus compliant

Figure 9: Container image not sourced from ECR, thus non-compliant

Figures 8 and 9 show the results of evaluating the OPA policy against the task state change events from the cluster using container images from Amazon ECR and the cluster using images from another registry, respectively. The solution uses this output to determine whether each cluster is compliant. As the figures show, OPA provides a portal where you can build and test your own policies against sample input.
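
If you prefer to test a policy locally rather than in the portal, one option is the `opa eval` command line tool. The sketch below builds the command and extracts the decision from its JSON output. It assumes the `opa` binary is installed and on your PATH. The function names are hypothetical, and the query string (such as `data.ecs.allow`) depends on the package and rule names in your own policy. This is not necessarily how the Lambda function in this solution invokes OPA.

```python
import json
import subprocess


def opa_eval_command(policy_path, input_path, query):
    """Build an `opa eval` invocation that prints its result as JSON."""
    return [
        "opa", "eval",
        "--format", "json",
        "--data", policy_path,
        "--input", input_path,
        query,
    ]


def first_value(opa_output):
    """Extract the value of the first expression from `opa eval` JSON output."""
    result = json.loads(opa_output).get("result", [])
    if not result:
        return None  # the query was undefined for this input
    return result[0]["expressions"][0]["value"]


def evaluate(policy_path, input_path, query):
    """Run OPA against a saved event payload and return the decision."""
    completed = subprocess.run(
        opa_eval_command(policy_path, input_path, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return first_value(completed.stdout)
```

Saving a task state change event to a file and evaluating it this way gives you a repeatable check before you store a new policy in Parameter Store.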

Figure 10: Stopping the task on the ECS cluster with images sourced from Amazon ECR

Figure 11: Stop task confirmation dialog

Figure 12: Stopping task on the ECS cluster with images not sourced from Amazon ECR

Figure 13: Stop task confirmation dialog

Figures 10, 11, 12, and 13 show the tasks on each cluster being stopped using the AWS Management Console.

If the containers in the task violate the OPA policy, the Lambda function will scale the offending service to zero and deregister the task definition to prevent it from being used by other services.
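
These two corrective actions can be sketched with boto3 as follows. This is an illustration rather than the repository's Lambda code. The function name is hypothetical, and it assumes the event detail carries `clusterArn`, `taskDefinitionArn`, and a `group` value of the form `service:<service-name>` for tasks started by a service.

```python
def remediate(ecs_client, detail):
    """Scale the owning service to zero and deregister its task definition."""
    cluster = detail["clusterArn"]
    group = detail.get("group", "")
    actions = []

    # Tasks started by a service carry a group of the form "service:<name>".
    if group.startswith("service:"):
        service = group.split(":", 1)[1]
        ecs_client.update_service(
            cluster=cluster, service=service, desiredCount=0
        )
        actions.append(f"scaled {service} to 0")

    # Deregistering marks the revision INACTIVE so it cannot be used
    # to run new tasks or create new services.
    ecs_client.deregister_task_definition(
        taskDefinition=detail["taskDefinitionArn"]
    )
    actions.append("deregistered task definition")
    return actions
```

Scaling the service to zero matters because stopping the task alone would only prompt the service scheduler to start a replacement from the same task definition.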

Figure 14: “Non-compliant” task stopped and task definition inactivated

Figure 14 shows the corrective actions taken on the cluster found to be “Non-compliant”. The Lambda function stopped the newly started task based on the OPA policy decision, and it also deregistered the task definition (rendering it inactive). In addition, the solution generated an email through the SNS topic subscription with details about the cluster, service, and task. Figure 15 shows a sample email notification.

Figure 15: Email alert via SNS topic subscription

Figure 16: “Compliant” task and task definition are not affected

Figure 17: “Compliant” task allowed to run successfully

Figures 16 and 17 show that the “Compliant” task was not affected and that the task started by Auto Scaling is running successfully. This completes the verification of the solution.

Cleanup

Step 1: Delete the CloudFormation stack created in Step 3 of the implementation instructions. This can be done using the AWS Management Console or the AWS CLI.

Step 2: Delete the two Amazon ECS clusters that were created as part of the blog prerequisites.

Conclusion

In this post, I presented an architecture and implementation that enable you to adopt Compliance as Code practices in your organization for workloads running on Amazon ECS, using OPA. The scenario shown here is a compliance check that requires Amazon ECS workloads to use an approved container image registry. The Rego policy is stored in the AWS Systems Manager Parameter Store and loaded by the Lambda function when it is triggered. This abstraction lets you store different policies in the Parameter Store and evaluate them to derive a decision, so the solution can be extended to implement other compliance checks that meet the needs of your organization.

Thank you for reading this blog post. I hope that the information in this post helps you get started in your own Compliance as Code for Amazon ECS journey. If you have feedback about this blog post, please submit comments in the Comments section below.