
Using OPA to validate Amazon EKS Blueprint Templates

As organizations adopt containerization technologies such as Kubernetes, ensuring security and compliance becomes increasingly complex. With Kubernetes environments that span Infrastructure-as-Code (IaC) and Kubernetes clusters, maintaining a secure posture can be a daunting task. To tackle this complexity and reduce risk, many teams are turning to standardized tooling.

Organizations that run Kubernetes-based workloads have discovered the power of admission webhooks as a key component of their security strategy. Admission webhooks, when paired with tools such as Open Policy Agent (OPA), open up a range of possibilities for enhancing security and compliance across the entire technology stack. OPA is an open-source, general-purpose policy engine that enables policy enforcement not only within Kubernetes but also at the IaC level. This allows organizations to enforce policies on various resource types. To learn more about OPA, visit the GitHub repository.

In this post, we explore the benefits of using OPA to scan the Terraform code in your Amazon EKS Blueprints for Terraform, and how it can help you maintain a secure and compliant environment.

Solution overview

The Amazon EKS Blueprints for Terraform contains patterns that can be used by AWS users, partners, and internal AWS teams. EKS Blueprints assist with configuring and managing complete Amazon Elastic Kubernetes Service (Amazon EKS) clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. EKS Blueprints allow users to adopt best practices and start onboarding workloads in days rather than months. Although organizations allow teams to move rapidly, they also want to provide guardrails to make sure that the resources are configured in accordance with best practices.

These guardrails can be adopted to audit the running configuration of resources, as well as to test resources prior to deployment. By implementing compliance checks as part of your IaC pipeline, you can catch compliance violations early in the development process and prevent them from being deployed to production. This can save you time and resources in fixing issues downstream and reduce the likelihood of non-compliance issues arising during audits or inspections.

In this post, we review how to use OPA to evaluate your Terraform plan files for compliance (see Diagram 1). This process helps verify that your resources are configured in a compliant way prior to deployment. The OPA guardrails are written in the policy language known as Rego. Some of the checks we review as part of this post are:

  • Validate that the cluster has a security group defined
  • Validate that the cluster is configured as private
  • Validate the worker node disk size

 

 

Diagram 1 OPA request/response workflow


Prerequisites

Before proceeding, you should have the following prerequisites:

Walkthrough

The following steps walk you through this post.

Clone code repository locally

We use two code repositories together to deploy and manage EKS clusters while enforcing policies and best practices defined in the OPA rules.

git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git

This repository contains a collection of EKS cluster patterns implemented in Terraform code for deploying and managing EKS clusters and related AWS resources.

git clone https://github.com/aws-samples/aws-infra-policy-as-code-with-terraform.git

This repository contains OPA policies to test Amazon EKS infrastructure against a Terraform plan.

Create Terraform plan

To begin evaluating our IaC, we must first initialize Terraform. Once initialized, we run the terraform plan command, which creates an execution plan and lets you preview the changes that Terraform plans to make to your infrastructure. We then convert the binary plan to JSON with terraform show and review the output in ‘tfplan.json’.

terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ init
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ plan --out tfplan.binary
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ show -json tfplan.binary > tfplan.json

cat tfplan.json 
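The JSON plan is large, so it can help to summarize it before reading it in full. The following is a minimal Python sketch, not part of either repository, that counts planned changes by resource type and action. It assumes only the resource_changes list that the Rego rules in this post also read; the sample data, including the aws_iam_role address, is hand-written for illustration.

```python
import json
from collections import Counter


def load_plan(path):
    """Read the JSON plan written by `terraform show -json`."""
    with open(path) as f:
        return json.load(f)


def summarize_plan(plan):
    """Count planned changes per (resource type, action) pair."""
    summary = Counter()
    for change in plan.get("resource_changes", []):
        for action in change["change"]["actions"]:
            summary[(change["type"], action)] += 1
    return summary


# A tiny hand-written stand-in for tfplan.json; a real plan has many more fields.
sample_plan = {
    "resource_changes": [
        {
            "address": "module.eks.aws_eks_cluster.this[0]",
            "mode": "managed",
            "type": "aws_eks_cluster",
            "change": {"actions": ["create"]},
        },
        {
            "address": "module.eks.aws_iam_role.this[0]",  # hypothetical address
            "mode": "managed",
            "type": "aws_iam_role",
            "change": {"actions": ["create"]},
        },
    ]
}

for (resource_type, action), count in sorted(summarize_plan(sample_plan).items()):
    print(f"{resource_type}: {action} x{count}")
```

To summarize the real plan, pass load_plan("tfplan.json") to summarize_plan instead of the sample data.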

Now that we have our plan, we can begin to evaluate it with OPA.

Evaluate Cluster Security Group

The first OPA rule that we evaluate against our Terraform plan validates that the cluster has a security group defined. To do so, we use the opa eval command, which evaluates Rego expressions and policies.

opa eval -i tfplan.json \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-m-3.rego \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/common.utils.rego "data.aws.eks.m3.deny"

The -i flag sets the input Terraform plan document for the evaluation, and the -d sets the OPA policies. This command should output the following JSON:

{
    "result": [
        {
            "expressions": [
                {
                    "value": [
                        "AWS-EKS-M-3: 'module.eks.aws_eks_cluster.this[0]' EKS Cluster Should have cluster security group defined"
                    ],
                    "text": "data.aws.eks.m3.deny",
                    "location": {
                        "row": 1,
                        "col": 1
                    }
                }
            ]
        }
    ]
}

After running this check, we can see there is a value with the message “EKS Cluster Should have cluster security group defined”. This tells us the cluster currently does not have a security group defined.
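If you want to consume this result in a script rather than read it by eye, the deny messages sit under result, then expressions, then value. The following is a small Python sketch, assuming only the output shape shown above; the function name deny_messages is our own.

```python
import json


def deny_messages(opa_output):
    """Collect every message from the `value` arrays of an `opa eval` result."""
    messages = []
    for result in opa_output.get("result", []):
        for expression in result.get("expressions", []):
            messages.extend(expression.get("value", []))
    return messages


# The output shown above, trimmed to the fields this sketch reads.
raw_output = """
{
  "result": [
    {
      "expressions": [
        {
          "value": [
            "AWS-EKS-M-3: 'module.eks.aws_eks_cluster.this[0]' EKS Cluster Should have cluster security group defined"
          ],
          "text": "data.aws.eks.m3.deny"
        }
      ]
    }
  ]
}
"""

for message in deny_messages(json.loads(raw_output)):
    print(message)
```

An empty list from deny_messages means the plan passed the rule.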

Let’s review the Rego code that did this evaluation. You can view the content by running the following command:

cat aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-m-3.rego

Output:

is_in_scope(resource) {
  resource.mode == "managed"
  data.utils.is_create_or_update(resource.change.actions)
  resource.type == "aws_eks_cluster"
}

is_security_group_enabled(resource){
  resource.change.after.vpc_config[0].security_group_ids
  count(resource.change.after.vpc_config[0].security_group_ids) > 0
} else {
  resource.change.after_unknown.vpc_config[0].security_group_ids == true
} else = false{
  true
}

deny[reason] {
  resource := input.resource_changes[_]
  is_in_scope(resource)
  not is_security_group_enabled(resource)
  reason := sprintf("AWS-EKS-M-3: '%s' EKS Cluster Should have cluster security group defined", [resource.address])
}

We queried the deny rule, which evaluates the input Terraform plan document. A resource is in scope when its mode is managed, its planned actions create or update it, and its type is aws_eks_cluster. For each in-scope resource, the rule checks whether the cluster has a security group defined, either as a value known at plan time or as a value that is only known after apply (after_unknown). If the check fails, deny returns a message that describes the reason for the failure.
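For readers who are new to Rego, the same logic can be sketched in Python. This is an illustrative translation, not code from the policy repository. In particular, is_create_or_update below is an assumption about what the helper in common.utils.rego does, because its source is not shown in this post, and the sample resource is hand-written.

```python
MESSAGE = "AWS-EKS-M-3: '%s' EKS Cluster Should have cluster security group defined"


def is_create_or_update(actions):
    # Assumption: stands in for data.utils.is_create_or_update, whose source
    # (common.utils.rego) is not shown in this post.
    return "create" in actions or "update" in actions


def is_in_scope(resource):
    return (
        resource.get("mode") == "managed"
        and is_create_or_update(resource["change"]["actions"])
        and resource.get("type") == "aws_eks_cluster"
    )


def first_vpc_config(section):
    """Return vpc_config[0] from a plan section, or {} if it is absent."""
    configs = section.get("vpc_config") if isinstance(section, dict) else None
    if isinstance(configs, list) and configs and isinstance(configs[0], dict):
        return configs[0]
    return {}


def is_security_group_enabled(resource):
    change = resource["change"]
    # First Rego body: the IDs are known at plan time and the list is non-empty.
    if first_vpc_config(change.get("after")).get("security_group_ids"):
        return True
    # First `else`: the IDs are only known after apply.
    # Final `else = false`: nothing is configured.
    unknown = first_vpc_config(change.get("after_unknown"))
    return unknown.get("security_group_ids") is True


def deny(plan):
    return [
        MESSAGE % resource["address"]
        for resource in plan.get("resource_changes", [])
        if is_in_scope(resource) and not is_security_group_enabled(resource)
    ]


def cluster(after_ids, unknown_ids):
    """Build a minimal, hand-written aws_eks_cluster resource change."""
    return {
        "address": "module.eks.aws_eks_cluster.this[0]",
        "mode": "managed",
        "type": "aws_eks_cluster",
        "change": {
            "actions": ["create"],
            "after": {"vpc_config": [{"security_group_ids": after_ids}]},
            "after_unknown": {"vpc_config": [{"security_group_ids": unknown_ids}]},
        },
    }


print(deny({"resource_changes": [cluster(None, False)]}))  # no group: one violation
print(deny({"resource_changes": [cluster(None, True)]}))   # known after apply: none
```

The after_unknown branch matters because a security group created in the same plan has no ID yet, so its value only appears as true under after_unknown.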

Let’s change this and re-run our check. Edit the cluster configuration within the following file ‘terraform-aws-eks-blueprints/patterns/fargate-serverless/main.tf’ to enable the creation of the cluster security group.

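Run the following command. The sed -i '' form is the BSD/macOS syntax; with GNU sed on Linux, use sed -i without the empty string: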

sed -i '' 's/create_cluster_security_group = false/create_cluster_security_group = true/g' terraform-aws-eks-blueprints/patterns/fargate-serverless/main.tf

We must create a new Terraform plan file to test the new configuration. Remove the previously created tfplan files.

rm terraform-aws-eks-blueprints/patterns/fargate-serverless/tfplan.binary
rm tfplan.json

terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ init
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ plan --out tfplan.binary
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ show -json tfplan.binary > tfplan.json
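Because the plan must be regenerated after every change to main.tf, the three commands above can be wrapped in a small helper. The following is a minimal Python sketch using the standard subprocess module. The function names are our own, and the sketch assumes terraform is on your PATH.

```python
import subprocess
from pathlib import Path


def plan_commands(pattern_dir, binary_name="tfplan.binary"):
    """Return the three terraform invocations used in this post as argv lists."""
    chdir = f"-chdir={pattern_dir}"
    return [
        ["terraform", chdir, "init"],
        ["terraform", chdir, "plan", f"-out={binary_name}"],
        ["terraform", chdir, "show", "-json", binary_name],
    ]


def regenerate_plan(pattern_dir, json_path="tfplan.json"):
    """Re-run init, plan, and show, then write the JSON plan to json_path."""
    init, plan, show = plan_commands(pattern_dir)
    subprocess.run(init, check=True)
    subprocess.run(plan, check=True)
    # `terraform show -json` prints the plan to stdout, so capture it here
    # instead of relying on a shell redirect.
    shown = subprocess.run(show, check=True, capture_output=True, text=True)
    Path(json_path).write_text(shown.stdout)


# Example call (requires terraform and AWS credentials, so it is left commented out):
# regenerate_plan("terraform-aws-eks-blueprints/patterns/fargate-serverless/")
```

Overwriting tfplan.json on each run also removes the need to delete the old plan files by hand.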

Now, let’s re-run the cluster security group check:

opa eval -i tfplan.json \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-m-3.rego \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/common.utils.rego "data.aws.eks.m3.deny"

The value array is now empty, which means that the EKS cluster has the security group configured.

Output:

{
  "result": [
    {
      "expressions": [
        {
          "value": [],
          "text": "data.aws.eks.m3.deny",
          "location": {
            "row": 1,
            "col": 1
          }
        }
      ]
    }
  ]
}

Validate the cluster is private

Next, we run a check to validate that the EKS cluster is configured to be private. In a private cluster, Kubernetes API calls within your cluster’s VPC (such as node-to-control-plane communication) use the private VPC endpoint, and traffic remains within your cluster’s VPC.

opa eval -i tfplan.json \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-m-2.rego \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/common.utils.rego "data.aws.eks.m2.deny"

Output:

{
  "result": [
    {
      "expressions": [
        {
          "value": [
            "AWS-EKS-M-2: 'module.eks.aws_eks_cluster.this[0]' EKS Cluster should have only private endpoints"
          ],
          "text": "data.aws.eks.m2.deny",
          "location": {
            "row": 1,
            "col": 1
          }
        }
      ]
    }
  ]
}

After running this check, we can see there is a value with the message “EKS Cluster should have only private endpoints”. This tells us the cluster currently allows traffic to the control plane from the internet. Let’s change this and re-run our check. Edit the cluster configuration within the following file ‘terraform-aws-eks-blueprints/patterns/fargate-serverless/main.tf’ to make the cluster private.

Run the following command:

sed -i '' 's/cluster_endpoint_public_access = true/cluster_endpoint_public_access = false/g' terraform-aws-eks-blueprints/patterns/fargate-serverless/main.tf

We must create a new Terraform plan file to test the new configuration.

rm terraform-aws-eks-blueprints/patterns/fargate-serverless/tfplan.binary
rm tfplan.json

terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ init
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ plan --out tfplan.binary
terraform -chdir=terraform-aws-eks-blueprints/patterns/fargate-serverless/ show -json tfplan.binary > tfplan.json

Now, let’s re-run the private endpoint cluster check:

opa eval -i tfplan.json \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-m-2.rego \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/common.utils.rego "data.aws.eks.m2.deny"

Now the EKS cluster is configured as private.

Output:

{
  "result": [
    {
      "expressions": [
        {
          "value": [],
          "text": "data.aws.eks.m2.deny",
          "location": {
            "row": 1,
            "col": 1
          }
        }
      ]
    }
  ]
}
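The Rego source for this rule is not shown in this post, so the following Python sketch is only an approximation of what a private-endpoint check might look like. It assumes the rule reads the endpoint_public_access and endpoint_private_access attributes of the aws_eks_cluster vpc_config block; the actual policy may test different conditions. The sample resources are hand-written.

```python
def is_private_only(resource):
    """Return True when the planned cluster exposes only the private endpoint."""
    after = resource["change"].get("after") or {}
    configs = after.get("vpc_config") or [{}]
    config = configs[0]
    # Assumption: "private only" means public access off and private access on.
    return (
        config.get("endpoint_public_access") is False
        and config.get("endpoint_private_access") is True
    )


def planned_cluster(public_access):
    """Build a minimal, hand-written aws_eks_cluster resource change."""
    return {
        "address": "module.eks.aws_eks_cluster.this[0]",
        "type": "aws_eks_cluster",
        "change": {
            "after": {
                "vpc_config": [
                    {
                        "endpoint_public_access": public_access,
                        "endpoint_private_access": True,
                    }
                ]
            }
        },
    }


print(is_private_only(planned_cluster(public_access=True)))   # before the sed edit
print(is_private_only(planned_cluster(public_access=False)))  # after the sed edit
```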

Validate disk size

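For the next rule, we evaluate whether cluster node groups have the disk_size parameter configured. Since we are using the Fargate serverless pattern, which runs workloads on Fargate profiles rather than on worker node groups, there is nothing for the rule to flag and it should pass.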

opa eval -i tfplan.json \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/aws/eks/aws-eks-r-2.rego \
-d aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy/common.utils.rego "data.aws.eks.r2.deny"

Output:

{
  "result": [
    {
      "expressions": [
        {
          "value": [],
          "text": "data.aws.eks.r2.deny",
          "location": {
            "row": 1,
            "col": 1
          }
        }
      ]
    }
  ]
}
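The rule passes here because there are no node groups to inspect, not because a disk size was found. The following Python sketch illustrates that distinction. It is an approximation that assumes the rule inspects the disk_size attribute of aws_eks_node_group resources; the Rego source is not shown in this post, and the resource addresses are hypothetical.

```python
def node_groups_missing_disk_size(plan):
    """Return the addresses of planned node groups that do not set disk_size."""
    missing = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") != "aws_eks_node_group":
            continue  # Fargate profiles and other resources are out of scope
        after = resource["change"].get("after") or {}
        if after.get("disk_size") is None:
            missing.append(resource["address"])
    return missing


# Hand-written sample plans; the addresses are hypothetical.
fargate_plan = {
    "resource_changes": [
        {
            "address": "module.eks.aws_eks_fargate_profile.example",
            "type": "aws_eks_fargate_profile",
            "change": {"after": {}},
        }
    ]
}
node_group_plan = {
    "resource_changes": [
        {
            "address": "module.eks.aws_eks_node_group.example",
            "type": "aws_eks_node_group",
            "change": {"after": {"instance_types": ["m5.large"]}},
        }
    ]
}

print(node_groups_missing_disk_size(fargate_plan))     # nothing in scope
print(node_groups_missing_disk_size(node_group_plan))  # one node group flagged
```

If you switch to a pattern that uses managed node groups, re-run the check rather than relying on this result.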

This technique of scanning an Amazon EKS blueprint can be expanded to include other checks within your environment, such as tagging and Amazon Elastic Compute Cloud (Amazon EC2) instance types, or even checks on other AWS resources, such as validating that an Amazon DynamoDB table is also configured as private.
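As the number of checks grows, you may want a single gate that runs every policy and fails the pipeline on any violation. The following is a minimal Python sketch of such a gate. It reuses the opa eval invocation and policy paths from this post; the function names and the injectable run parameter are our own, and the sketch assumes opa is on your PATH.

```python
import json
import subprocess

POLICY_DIR = "aws-infra-policy-as-code-with-terraform/policy-as-code/OPA/policy"

# Query -> policy file, as used in the walkthrough above.
CHECKS = {
    "data.aws.eks.m3.deny": f"{POLICY_DIR}/aws/eks/aws-eks-m-3.rego",
    "data.aws.eks.m2.deny": f"{POLICY_DIR}/aws/eks/aws-eks-m-2.rego",
    "data.aws.eks.r2.deny": f"{POLICY_DIR}/aws/eks/aws-eks-r-2.rego",
}


def opa_command(query, policy, plan_path="tfplan.json"):
    """Build the same `opa eval` invocation used throughout this post."""
    utils = f"{POLICY_DIR}/common.utils.rego"
    return ["opa", "eval", "-i", plan_path, "-d", policy, "-d", utils, query]


def violations(opa_output):
    """Flatten the `value` arrays of an `opa eval` JSON result."""
    return [
        message
        for result in opa_output.get("result", [])
        for expression in result.get("expressions", [])
        for message in expression.get("value", [])
    ]


def run_checks(run=subprocess.run):
    """Run every check and return 1 if any violation was reported, else 0."""
    failed = False
    for query, policy in CHECKS.items():
        done = run(opa_command(query, policy), check=True,
                   capture_output=True, text=True)
        for message in violations(json.loads(done.stdout)):
            print(message)
            failed = True
    return 1 if failed else 0


# In a pipeline step you might end with: raise SystemExit(run_checks())
```

Adding a new guardrail is then a matter of adding one entry to CHECKS.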

Conclusion

Using OPA to scan your IaC, as well as resources within your Kubernetes cluster, is an effective way to help ensure the security and compliance of your environment. By leveraging OPA, you can enforce policies at scale, reduce the risk of misconfigurations and vulnerabilities, and simplify compliance audits. OPA’s flexibility and compatibility with various tools and platforms make it an excellent choice for organizations seeking to adopt a policy-based approach to cloud infrastructure and containerization. Ultimately, implementing OPA as part of your security and compliance strategy can help you better protect your organization’s assets and reputation in the face of ever-evolving cyber threats.

This post demonstrates that by using Amazon EKS Blueprints and OPA Rego policies you can automate Kubernetes preventative controls using IaC. As a result, the Kubernetes clusters can be deployed into the environment only if they are compliant with all control policies required by the enterprise. And with the power of AWS services, we can codify and automate the whole validation and deployment process end-to-end, which fast-tracks your adoption of Amazon EKS and deployment of Kubernetes workloads while remaining compliant with enterprise control policies.

OPA is not the only way that this can be accomplished, and some AWS users have other tooling in place they’ve already adopted. If you use Terraform Cloud and Sentinel Policies, this post shows you step-by-step how to accomplish similar outcomes.

Piyush Mattoo


Piyush Mattoo is a Solutions Architect for enterprises at Amazon Web Services. He is a software technology leader with over 15 years of experience building scalable and distributed software systems that require a combination of broad T-shaped skills across multiple technologies. He has an educational background in Computer Science with a Master’s degree in Computer and Information Science from the University of Massachusetts.

Hans Nesbitt


Hans Nesbitt is a Senior Solutions Architect at AWS based out of Southern California. He works with customers across the western US to craft highly scalable, flexible, and resilient cloud architectures.