Containers

Implementing runtime security in Amazon EKS using CNCF Falco

Many organizations are in the process of migrating their applications to containers. Containers provide application-level dependency management, fast launches, and support for immutability, which can help reduce costs, increase velocity, and improve efficiency. To manage the container lifecycle securely, container image hardening and end-to-end security checks are critical. Containers need to be secured by default before they are deployed into a container orchestrator, such as Amazon Elastic Kubernetes Service (Amazon EKS). The journey of hardening containers proceeds as follows:

  • Lint your Dockerfile.
  • Build the image from the linted Dockerfile or Docker Compose file.
  • Perform static container image scanning (see the sketch after this list).
  • Verify and triage the reported vulnerabilities.
  • Have a manual approval process.
  • Deploy to the orchestrator (Amazon ECS or Amazon EKS).
  • Enable dynamic image scanning on containers and analyze the logs regularly.
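
As an illustration, the first three stages might look like the following. This is a minimal sketch assuming hadolint as the Dockerfile linter and Trivy as the static scanner; this post does not prescribe specific tools, and the image name is hypothetical.

hadolint Dockerfile                # lint the Dockerfile for common mistakes
docker build -t my-app:latest .    # build the image from the linted Dockerfile
trivy image my-app:latest          # scan the built image against public CVE databases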

Let’s first cover what static and dynamic scans are to better understand the pipeline flow:

  • A static scan is a deep scan of the container image layers before the container is used or deployed. The image is checked against public bug and CVE databases.
  • A dynamic scan is a deep scan of the container layers while or after they are running or deployed. This methodology can scan and publish results on demand, or analyze the logs continuously while the container is running. Multiple products on the market fall under dynamic scanning tools, such as CNCF Falco, Twistlock, and Aqua.

To learn more about static scanning in a pipeline, check out this post on Container DevSecOps on Amazon ECS Fargate with AWS CodePipeline; that demo uses a static scan methodology, performing a deep container scan for vulnerabilities and issues before deployment. In this post, we show you how you can build, install, and use runtime security with CNCF Falco on Amazon EKS.

We will be using the following AWS services and open source tools for this post:

  • Amazon EKS, AWS IAM, Amazon CloudWatch, and Amazon SNS
  • eksctl, kubectl, the AWS CLI, and Helm
  • CNCF Falco and Fluent Bit

Set up your Amazon EKS cluster

Before we set up an Amazon EKS cluster, install the following tools on your system: eksctl, kubectl, the AWS CLI, Helm, and jq. Detailed installation instructions for each operating system are provided in each tool's documentation.
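
Once installed, a quick way to confirm each tool is available on your PATH:

eksctl version
kubectl version --client
helm version
aws --version
jq --version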

Create a sample Amazon EKS cluster configuration file called cluster-config.yaml (see below). This file will be used to deploy the Amazon EKS cluster. We can deploy the cluster to an existing or a new VPC; I have used an existing VPC and set up the cluster with managed node groups in both public and private subnets.

You can find many ClusterConfig samples on the eksctl.io page. Below is one configuration file you can use; in this demo, we show how to build the cluster with pre-existing resources. Visit the eksctl docs to understand the ClusterConfig schema elements.


cluster-config.yaml file:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-managed-cluster
  region: ap-south-1
vpc:
  id: "vpc-xxxxxxxxxx" # Provide the VPC ID 
  cidr: "xxxxxxxxxxxx" # Provide the VPC CIDR Range
  subnets:
    public:
      ap-south-1a:
          id: "subnet-xxxxxxxx" # Provide the Subnet ID
          cidr: "xxxxxxxxxxxx" # Provide the Subnet CIDR Range
      ap-south-1b:
          id: "subnet-xxxxxxxx" # Provide the Subnet ID
          cidr: "xxxxxxxxxxxx" # Provide the Subnet CIDR Range          
# Provide the service role for EKS cluster         
#iam:
#  serviceRoleARN: "arn:aws:iam::11111:role/eks-base-service-role"
# Below schema elements build Non-EKS managed node groups
#nodeGroups:
#  - name: ng-1
#    instanceType: m5.large
#    desiredCapacity: 3
#    iam:
#      instanceProfileARN: "arn:aws:iam::11111:instance-profile/eks-nodes-base-role"
#      instanceRoleARN: "arn:aws:iam::1111:role/eks-nodes-base-role"
#    privateNetworking: true
#    securityGroups:
#      withShared: true
#      withLocal: true
#      attachIDs: ['sg-xxxxxx', 'sg-xxxxxx']
#    ssh:
#      publicKeyName: 'my-instance-public-key'
#    tags:
#      'environment:basedomain': 'example.org'
# Below schema elements build EKS managed node groups
managedNodeGroups:
  - name: eks-managed-ng-1 # Provide the name of the node group
    minSize: 1 # Auto Scaling group configuration
    maxSize: 2 # Auto Scaling group configuration
    instanceType: t2.small # Size and type of the worker nodes
    desiredCapacity: 1 # Auto Scaling group configuration
    volumeSize: 20 # Worker node volume size
    ssh:
      allow: true
      # You can use the provided public key to log in to the worker nodes.
      publicKeyPath: ~/.ssh/id_rsa.pub
      # sourceSecurityGroupIds: ["sg-xxxxxxxxxxx"] # OPTIONAL
    labels: {role: worker}
    tags:
      nodegroup-role: worker
    iam:
      withAddonPolicies:
        externalDNS: true
        certManager: true
    # Provide the role ARN to be attached to the instances
    # iam:
    #   instanceRoleARN: "arn:aws:iam::1111:role/eks-nodes-base-role"

Run the following command to create the Amazon EKS cluster.

eksctl create cluster -f cluster-config.yaml

You can find the cluster stack created by AWS CloudFormation, as shown below:

You can go to the Amazon EKS page in the AWS Management Console and check the status of the cluster creation, as shown below:
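
You can also confirm the cluster from the command line; the cluster name and region below match the cluster-config.yaml above.

eksctl get cluster --region ap-south-1                # list clusters created by eksctl
aws eks describe-cluster --name eks-managed-cluster \
  --region ap-south-1 --query cluster.status          # should print "ACTIVE"
kubectl get nodes                                     # worker nodes should be Ready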

Set up a sample deployment on the Amazon EKS cluster

Create a new configuration file called deployment.yaml for your sample application. We will deploy sample NGINX website pods on the public subnets that we provided in the cluster configuration file cluster-config.yaml.

See the sample deployment.yaml file below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx2
  labels:
    app: nginx2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx2
  template:
    metadata:
      labels:
        app: nginx2
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      containers:
      - name: nginx
        image: nginx:1.19.2
        ports:
        - containerPort: 80


Now deploy NGINX as shown below:

kubectl apply -f deployment.yaml

You can verify the deployment status with the kubectl command.

kubectl get deployments --all-namespaces

You should see the following output.

NAMESPACE     NAME     READY    UP-TO-DATE  AVAILABLE AGE
default       nginx2   3/3      3           3         46d
kube-system   coredns  2/2      2           2         46d
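
You can also list the individual NGINX pods; their names are needed later when we exec into one to simulate suspicious activity.

kubectl get pods -l app=nginx2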

Set up Falco runtime security

We will install the well-known runtime security tool CNCF Falco for deep analysis of container security events and for alerting. Falco works in conjunction with other AWS services such as FireLens and Amazon CloudWatch. FireLens is a log aggregator that can collect and send container logs to many services in the Amazon ecosystem, such as Amazon CloudWatch, for further analysis and alerting. FireLens uses Fluent Bit or Fluentd behind the scenes and supports all features and configurations of both products. You can also send the FireLens log output to external logging and analytics services.

Amazon CloudWatch is a monitoring, alerting, and analytics service that provides insights into the services from which it receives logs. You can create custom dashboards, metrics, alarms, and insights on the logs; see the Amazon CloudWatch documentation for details. Falco uses FireLens and Amazon CloudWatch as follows:

1. Falco continuously scans the containers running in the pods and sends security, debug, or audit events as JSON to stdout.

2. FireLens (Fluent Bit) then collects the JSON logs and processes them according to the Fluent Bit configuration files.

3. After transformation by the Fluent Bit containers, the logs are sent to Amazon CloudWatch as their final destination.
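
For orientation, the delivery to CloudWatch is driven by an OUTPUT stanza in the Fluent Bit configuration. Below is a minimal sketch assuming the AWS cloudwatch output plugin; the actual configuration ships in the repository's configmap.yaml (cloned in the next step), and the log group name here is hypothetical.

[OUTPUT]
    Name              cloudwatch
    Match             *
    region            ap-south-1
    log_group_name    falco-alerts      # hypothetical log group name
    log_stream_prefix falco-
    auto_create_group true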

This blog post explains in depth how to install Falco and how it works with other AWS services.

Clone the Falco repository

Clone this repository:

git clone https://github.com/sysdiglabs/falco-aws-firelens-integration

Now go to the eks/fluent-bit directory, where you will find two directories: aws and kubernetes.

aws – This directory contains the IAM policy iam_role_policy.json, which we will attach to the worker node instance role (the role automatically attached to the worker nodes when we create an EKS cluster). This policy gives Falco running on the worker nodes permission to stream logs to Amazon CloudWatch.

kubernetes – This directory has three files: configmap.yaml, daemonset.yaml, and service-account.yaml. Applying them creates a ConfigMap holding the Fluent Bit configuration, a Fluent Bit DaemonSet that runs on all worker nodes, and a service account with the RBAC cluster role needed for authorization. All three files are applied at once.

This Falco blog explains the same steps for a standard Falco installation. First, we attach the IAM policy to the node instances to give them permission to stream logs to Amazon CloudWatch, as shown below.

Set up Falco with IAM permissions

aws iam create-policy --policy-name EKS-CloudWatchLogs --policy-document file://./fluent-bit/aws/iam_role_policy.json

This creates a policy called EKS-CloudWatchLogs with privileges to send logs to Amazon CloudWatch.

aws iam attach-role-policy --role-name <EKS-NODE-ROLE-NAME> --policy-arn `aws iam list-policies | jq -r '.[][] | select(.PolicyName == "EKS-CloudWatchLogs") | .Arn'`


NOTE:
“EKS-NODE-ROLE-NAME” is the role attached to the worker nodes. For example, after setting up the EKS cluster in my account, I see that eksctl-eks-managed-cluster-nodegr-NodeInstanceRole-1T0251NJ7YV04 is the role attached to the nodes.
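
If you are unsure of the role name, a convenience query like the following (not part of the original setup) lists candidate roles:

aws iam list-roles \
  --query "Roles[?contains(RoleName, 'NodeInstanceRole')].RoleName" \
  --output text

With the role name in hand, attach the policy: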

aws iam attach-role-policy --role-name eksctl-eks-managed-cluster-nodegr-NodeInstanceRole-1T0251NJ7YV04 --policy-arn `aws iam list-policies | jq -r '.[][] | select(.PolicyName == "EKS-CloudWatchLogs") | .Arn'`

Finally, apply the whole directory; all of the listed configuration files (configmap.yaml, daemonset.yaml, and service-account.yaml) will be applied.

kubectl apply -f eks/fluent-bit/kubernetes/

Set up the Falco Helm repository

Clone the falcosecurity/charts repository and add the falcosecurity Helm chart repository, as shown below.

git clone https://github.com/falcosecurity/charts.git; helm repo add falcosecurity https://falcosecurity.github.io/charts


helm repo update

Go to the falco/rules directory in the cloned repository and check the default rules configuration files, which ship with the chart and can be applied as they are. Please go through the default ruleset YAML files, which contain detailed explanations of the rules specified in each of them. We can add our custom rules as well.

1. application_rules.yaml
2. falco_rules.local.yaml
3. falco_rules.yaml
4. k8s_audit_rules.yaml

Falco's behavior is controlled by configuration parameters, which can be supplied as runtime parameters while installing the chart or collected in a dedicated file, for example values.yaml (you can use any name). Check out this page to understand all the configuration parameters that control Falco's runtime behavior: audit level, log level, file outputs, and so on.

A sample values.yaml is linked below for your reference.

NOTE: The jsonOutput property is false by default in values.yaml. Set it to true for JSON-formatted output via Fluent Bit.

https://github.com/falcosecurity/charts/blob/master/falco/values.yaml
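
The settings most relevant to this demo look roughly like the following. This is a minimal sketch assuming the chart's values layout at the time of writing; verify the exact keys against the linked values.yaml.

falco:
  jsonOutput: true              # emit alerts as JSON so Fluent Bit can parse them
  jsonIncludeOutputProperty: true
  logLevel: info                # verbosity of Falco's own logs
  priority: debug               # minimum rule priority that generates alerts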

Finally, install the Helm chart.

helm install falco -f values.yaml falcosecurity/falco

You should see the following output:

NAME: falco

LAST DEPLOYED: Tue Oct 6 12:06:26 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Falco agents are spinning up on each node in your cluster. After a few
seconds, they are going to start monitoring your containers looking for
security issues.

No further action should be required.

Once this deployment is complete, Falco scans our Kubernetes cluster's pods for security events and suspicious behavior and sends the log events to FireLens (Fluent Bit), which transforms the JSON logs according to the configuration settings and sends them to CloudWatch.

At this point, we have the following pods running.

kubectl get pods --all-namespaces

You should see the following output:

NAMESPACE     NAME                      READY STATUS RESTARTS AGE
default       falco-f95fw               1/1   Running 0       10d
default       falco-n7hb8               1/1   Running 0       10d
default       fluentbit-85bsm           1/1   Running 0       10d
default       fluentbit-qjk5n           1/1   Running 0       10d
default       nginx2-7844999d9c-8xgkx   1/1   Running 0       10d
default       nginx2-7844999d9c-fbsqf   1/1   Running 0       10d
default       nginx2-7844999d9c-wdpz8   1/1   Running 0       10d
kube-system   aws-node-86mwx            1/1   Running 0       10d
kube-system   aws-node-qwfs4            1/1   Running 0       10d
kube-system   coredns-56666f95ff-dg75w  1/1   Running 0       10d
kube-system   coredns-56666f95ff-xr8l2  1/1   Running 0       10d
kube-system   kube-proxy-b54kq          1/1   Running 0       10d
kube-system   kube-proxy-kp5wd          1/1   Running 0       10d 

In the console, go to CloudWatch to find the log group and its log streams.

Simulating and Testing

Falco will catch any suspicious activity in the pods. We can test this by exec'ing into one of the test deployment's pods (in our case, the NGINX application pods) and running commands that trigger the default rules implemented by Falco.

Example 1: Simulating rules in falco_rules.yaml. Exec into any of the NGINX pods and execute the following statements, which trigger the rules called “Write below etc” and “Read sensitive file untrusted”.

You can use the previously listed command (kubectl get pods) to list all the pods running on your EKS cluster, then exec into one of the NGINX pods and simulate the suspicious activities below. For example, we can exec into the NGINX pod nginx2-7844999d9c-wdpz8 and simulate actions that Falco will catch based on the rule sets in place.

Falco catches suspicious activity in all pods across the entire cluster.

kubectl exec -it nginx2-7844999d9c-wdpz8 -- /bin/bash

Then generate activity like the following:

touch /etc/2
cat /etc/shadow > /dev/null 2>&1

Falco generates alerts in CloudWatch for this test simulation, as shown below.

Example 2: Exec into any of the NGINX pods and execute the statements below, which trigger the rule called “Mkdir binary dirs”.

kubectl exec -it nginx2-7844999d9c-wdpz8 -- /bin/bash

Then generate activity like the following:

cd /bin
mkdir hello

Falco generates alerts in CloudWatch for this test simulation, as shown below.

Creating Custom Rules

You can create a YAML file with sample custom rules, or append rules to the existing default rule sets. In this demo, I create a new custom rule file called custom_alerts.yaml and add the desired rule conditions. In this example, I create alerts for the simple commands whoami and locate: when these commands are executed inside the NGINX container, Falco will alert us.

Sample custom_alerts.yaml file:

customRules:
  rules-nginx.yaml: |
    - macro: nginx_consider_syscalls
      condition: (evt.num < 0)

    - macro: app_nginx
      condition: container and container.image contains "nginx"

    # Any outbound traffic raises a WARNING

    - rule: The program "whoami" is run in a container
      desc: An event will trigger every time you run "whoami" in a container
      condition: evt.type = execve and evt.dir=< and container.id != host and proc.name = whoami
      output: "whoami command run in container (user=%user.name %container.info parent=%proc.pname cmdline=%proc.cmdline)"
      priority: NOTICE
      warn_evttypes: False

    - rule: The program "locate" is run in a container
      desc: An event will trigger every time you run "locate" in a container
      condition: evt.type = execve and evt.dir=< and container.id != host and proc.name = locate
      output: "locate command run in container (user=%user.name %container.info parent=%proc.pname cmdline=%proc.cmdline)"
      priority: NOTICE
      warn_evttypes: False

Finally, upgrade the Helm chart with the new configuration file to add the custom rules to the security alerting.

helm upgrade falco -f custom_alerts.yaml falcosecurity/falco

You should see the following output:

Release "falcosec" has been upgraded. Happy Helming!
NAME: falcosec
LAST DEPLOYED: Tue Oct 6 12:11:47 2020
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
Falco agents are spinning up on each node in your cluster. After a few
seconds, they are going to start monitoring your containers looking for
security issues.

No further action should be required.

Now exec into any of the NGINX pods again and execute the following statements, which will trigger the custom rules.

kubectl exec -it nginx2-7844999d9c-wdpz8 -- /bin/bash

Then generate activity like the following:

whoami
locate

Falco generates the following alerts in CloudWatch for the test simulation.

Likewise, you can create as many custom rules as your applications or systems require. Please check the detailed explanation of creating custom rules here.

Creating Custom Amazon CloudWatch Insights

Go to Amazon CloudWatch Logs Insights in the AWS console and create custom insights and dashboards as needed. In this demo, I create a custom dashboard named falco and add two Insights widgets based on the rules we have simulated so far.

Create two Insights queries for the rules “Mkdir binary dirs” and “Read sensitive file untrusted”, each of which fetches the logs for the last three hours, matches the rule name in the log messages, and feeds a dashboard widget.
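
A Logs Insights query of the kind behind these widgets might look like the following; run it against whatever log group Fluent Bit created for the Falco alerts.

fields @timestamp, @message
| filter @message like /Mkdir binary dirs/
| sort @timestamp desc
| limit 50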

Finally, the falco dashboard should look like the following sample screenshot.

Creating custom Amazon CloudWatch alarms

Create an Amazon SNS topic and an email subscription with the appropriate permissions to send email alerts.
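
If you prefer the CLI, the following creates the topic and an email subscription; the topic name, account ID, and email address are placeholders.

aws sns create-topic --name falco-alerts
aws sns subscribe \
  --topic-arn arn:aws:sns:ap-south-1:111111111111:falco-alerts \
  --protocol email \
  --notification-endpoint you@example.com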

Go to Amazon CloudWatch in the AWS Console and create an alarm as follows.

Select the Amazon CloudWatch log group name for LogGroupName and choose an appropriate value for Statistic. For this demo, I chose Sum as the Statistic with a threshold of one, which means the Amazon CloudWatch alarm will email you whenever Falco sends at least one alert to Amazon CloudWatch.

Overall the alarm should look like the following screenshot:

The CloudWatch alarm will send email alerts to the address you specified, as shown below:

Conclusion

In this post, I demonstrated how you can set up an Amazon EKS cluster with a sample NGINX website and configure runtime container security analysis and alerting with CNCF Falco and Amazon CloudWatch, using custom dashboards and alarms. CNCF Falco can be configured to stream custom logs as well as standard alerts. Please check the Amazon EKS security documentation for the latest updates, and visit the EKS security best practices GitHub page if you would like to suggest new features or review the team's latest roadmap.

Anand Krishna


I'm a Cloud Migration Specialist working with the AWS Professional Services team, helping clients and partners with AWS Cloud migrations and consulting solutions.