AWS Open Source Blog

How Falco uses Prow on AWS for open source testing

This post was co-written with Leo Di Donato, an open source software engineer at Sysdig in the Office of the CTO.

Kubernetes has seen massive growth in the past few years. However, with all growth comes growing pains, and CI/CD has brought a few interesting problems to the space, especially for the open source community. Many traditional CI/CD solutions worked on top of Kubernetes but lacked the tie-in to GitHub. What the community needed was a solution with the following attributes:

  • Kubernetes-native
  • Comprehensive GitHub automation
  • Concurrency at scale

The Kubernetes project ended up solving this problem for itself because other tools weren’t getting the job done. So, the Kubernetes Testing Special Interest Group (sig-testing) created its own tools and services, including Prow and the other tools located in Kubernetes test-infra. Prow is not only Kubernetes-native, with the ability to gather Kubernetes-specific metadata, but it is also massively scalable, running roughly 10,000 jobs a day for Kubernetes-related repositories.

What is Prow?

The Testing SIG defines Prow as:

“A Kubernetes based CI/CD system. Jobs can be triggered by various types of events and report their status to many different services. In addition to job execution, Prow provides GitHub automation in the form of policy enforcement, chat-ops via /foo style commands, and automatic PR merging.”

That description only touches the surface of how powerful Prow really is. Prow is a container-based microservice architecture made up of several single-purpose components; many integrate with one another, while others are more or less standalone. As a CI/CD tool, Prow’s jobs start, run, and report their results entirely through GitHub code commits.

What is Falco?

The Falco Project, originally created by Sysdig, is an open source, cloud native runtime security tool and an incubating Cloud Native Computing Foundation (CNCF) project. Falco makes it easy to consume kernel events and enrich those events with information from Kubernetes and the rest of the cloud native stack. Falco ships with a rich set of security rules built specifically for Kubernetes, Linux, and cloud native environments. If a rule is violated on a system, Falco sends an alert notifying the user of the violation and its severity.

Getting started

Falco is the first open source Prow project running exclusively on AWS infrastructure. Prow runs on Amazon Elastic Kubernetes Service (Amazon EKS) because it provides a fully managed Kubernetes experience that is compatible with the APIs the Prow system uses.

In this tutorial, we will cover how Falco set up Prow on Amazon EKS clusters, followed by the configuration needed for the Prow Hook component to receive GitHub webhooks (events) and for Spyglass to inspect test pods and jobs.

The following figure is an architecture diagram of Prow on Amazon EKS. We’ll also provide a quick breakdown of the main components of Prow and what each microservice is responsible for.

Architecture diagram of Prow.

Main components of Prow

  • ghProxy is a reverse proxy HTTP cache optimized for use with the GitHub API. It is essentially a reverse proxy wrapper around ghCache with Prometheus instrumentation to monitor disk usage. ghProxy is designed to reduce API token usage by allowing many components to share a single ghCache.
  • Hook is a stateless server that listens for GitHub webhooks and dispatches them to the appropriate plugins. Hook’s plugins are used to trigger jobs, implement “slash” commands, post to Slack, and more. See the prow/plugins directory for more information.
  • Deck presents a nice view of recent jobs, command and plugin help information, current status and history of merge automation, and a dashboard for pull request (PR) authors.
  • Sinker cleans up old jobs and pods.
  • Prow Controller Manager, formerly known as Plank, is the controller that manages the job execution and lifecycle for jobs that run in Kubernetes pods.
  • Pod utilities are small, focused Go programs used by Plank to decorate user-provided PodSpecs to perform tasks such as cloning source code, preparing the test environment, and uploading job metadata, logs, and artifacts. In practice, this is done by adding InitContainers and sidecar containers to the prowjob specs.
  • Spyglass is a pluggable artifact viewer framework for Prow. It collects artifacts (usually files in a storage bucket, such as Amazon Simple Storage Service [Amazon S3]) from various sources and distributes them to deck, which is responsible for consuming them and rendering a view.
  • Horologium triggers periodic jobs when necessary.
  • Crier reports on status changes of prowjobs. Crier can update sources such as GitHub and Slack on status changes of job executions.
  • Status-reconciler ensures that changes to blocking presubmits that are in Prow configuration while PRs are in flight do not cause those PRs to get stuck.
  • Tide is a component that manages a pool of GitHub PRs that match a given set of criteria. It will automatically retest PRs that meet given criteria and automatically merge them when they have up-to-date passing test results.

How Falco installs Prow on AWS

This guide assumes we have an Amazon EKS cluster already launched and ready. Check out the official getting started guide for details on how to launch a cluster.
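
If you still need a cluster, a minimal sketch with eksctl looks like the following; the cluster name matches the one used later for the load balancer controller, but the region and node sizing are assumptions you should adjust for your environment:

# Create a small Amazon EKS cluster for the Prow control plane services and test pods.
eksctl create cluster \
  --name falco-prow-test-infra \
  --region us-west-2 \
  --nodes 3 \
  --node-type m5.large

# Confirm kubectl now points at the new cluster.
kubectl get nodes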

Set up GitHub repos

In the first step of this process, we will set up and configure our GitHub repositories. Falco uses two GitHub accounts for Prow setup. The Falco Security organization belongs to Account 1, and Account 2 is the bot that will have repository access and will interface with Hook to manage all GitHub automation.

Diagram illustrating the relationship between the PR author and two GitHub accounts as part of Prow setup.

The Falco bot for Account 2 is called poiana, which is the Italian word for hawk, as most of the core maintainers are from Italy.

Give bot access to main organization/repository

In GitHub, navigate to Repository Settings, Manage Access, Invite a collaborator, and add the name of our bot account as a collaborator.

Then, create a personal access token on the GitHub main account and save it for later. Add the following scopes:

  • repo
  • admin:org
  • admin:repo_hook
  • admin:org_hook

Save the token to a local file oauth-token and create the Kubernetes secret for the token.

kubectl create secret generic github-token --from-file=token=config/oauth-token
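
As a quick sanity check (not part of the official setup), we can decode the secret and compare it with config/oauth-token:

# Print the stored token; it should match the personal access token created above.
kubectl get secret github-token -o jsonpath='{.data.token}' | base64 --decode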

Set up AWS ALB Ingress controller

For the Prow setup, we will need to publicly expose hook and deck. We will create an Application Load Balancer (ALB), which will route traffic to the appropriate Kubernetes service.

Here, you can follow the guide for installing the AWS Load Balancer Controller, which includes setting up proper identity and access management (IAM) and cert manager.
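
As one hedged example of the IAM piece, the controller’s service account can be wired to an IAM policy with eksctl; the policy ARN below is a placeholder for the policy created while following that guide, and once this is in place the controller Deployment shown next can be applied:

# Associate an IAM OIDC provider with the cluster (needed for IAM roles for service accounts).
eksctl utils associate-iam-oidc-provider --cluster falco-prow-test-infra --approve

# Create the service account referenced by the Deployment below.
# Replace the placeholder ARN with the policy created while following the guide.
eksctl create iamserviceaccount \
  --cluster falco-prow-test-infra \
  --namespace kube-system \
  --name alb-ingress-controller \
  --attach-policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/ALBIngressControllerIAMPolicy \
  --approve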

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: alb-ingress-controller
  name: alb-ingress-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: alb-ingress-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
    spec:
      containers:
        - name: alb-ingress-controller
          args:
            - --cluster-name=falco-prow-test-infra
            - --ingress-class=alb
          image: amazon/aws-alb-ingress-controller:v2.0.0
      serviceAccountName: alb-ingress-controller

Install the ingress and configure the webhook setup in GitHub

In this ingress, we create an internet-facing load balancer and listen on ports 80 and 8888. We direct all requests on port 80 to deck and those on 8888 to hook. This will be the dedicated ingress to handle external requests to the cluster from permitted external services such as GitHub.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: default
  name: ing
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTP": 8888}]'
spec:
  rules:
    - http:
        paths:
          - path: /
            backend:
              serviceName: deck
              servicePort: 80
          - path: /hook
            backend:
              serviceName: hook
              servicePort: 8888
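
Assuming the manifest above is saved as config/ingress.yaml (the file name is our choice), we apply it and let the controller provision the ALB:

kubectl apply -f config/ingress.yaml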

Now we find our load balancer URL:

kubectl get ingress ing

ing    *       1234567-default-ing-3628-303837175.us-west-2.elb.amazonaws.com   80      43d

We need to create a token for GitHub that will be used for validating webhooks. We use OpenSSL to create that token.

openssl rand -hex 20 > config/hmac-token

#8c98db68f22e3ec089114fea110e9d95d2019c5d

kubectl create secret generic hmac-token --from-file=hmac=config/hmac-token

Now we can configure a webhook in GitHub by going to our repo Settings, WebHooks and using this load balancer URL. This is on our bot account poiana, because that account will be interfacing with Prow.

The payload URL is our load balancer URL plus the hook port and path: http://LB_URL:8888/hook.

The webhook content type needs to be application/json, and we set the payload URL to point to Hook.

The secret here is the hmac-token, 8c98db68f22e3ec089114fea110e9d95d2019c5d, which we just added to Kubernetes.

Screenshot of Falco webpage showing the webhooks described.

If we configured everything correctly, at this point we should see webhooks coming back successfully:

Screenshot showing webhooks coming back successfully.

Create an S3 bucket for Prow data

Prow uses a storage bucket to store information, such as:

  • (SpyGlass) build logs and pod data
  • (Tide) status
  • (Status-reconciler) status

aws s3 mb s3://falco-prow-logs

This is what the bucket’s content will look like after Prow has been running for a while. If storage costs are a concern: roughly 100 jobs have produced about 10 MB of data so far, though this varies with the length of job logs.

Screenshot showing the bucket's contents after it has been running for a period of time.
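
If storage costs do become a concern, one option (not something the Falco setup requires) is an S3 lifecycle rule that expires old job artifacts; the 90-day retention below is an arbitrary example:

# Expire Prow job artifacts after 90 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket falco-prow-logs \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-old-prow-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Expiration": {"Days": 90}
      }
    ]
  }'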

Create an AWS IAM user and S3-credentials secret

We created the S3 bucket for our Prow storage, and now we need to create an AWS Identity and Access Management (IAM) user with S3 permissions for our cluster. Because of the way the Prow code is written, the pod utilities sidecar container is responsible for uploading build logs and pod info to S3 before, during, and after job execution, and it requires credentials in the IAM access key format.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Query Buckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetAccessPoint",
                "s3:PutAccountPublicAccessBlock",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "s3:ListAccessPoints"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Write Bucket Logs",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::falco-prow-logs",
                "arn:aws:s3:::falco-prow-logs/*",
            ]
        }
    ]
}
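
A sketch of creating that user and attaching this policy with the AWS CLI follows; the user and policy names are placeholders we chose, and the policy document is assumed to be the JSON above saved to a local file:

# Create a dedicated IAM user for Prow's S3 uploads (the name is a placeholder).
aws iam create-user --user-name falco-prow-s3

# Attach the policy above as an inline policy (assumes it was saved as prow-s3-policy.json).
aws iam put-user-policy \
  --user-name falco-prow-s3 \
  --policy-name falco-prow-s3-access \
  --policy-document file://prow-s3-policy.json

# Generate the access key pair that goes into the s3-credentials secret below.
aws iam create-access-key --user-name falco-prow-s3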

Once we have created our Prow IAM user, we create the file s3-secret.yaml and kubectl apply -f s3-secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  namespace: default
  name: s3-credentials
stringData:
  service-account.json: |
   {
     "region": "us-west-2",
     "s3_force_path_style": true,
     "access_key": "XXXXXXXXXXXX",
     "secret_key": "XXXXXXXXXXXXXXXXXXXX"
   }
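
With the access key and secret key from the previous step filled in, apply the file as noted above and confirm the secret exists:

kubectl apply -f s3-secret.yaml
kubectl get secret s3-credentials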

Install config and plugins configmaps

There are also two configuration files that we supply to the components; these don’t include any AWS, GitHub, or OAuth credentials.

The first is config.yaml, used to declare the configuration for all of our components. This covers the configuration of all the Prow microservices as well as the job configurations.

The second is plugins.yaml, used to declare which plugins to use in our GitHub and Deck setup. This also controls all of the GitHub label, branch protection, and trigger automation.

Here is a basic example of our config.yaml:

#Full example at https://raw.githubusercontent.com/falcosecurity/test-infra/master/config/config.yaml
prowjob_namespace: default
pod_namespace: test-pods
in_repo_config:
  enabled:
    "*": true
deck:
 spyglass:
   lenses:
   - lens:
       name: metadata
     required_files:
     - started.json|finished.json
   - lens:
       config:
       name: buildlog
     required_files:
     - build-log.txt
   - lens:
       name: junit
     required_files:
     - .*/junit.*\.xml
   - lens:
       name: podinfo
     required_files:
     - podinfo.json
plank:
  job_url_template: 'http://prow.falco.org/view/s3/falco-eks-logs/{{if eq .Spec.Type "presubmit"}}pr-logs/pull{{else if eq .Spec.Type "batch"}}pr-logs/pull{{else}}logs{{end}}{{if .Spec.Refs}}{{if ne .Spec.Refs.Org ""}}/{{.Spec.Refs.Org}}_{{.Spec.Refs.Repo}}{{end}}{{end}}{{if eq .Spec.Type "presubmit"}}/{{with index .Spec.Refs.Pulls 0}}{{.Number}}{{end}}{{else if eq .Spec.Type "batch"}}/batch{{end}}/{{.Spec.Job}}/{{.Status.BuildID}}/'
  job_url_prefix_config:
    '*': http://prow.falco.org/view/
  report_templates:
    '*': '[Full PR test history](http://prow.falco.org/pr-history?org={{.Spec.Refs.Org}}&repo={{.Spec.Refs.Repo}}&pr={{with index .Spec.Refs.Pulls 0}}{{.Number}}{{end}}). Please help us cut down on flakes by [linking to](https://git.k8s.io/community/contributors/devel/sig-testing/flaky-tests.md#filing-issues-for-flaky-tests) an [open issue](https://github.com/{{.Spec.Refs.Org}}/{{.Spec.Refs.Repo}}/issues?q=is:issue+is:open) when you hit one in your PR.'
  allow_cancellations: true # AllowCancellations enables aborting presubmit jobs for commits that have been superseded by newer commits in GitHub pull requests.
  max_concurrency: 100 # Limit of concurrent ProwJobs. Needs to be adjusted depending on the cluster size.
  pod_pending_timeout: 60m
  default_decoration_configs:
    '*':
      timeout: 3h
      grace_period: 10m
      utility_images:
        clonerefs: "gcr.io/k8s-prow/clonerefs:v20200921-becd8c9356"
        initupload: "gcr.io/k8s-prow/initupload:v20200921-becd8c9356"
        entrypoint: "gcr.io/k8s-prow/entrypoint:v20200921-becd8c9356"
        sidecar: "gcr.io/k8s-prow/sidecar:v20200921-becd8c9356"
      gcs_configuration:
        bucket: s3://falco-prow-logs
        path_strategy: explicit
      s3_credentials_secret: "s3-credentials"
tide:
  sync_period: 2m
  queries:
  - labels:
    - approved
    missingLabels:
    - needs-rebase
    - do-not-merge/hold
    - do-not-merge/work-in-progress
    - do-not-merge/invalid-owners-file
    repos:
    - falco/test-infra

And our plugins.yaml:

#Full example at https://raw.githubusercontent.com/falcosecurity/test-infra/master/config/plugins.yaml
triggers:
- repos:
  - falcosecurity
  join_org_url: "https://github.com/falcosecurity/falco/blob/master/GOVERNANCE.md#process-for-becoming-a-maintainer"
  only_org_members: true
config_updater:
  maps:
    config/config.yaml:
      name: config
    config/plugins.yaml:
      name: plugins
    config/jobs/**/*.yaml:
      name: job-config
      gzip: true
plugins:
  falcosecurity/test-infra:
    - approve
    - lgtm
    - skip
    - size
    - welcome
    - owners-label
    - label
    - approve
    - trigger

If installing this on your own setup, change:

  • prow.falco.org to your Deck URL
  • s3://falco-prow-logs to your S3 bucket
  • falco/test-infra to your organization/repository (username/repository also works if you are not part of an organization)

Then create the configmaps from these two files:

kubectl create configmap config --from-file=config.yaml=config/config.yaml
kubectl create configmap plugins --from-file=plugins.yaml=config/plugins.yaml
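
Before (or after) creating these configmaps, the configuration can be validated offline with Prow’s checkconfig tool. Here is a sketch that reuses the same image tag as the pod utilities above and assumes it is run from the repository root:

# Validate config.yaml and plugins.yaml without touching the cluster.
docker run --rm -v "$(pwd)/config:/config" \
  gcr.io/k8s-prow/checkconfig:v20200921-becd8c9356 \
  --config-path=/config/config.yaml \
  --plugin-config=/config/plugins.yaml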

Install ghProxy with Amazon EBS storage retention

ghProxy is a reverse proxy and cache for the GitHub API. It helps us stay within the GitHub API rate limit by caching responses and handling retries, and it retains that cached data in Amazon Elastic Block Store (Amazon EBS) for long-term retention of calls and data.

First, we install the Amazon EBS CSI driver into our cluster, then the ghProxy and PersistentVolumeClaim (PVC) for our cluster:

kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
kubectl apply -f https://raw.githubusercontent.com/falcosecurity/test-infra/master/config/prow/ghproxy.yaml

We can check that ghProxy is up and running and that the PVC bound correctly by running kubectl get PersistentVolume:

Screenshot of output after running command to check that ghProxy is up and running.

Install Prow components on the cluster

The next step is deploying the rest of the Prow microservices on our Amazon EKS cluster. This cluster is going to be where we run the base testing infrastructure, and it will be the client-facing cluster in the test setup, as Falco is a public project.

To get our cluster up and running, we need to launch the following microservices in addition to all the pieces we’ve launched so far. Take a look at the Falco microservices on GitHub for a good example of how they are configured and used. You can also check out the recommended starting AWS deployment on K8s, which is a more minimal setup.

The main microservices launched are the components described earlier: Hook, Deck, Sinker, the Prow Controller Manager (Plank), Spyglass, Horologium, Crier, Status-reconciler, and Tide.
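
As a rough sketch, assuming the falcosecurity/test-infra repository is checked out locally and each component has its own manifest under config/prow/ (the file names here are assumptions; check the repository for the actual layout), each remaining component can be applied with kubectl:

# Apply each Prow component manifest; adjust names to match the repository layout.
for component in hook deck plank sinker horologium crier tide statusreconciler; do
  kubectl apply -f "config/prow/${component}.yaml"
done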

Exploring the Deck UI

Let’s take a look at Deck and explore the capabilities it offers to users. Grab the external address once the service is launched, then navigate to it to see the Deck webpage. The external address will be a Classic Load Balancer URL because the deck service uses the Kubernetes LoadBalancer type.

kubectl get svc deck

Screenshot of output after running the command to get the deck service.

Head to the Deck UI for the cluster. This is where we can see our jobs and information about them. You can check out Falco’s Prow setup on the Falco website.

Screenshot of the Deck UI showing the jobs and information about the jobs.

From here, we can see jobs from pull requests and from periodic jobs. We can also navigate to Spyglass by clicking on the eye icon. We set up the Spyglass service to see more information about the individual jobs.

Screenshot showing how to navigate to the Spyglass service.

This will give us information such as logs, Kubernetes events, pod information, job history, and deep links back to our original pull request in GitHub.

Note: Spyglass is an extremely powerful but also potentially dangerous service. Be careful when turning it on: exposing pod info to the world can leak cleartext secrets or secrets that show up in logs.

Screenshot of Spyglass UI showing pod information.

Screenshot showing job histories.

Adding our first job

We are going to add our first job to config.yaml and update the configmap. This is a simple job that just exits with an error, so it will fail and show red in our Deck console. The failure lets us fix the error with a new commit and use the self-service GitHub PR retest commands.

presubmits:
  falcosecurity/test-infra:
  - name: echo-hi
    decorate: true
    skip_report: false
    agent: kubernetes
    branches:
      - ^master$
    spec:
      containers:
      - command:
        - exit
        env:
        - name: AWS_REGION
          value: eu-west-1
        image: debian:buster
        imagePullPolicy: Always

Now we can update our configmap to add the new job, which will get picked up by Prow.

kubectl create configmap config --from-file=config.yaml=config/config.yaml --dry-run=client -o yaml | kubectl replace configmap config -f -

We get another code reviewer to add the /lgtm label to our PR, which allows Prow to kick off the testing.

Screenshot showing communication with code reviewers.

The PR has been approved and kicks off the prowjob, which we can see as a GitHub check on the pull request. Here we see that our job failed, which means the job exited with a non-zero exit code. We can now fix the issue in the same pull request.

Screenshot of communication with code reviewers.

This time, we changed our prowjob to run:

echo "Prow Rocks"

This returns an exit code of 0 and moves our failing test to a success.

presubmits:
  falcosecurity/test-infra:
  - name: echo-hi
    decorate: true
    skip_report: false
    agent: kubernetes
    branches:
      - ^master$
    spec:
      containers:
      - command:
        - echo
        - "Prow Rocks"
        env:
        - name: AWS_REGION
          value: eu-west-1
        image: debian:buster
        imagePullPolicy: Always

kubectl create configmap config --from-file=config.yaml=config/config.yaml --dry-run=client -o yaml | kubectl replace configmap config -f -

Next, we can use our retest command to retest the job with our new update.
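
The retest is triggered with Prow’s chat-ops commands as a comment on the pull request; /retest reruns the failed presubmit jobs, and /test plus a job name reruns a single job:

/retest
/test echo-hi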

Screenshot of the retest command being used to retest the job.

After a minute, our new echo-hi check shows up on the prowjob in GitHub. This one looks like it passed, so we can go inspect the logs in Deck. We can also look at our PR history and see which jobs failed on each commit that was pushed.

Screenshot showing all the checks passing.

Screenshot showing the tests passing.

Screenshot of the PR history.

These examples show the power of Prow and its testing workflow. We get the ability to use /foo style chat commands in our pull requests to rerun tests. This functionality gives developers the ability to write, debug, and fix tests entirely in a self-service manner, which is important for an open source project like Falco.

Once our tests have run, we can use Deck and Spyglass to check build logs, PR job history, and other job execution data to give us quick feedback on test results, which helps maintain that quick feedback loop for contributors.

Conclusions

Today, the Falco project spans more than 35 repos across the organization, and it is in the process of securely moving tests from across all of those repos to Prow. Feel free to contribute on GitHub and check out the talk about advanced Prow usage given by Falco maintainer Leonardo Di Donato at KubeCon + CloudNativeCon: Going Beyond CI/CD with Prow.

Happy Prowing!

Leonardo Di Donato

Leo is an open source software engineer at Sysdig in the Office of the CTO, where he’s in charge of open source methodologies and projects. His main duty is working on Falco, a cloud native tool for runtime security incubated by the CNCF. Leo is also involved in the Linux Foundation’s eBPF project (IO Visor) as a maintainer of the kubectl-trace project. As a long-standing OSS lover, he’s also the author of go-syslog, a blazingly fast Go parser for syslog messages and transports, and of kubectl-dig, a tool that brings deep visibility into Kubernetes directly from kubectl. Follow him on Twitter @leodido.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

Jonah Jones

Jonah is a Solutions Architect at AWS working within the APN Program and with container partners to help customers adopt container technologies. Before this, Jonah worked in AWS ProServe as a DevOps Consultant and at several startups in the Northeast. You can find him on Twitter at @jjonahjon.