Containers

Empowering Kubernetes Observability with eBPF on Amazon EKS

Post co-written by Shahar Azulay, CEO and Co-Founder at GroundCover

Introduction

The abstraction introduced by Kubernetes allows teams to easily run applications at varying scale without worrying about resource allocation, autoscaling, or self-healing. However, abstraction isn’t without cost and adds complexity and difficulty tracking down the root cause of problems that Kubernetes users experience.

To mitigate these issues, detailed observability into each application’s state is key but can be challenging. Users have to ensure they’re exposing the right metrics, emitting actionable logs, and instrumenting their application’s code with specific client-side libraries to collect spans and traces. If that’s not a hard task on its own, gaining such visibility into multiple small and distributed, interdependent pieces of code that comprise a modern Kubernetes microservices environment, becomes a much harder task at scale.

A new emerging technology called eBPF (Extended Berkeley Packet Filter) is on its path to relieve many of these problems. Referred to as an invaluable technology by many industry leaders, eBPF allows users to trace any type of event regarding their application performance – such as network operations – directly from the Linux kernel, with minimal performance overhead, and without configuring side-cars or agents.

The eBPF sensor is out-of-band to the application code, which means zero code changes or integrations and immediate time to value all across your stack. The result is granular visibility into what’s happening within an Amazon Elastic Kubernetes Service (Amazon EKS) Kubernetes cluster.

In this post, we’ll cover what eBPF is, why it’s important, and what eBPF-based tools are available for you to get visibility into your Amazon EKS applications. We’ll also review how Groundcover uses eBPF to enable its cost-efficient architecture.

In today’s fast-paced world of software development, Kubernetes has emerged as a game-changer, offering seamless scalability and resource management for applications. However, with this abstraction comes the challenge of maintaining comprehensive observability in complex microservices environments. In this blog post, we’ll explore how eBPF (Extended Berkeley Packet Filter) is revolutionizing Kubernetes observability on Amazon EKS). Amazon EKS-optimized Amazon Machine Images (AMI) fully support eBPF, which empowers users with unparalleled insights into their applications.

Solution overview

The need for observability

As applications scale and interdependencies grow, gaining detailed insights into their state becomes vital for effective troubleshooting. Kubernetes users face the daunting task of instrumenting applications, collecting metrics, and emitting actionable logs to track down issues. This becomes even more challenging in modern microservices environments, where numerous small and distributed, independent pieces of code interact with each other.

Introducing eBPF: A game-changer for Kubernetes observability

In the quest for enhanced observability, eBPF emerges as an invaluable technology. Unlike traditional observability approaches, eBPF empowers users to trace any type of event regarding their application performance – such as network operations — directly from the Linux kernel, with minimal performance overhead, and without configuring side-cars or agents.

The out-of-band advantage of eBPF

The advantage of eBPF lies in its out-of-band approach to observability. Out-of-band means eBPF collects the data without being part of the application’s code. One of the advantages of eBPF is that no code change is needed and installation can be done on the instance level without handling configuration per application deployed. Another advantage is that it eliminates unexpected effects of the observability stack for your time-critical application code, as all data collection is being done outside of the application process.

Users can gain granular visibility into applications deployed on their Kubernetes clusters without instrumenting their code or integrating any third-party libraries. By tracing at the kernel level, eBPF enables users to analyze the inner workings of their applications, with immediate time to value, full coverage and unprecedented depth.

eBPF-based tools for Amazon EKS observability

A plethora of eBPF-based tools has emerged to provide comprehensive observability into Amazon EKS applications. These tools offer tracing capabilities for various events, including network operations, system calls, and even custom application events. For instance, BCC is a toolkit that helps simplify eBPF bootstrapping and development and includes several useful tools like network traffic analysis and resource utilization profiling. Another example is bpftrace, which is a little more focused on one-liners and short scripts for quick insights. These tools were built to be ad hoc, so you can run them directly on any Linux machine and get real-time value. Amazon EKS users who want to inspect their Kubernetes environment with eBPF tools should check out Inspector Gadget, which manages the packaging, deployment and execution of eBPF programs in a Kubernetes cluster, including many based on BCC tools.

Diagram of bpftrace and eBPF tools

In the following section, we’ll dive deeper into some of the open source projects that use eBPF to collect insight data about applications.

Walkthrough

Caretta

Caretta helps teams instantly create a visual network map of the services running in their Kubernetes cluster. Caretta uses eBPF to collect data in an efficient way and is equipped with a Grafana Node Graph dashboard to quickly display a dynamic map of the cluster.

The main idea behind Caretta is gaining a clear understanding of the inter-dependencies between the different workloads running in the cluster. Caretta maps all service interactions and their traffic rates, leveraging Kubernetes APIs (Application Program Interface) to create a clean and informative map of any Kubernetes cluster that can be used for on-demand granular observability, cost optimization, and security insights, which allows teams to quickly reach insights such as identifying central points of failure or pinpointing security anomalies.

Installing Caretta

Installing Carreta is as simple as installing a helm chart onto your Amazon EKS cluster:

helm repo add groundcover https://helm.groundcover.com/
helm install caretta --namespace caretta --create-namespace groundcover/caretta
kubectl port-forward -n caretta pods/<caretta-grafana POD NAME> 3000:3000

Once installed, Carreta provides a full network dependency map that captures the Kubernetes service interaction. The map is also interleaved with Prometheus metrics that it exposes to measure the total throughput of each link observed since launching the program.

Network dependency map generated by Carreta open source

That information, scraped and consolidated by a Prometheus agent, can be easily analyzed with standard queries such as sorting, calculating rate, filtering namespaces and time ranges, and of course — visualizing as a network map.

The following is an example of a metric exposed by a Caretta’s agent, and it’s related label that provide granular insight into the network traffic captured by eBPF:

caretta_links_observed{client_kind="Deployment",client_name="checkoutservice",client_namespace="demo-ng",link_id="198768460",role="server", server_port="3550"}

This is useful to detect unknown dependencies or network interactions but also to easily detect bottlenecks like the hot zones handling large volumes of network data and might be the root cause of problems in your Amazon EKS cluster.

But what can we do when network monitoring isn’t enough and application-level API monitoring is where the problem lies?

Hubble

Hubble is a network observability and troubleshooting component within Cilium (which is also based on eBPF for networking). Hubble uses eBPF to gain deep visibility into network traffic and to collect fine-grained telemetry data within the Linux kernel. By attaching eBPF programs to specific network events, Hubble can capture data such as packet headers, network flows, latency metrics, and more. It provides a powerful and low-overhead mechanism for monitoring and analyzing network behavior in real-time.

With the help of eBPF, Hubble can perform advanced network visibility tasks, including flow-level monitoring, service dependency mapping, network security analysis, and troubleshooting. It then takes this data and aggregates it to present it to the user through the Command Line Interface (CLI)or UI. Hubble enables platform teams to gain insights into network communications within their cloud-native environments and gives developers the ability to understand how their applications communicate without first becoming a networking expert.

Network communication flow in Hubble tool

Just like Carreta, Hubble knows how to create service dependency maps and metrics about the network connection it tracks with eBPF. Metrics like req/s and packet drops are captured by Hubble and can be used to detect issues at the network layer hiding in your Amazon EKS environment.

Hubble also provides Layer 7 metrics by tracking HTTP and DNS connections inside the cluster using eBPF. Here, metrics like request rate, latency rate, and error rate kick in unlocking application-level monitoring. You can use Hubble to detect application bugs, which high-latency APIs slow down the application, or observe slow performance degradation that might be lurking.

Metrics generated from Hubble open-source tool like latency, HTTP requests/response, Protocol, and more

Installing Hubble requires installation of Cilium, which is documented in the Cilium getting started guide and is out of scope of this post.

Groundcover: Pioneering cost-efficient Amazon EKS observability

Groundcover, a next-generation observability company, is an example of how future observability platforms will leverage eBPF as their main data collection sensor.

Groundcover, focused on cloud-native environments, utilizes eBPF to create a full-stack observability platform that provides instant value without compromising on scale, granularity, or cost. Its eBPF sensor was built in a performance-first mindset, which harnesses kernel resources to create a cost-efficient architecture for Amazon EKS observability. By collecting all its observability data using eBPF, Groundcover eliminates the need for intrusive code changes and manual labor, and the need to deploy multiple external agents. This streamlined approach not only enhances the coverage and depth of observability but also optimizes costs by reducing performance overhead.

Conclusion

In this post, we showed you how the open-source eBPF tool ecosystem. Customers can try out this technology on their own with our example demonstrating how this technology translates into next-generation observability platforms.

As the demand for Kubernetes observability continues to grow, eBPF has emerged as a transformative technology, redefining how we monitor applications in Amazon EKS clusters. With its unparalleled performance and seamless integration into the Linux kernel, eBPF offers granular visibility without disrupting existing application code.

Through eBPF-based tools, developers and operations teams can now troubleshoot and optimize their applications effortlessly, making Kubernetes observability more accessible and efficient. As more businesses embrace eBPF’s capabilities, we can expect organizations to migrate to Kubernetes with more confidence and assured of their expected observability coverage. This enables users to unleash the full potential of their applications on Amazon EKS and focus on building great products on top of Kubernetes. With eBPF’s bright future ahead, Groundcover and other industry leaders are paving the way for a new era of Kubernetes observability.

"Headshot

Shahar Azulay, groundcover

Shahar, CEO and cofounder of groundcover is a serial R&D leader. Shahar brings experience in the world of cybersecurity and machine learning having worked as a leader in companies such as Apple, DayTwo, and Cymotive Technologies. Shahar spent many years in the Cyber division at the Israeli Prime Minister’s Office and holds three degrees in Physics, Electrical Engineering and Computer Science, and currently strives to use technological learnings from this rich background and bring it to today’s cloud native battlefield in the sharpest, most innovative form to make the world of dev a better place.

Tsahi Duek

Tsahi Duek

Tsahi Duek is a Principal Container Specialist Solutions Architect at Amazon Web Services. He has over 20 years of experience building systems, applications, and production environments, with a focus on reliability, scalability, and operational aspects. He is a system architect with a software engineering mindset.