Self-Service Observable Amazon EKS

How to deliver observable Kubernetes as part of your Internal Developer Platform

In this article, we will talk about the causes, outcomes, and requirements of platform engineering and internal developer platforms (IDPs). We’ll also discuss why Kubernetes (K8s) and observability are key components of the service catalog that most development teams require. Finally, we’ll show how you can fully automate the process of spinning up Amazon Elastic Kubernetes Service (Amazon EKS) clusters with Grafana Cloud observability built in, as part of your IDP offerings.

Let's dig in!

A far-reaching transformation

DevOps has transformed how most organizations build software. As Melvin Conway wisely highlighted back in the late ’60s, the structure of a system tends to mirror the structure of the teams that build it, so changing how teams are organized and how they operate has inevitably resulted in a dramatic transformation of system architecture.

Perhaps one of the most pivotal changes in how organizations operate because of this transformation lies in the cross-functional composition of most teams and the freedom they have to design and build, end to end, the systems they are responsible for.

This transformation produced undeniable benefits: higher development velocity, greater developer satisfaction and sense of empowerment, and the ability to build software capable of handling unprecedented scale.

But as these patterns grew and evolved across the organization, some predictable but relatively overlooked challenges emerged. What happens when you have a growing number of independent teams building software following their own decisions and lifecycles? You get massive diversity and, with it, huge complexity in running those systems at production scale.

“You build it, you own it”

The driving mantra behind this new way of thinking is “you build it, you own it,” a simple statement with many subtle and broad-reaching implications when put into practice.

It implies that teams must be cross-functional, which usually pulls folks out of their comfort zones and forces them to quickly learn and adapt to new technologies and responsibilities.

And this had some interesting outcomes:

  • Developers and similar roles were exposed to higher levels of cognitive load, and they still required enablement and support from external subject matter experts (SMEs). These included site reliability engineers (SREs) for reliability and operations, DevOps engineers who supported them with continuous integration, delivery, and automation, and cloud engineers and system administrators who supported their decisions around cloud architecture and how to keep the lights on (at a reasonable cost), to name a few.
  • Autonomy was difficult in multi-tenant environments, where teams looking to “own stuff” had to share with other teams that had different lifecycles and production requirements.

This triggered two undesired side effects: developers suffered from poor developer experience, and everybody was spinning stuff up everywhere, with little consistency and a lot of redundancy.

Clusters and more clusters

So, we enter the world of many, many clusters. As teams became autonomous and self-sufficient, most of them had to choose a runtime environment for their applications. As we’ve seen across most of the industry, Kubernetes gradually became the go-to orchestrator, and its managed capabilities evolved rapidly, all the way to solutions such as AWS Fargate, which removed many of the operational aspects of keeping a cluster running and up to date.

Imagine you have five teams, each team chooses to spin up its own clusters, and each team has three different environments, each in a different Amazon Web Services (AWS) account. This scenario, which is the baseline for most organizations, means SREs, infrastructure engineers, and operations folks now have 15 different clusters to support and observe. As we highlighted earlier, “you build it, you own it” has not really eliminated the horizontal teams of SMEs that help keep the lights on.
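The arithmetic behind that number can be sketched in a few lines. The team and environment names below are hypothetical placeholders, not taken from a real organization.

```python
from itertools import product

# Hypothetical team and environment names, used only for illustration.
teams = ["team-a", "team-b", "team-c", "team-d", "team-e"]
environments = ["dev", "staging", "prod"]

# One cluster per (team, environment) pair, each living in its own AWS account.
clusters = [f"{team}-{env}" for team, env in product(teams, environments)]

print(len(clusters))  # 5 teams x 3 environments = 15 clusters
```

Adding a sixth team or a fourth environment grows the fleet multiplicatively, which is why per-cluster manual setup stops scaling quickly.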

Observing a Kubernetes cluster

What does it take to observe a Kubernetes cluster? Let me give you a hint: it is not as simple as it sounds. It’s important to highlight what observability is before digging further:

Observability is a quality of a system that allows an observer to understand the state of the full system without direct manipulation, relying instead on the outputs of the system.

This means, of course, that you will need to produce the necessary system outputs (data) and send them somewhere they can be viewed, queried, and related.

And the data you produce will need to provide insights into at least the following perspectives of your system:

  • Node-level status and resource utilization: This will be highly dependent on the flavor of Amazon EKS you’re running, since it could be anything from Amazon EKS Distro running locally on your developer laptop to fully serverless Amazon EKS on AWS Fargate in the cloud. Nevertheless, you will need to understand how the underlying compute that is actually running your container image is behaving in terms of resources and status.
  • Storage and networking: Even the simplest application deployed onto Kubernetes will require some networking components, from services to ingresses, and many also depend on persistent storage.
  • Applications: And then, of course, there are the applications, the actual container images you’ve built and deployed to the orchestrator, which may come with sidecars, mounted volumes, and other such features.
  • Application behavior: Applications don’t run in a vacuum, nor do they stay in one stable form. They will be updated, they may crash, and they may be scaled up or down. These behaviors are also critical to have available as data that can be centrally observed.
  • Integrations with the cloud: Last but not least, most of the Amazon EKS solutions we’ve mentioned are tightly integrated with AWS. Many resources are automatically created and managed simply by annotating or creating resources within K8s, and these different aspects will have a tangible impact on how your application is currently operating.
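One way to reason about these perspectives is as a coverage checklist. The sketch below is illustrative only: the mapping from each perspective to telemetry sources is an assumption based on commonly used components (node-exporter, kubelet, cAdvisor, kube-state-metrics, Kubernetes events, Amazon CloudWatch), and the function name is hypothetical.

```python
# Assumed mapping from each perspective to telemetry sources commonly used for it.
# Adjust it to whatever your clusters actually run.
PERSPECTIVE_SOURCES = {
    "node status and resources": {"node-exporter", "kubelet"},
    "storage and networking": {"kubelet", "kube-state-metrics"},
    "applications": {"cadvisor", "kube-state-metrics"},
    "application behavior": {"kube-state-metrics", "kubernetes-events"},
    "cloud integrations": {"cloudwatch"},
}


def uncovered_perspectives(enabled_sources):
    """Return the perspectives for which none of the enabled sources provides data."""
    enabled = set(enabled_sources)
    return sorted(
        name
        for name, sources in PERSPECTIVE_SOURCES.items()
        if not sources & enabled
    )


# A cluster that only ships node and container metrics still has blind spots.
print(uncovered_perspectives(["node-exporter", "cadvisor"]))
```

Running a check like this against each cluster template makes gaps visible before a team depends on data that was never collected.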

Enter platform engineering

The whole concept behind platform engineering is to build a product: an IDP.

The IDP aims to solve the challenges of autonomy and self-service at scale, some of which have been mentioned above. It keeps teams independent and autonomous while reducing their cognitive load, and it builds best practices such as observability into templated offerings. By standardizing some of the infrastructure, it also dramatically reduces the effort horizontal SMEs spend supporting development and production environments.

Self-service Kubernetes

Kubernetes, as you likely have gathered from a lot of what we have discussed so far, is a key item in any IDP.
 
The Kubernetes offering must allow teams to provision multiple environments efficiently while keeping costs in check and ensuring security and observability best practices are met. Historically this has been quite an exercise in complexity, but things have improved considerably thanks to technologies such as Amazon EKS, Amazon EKS add-ons, and the Grafana Cloud Kubernetes Monitoring add-on for Amazon EKS.

Amazon EKS and Amazon EKS add-ons

Amazon EKS is a widely used Kubernetes distribution in the cloud, and for good reason. Its integrations with the cloud dramatically simplify the creation and configuration of a wide range of underlying services, from Amazon Elastic Block Store (Amazon EBS) volumes to application load balancers and even role-based access control (RBAC) integration with AWS Identity and Access Management (IAM).

A long-term support schedule and stringent security-first configuration opinions, together with the ability to run clusters anywhere using alternatives such as Amazon EKS Distro and Amazon EKS Anywhere, make Amazon EKS the go-to choice for enterprises looking to standardize on a proven, upstream-compatible K8s distribution.

All these qualities make Amazon EKS an ideal foundation for IDPs and for operating clusters at scale.

A solid Kubernetes distribution cannot support the requirements of today’s applications on its own, though, as many operational capabilities must be added before the cluster meets its full set of requirements. This is where Amazon EKS add-ons really shine.

Amazon EKS add-ons provide a curated set of applications and operational capabilities that take away the complexity of configuration and integration with the cluster. In the case of observability, this means the add-on handles all the configuration necessary to extract the required telemetry and push that data out to a centralized location.

Self-service cluster observability with Grafana Cloud

Grafana has been my go-to observability solution for quite some time. Grafana Cloud’s capabilities in terms of analytics, monitoring, and visualization are second to none.

As it turns out, Grafana Cloud offers an Amazon EKS add-on, available in AWS Marketplace, that handles all the necessary tasks of integration and configuration in your Amazon EKS cluster, automatically enabling full cluster observability.

The beauty of this solution is that, using tools like AWS Cloud Development Kit (AWS CDK) and good old Kubernetes manifests, you can fully templatize cluster provisioning. The necessary Amazon EKS add-ons are automatically configured to match your observability requirements, while developers remain fully capable of autonomous self-service.

Templatizing your Amazon EKS offering

The first step in enabling fully production-ready cluster provisioning using a self-service model is building a coded template of what the developer should get when choosing the offering from the IDP service catalog.

Tools like AWS CDK offer a great deal of flexibility and extensibility and cover the capabilities supported by most AWS solutions, including, of course, Amazon EKS.

Getting a cluster provisioned takes a handful of lines of code, and since you are using the AWS CDK, you can parameterize all tenant-specific requirements, including cluster name and configuration.

Grafana Cloud - Lines of code to provision a cluster
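A minimal sketch of that parameterization follows. It is not the CDK code from the original figure: to stay runnable without any AWS libraries, it builds as a plain dictionary the kind of `AWS::EKS::Cluster` CloudFormation resource that a CDK app ultimately synthesizes. The function name, tenant values, Kubernetes version, role ARN, and subnet IDs are all hypothetical placeholders.

```python
import json
import re

# EKS cluster names: alphanumerics, hyphens, underscores; must start with an
# alphanumeric character and be at most 100 characters long.
CLUSTER_NAME_PATTERN = re.compile(r"[0-9A-Za-z][A-Za-z0-9_-]{0,99}")


def tenant_cluster_template(tenant, environment, kubernetes_version, role_arn, subnet_ids):
    """Build a CloudFormation template with one EKS cluster for a tenant (hypothetical helper)."""
    cluster_name = f"{tenant}-{environment}"
    if not CLUSTER_NAME_PATTERN.fullmatch(cluster_name):
        raise ValueError(f"invalid EKS cluster name: {cluster_name!r}")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "TenantCluster": {
                "Type": "AWS::EKS::Cluster",
                "Properties": {
                    "Name": cluster_name,
                    "Version": kubernetes_version,
                    "RoleArn": role_arn,
                    "ResourcesVpcConfig": {"SubnetIds": list(subnet_ids)},
                    # Tags let operations teams trace a cluster back to its tenant.
                    "Tags": [
                        {"Key": "tenant", "Value": tenant},
                        {"Key": "environment", "Value": environment},
                    ],
                },
            }
        },
    }


# Placeholder values; a real IDP would take these from the service catalog request.
template = tenant_cluster_template(
    tenant="payments",
    environment="dev",
    kubernetes_version="1.29",
    role_arn="arn:aws:iam::111122223333:role/example-eks-cluster-role",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
)
print(json.dumps(template, indent=2))
```

Validating the derived name up front keeps a bad tenant identifier from surfacing later as a failed deployment.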

Once you have created a cluster, adding Grafana through Amazon EKS add-ons takes only a few more lines of code!

Full observability with a couple of lines of code

The add-on support in AWS CDK lets you attach the Grafana Cloud Kubernetes Monitoring add-on to the cluster. This incredibly simple solution will provide your developers with a cluster capable of delivering the most important pieces of data for observability right out of the box.

Code - Grafana Cloud Kubernetes Monitoring add-on
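As a rough sketch of what such an add-on declaration boils down to, the function below builds an `AWS::EKS::Addon` CloudFormation resource as a plain dictionary, again without using AWS CDK itself. Several things here are assumptions to verify before use: the add-on name `grafana-labs_kubernetes-monitoring`, the configuration keys (`cluster.name`, `externalServices.prometheus.host`), and the placeholder endpoint. Running `aws eks describe-addon-configuration` for the add-on version you subscribe to shows the schema it actually accepts.

```python
import json


def grafana_addon_resource(cluster_name, prometheus_host):
    """Build an AWS::EKS::Addon resource for Grafana Cloud monitoring (hypothetical helper)."""
    # Assumed configuration schema; credentials are deliberately left out and
    # should come from a secrets store rather than from the template.
    configuration = {
        "cluster": {"name": cluster_name},
        "externalServices": {"prometheus": {"host": prometheus_host}},
    }
    return {
        "Type": "AWS::EKS::Addon",
        "Properties": {
            # Assumed AWS Marketplace add-on name; confirm it in your account.
            "AddonName": "grafana-labs_kubernetes-monitoring",
            "ClusterName": cluster_name,
            # The add-on API expects configuration as a JSON (or YAML) string.
            "ConfigurationValues": json.dumps(configuration),
            "ResolveConflicts": "OVERWRITE",
        },
    }


addon = grafana_addon_resource(
    cluster_name="payments-dev",
    prometheus_host="https://prometheus.example.invalid",  # placeholder endpoint
)
print(json.dumps(addon, indent=2))
```

Because the cluster name is passed into the add-on configuration, telemetry from every tenant cluster arrives already labeled with where it came from.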

As soon as the cluster and add-on have finished deploying, your teams can see which data is flowing:

Grafana - Kubernetes Monitoring Configuration

But data alone is not enough. What about the means to view that data?

Grafana-powered K8s dashboards through automation

Grafana Cloud provides a complete set of Kubernetes monitoring dashboards right out of the box. Without any additional effort, they give teams a clear and comprehensive understanding of the state of their container environments.

Grafana Kubernetes Monitoring - Overview Dashboard

Nevertheless, sometimes you will want to create custom dashboards or enable teams to build their own programmatically and embed them as part of your K8s service catalog offerings.

Grafana’s declarative dashboard capabilities enable just that through JSON. Once integrated into an AWS CDK-based codified template, this capability allows for huge flexibility and customization, as dashboards can be automatically configured to match the resources provisioned as part of the IDP offering.

So how do you go about adding these dashboards programmatically to any cluster provisioned through your IDP? Again, all it takes is a few lines of code:

Code Snippet - Add Dashboards to Cluster
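To illustrate the idea, the sketch below generates a small dashboard JSON model scoped to one cluster and wraps it in the request body that Grafana's dashboard HTTP API (`POST /api/dashboards/db`) accepts. It only builds the payload and does not call Grafana. The function names, the panel layout, and the assumption that metrics carry a `cluster` label and include `kube_pod_info` from kube-state-metrics are illustrative, not taken from the original article.

```python
import json


def namespace_pod_count_dashboard(cluster_name):
    """Build a deliberately minimal Grafana dashboard model for one cluster (hypothetical helper)."""
    # Assumes kube-state-metrics data is available with a `cluster` label.
    query = f'count by (namespace) (kube_pod_info{{cluster="{cluster_name}"}})'
    return {
        "uid": f"{cluster_name}-overview",
        "title": f"{cluster_name} overview",
        "tags": ["kubernetes", "idp"],
        "panels": [
            {
                "id": 1,
                "type": "timeseries",
                "title": "Pods per namespace",
                "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},
                "targets": [{"refId": "A", "expr": query}],
            }
        ],
    }


def dashboard_api_payload(dashboard):
    """Wrap a dashboard model in the body expected by POST /api/dashboards/db."""
    # overwrite=True makes repeated provisioning runs update the dashboard in place.
    return {"dashboard": dashboard, "overwrite": True}


payload = dashboard_api_payload(namespace_pod_count_dashboard("payments-dev"))
print(json.dumps(payload, indent=2))
```

Deriving the `uid` from the cluster name keeps the dashboard stable across re-deployments, so links that teams share keep working.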

In a nutshell

Using Amazon EKS, Amazon EKS add-ons, and Grafana, you can fully automate the provisioning of any number of standardized, observable clusters as part of your IDP, allowing teams to remain autonomous and self-serve, while reducing the effort in supporting and operating a large cluster fleet!

The Grafana Amazon EKS add-on is an offering available in AWS Marketplace and does require a Grafana Cloud account, which you can also configure automatically as part of your IDP templated offering.

We’ll be releasing a lab that walks you through the full process of creating the solution shown above, so be on the lookout!
