This Guidance demonstrates how you can configure the AWS Observability Accelerator to collect and visualize metrics, logs, and traces on Amazon Elastic Kubernetes Service (Amazon EKS) clusters. Use it with the AWS Observability Accelerator for Terraform infrastructure as code (IaC) modules to rapidly deploy and operate observability for your AWS environments.
Architecture Diagram
Optional
This architecture diagram provides an optional way to set up an Amazon Elastic Kubernetes Service (Amazon EKS) cluster provisioned through an Amazon EKS Blueprint for Terraform. For more details about the main architecture, open the other tab.
To deploy this Guidance, you need an Amazon Elastic Kubernetes Service (Amazon EKS) cluster provisioned. These steps show how to provision an Amazon EKS cluster with Amazon EKS Blueprint for Terraform.
Step 1
An administrator or DevOps user commits infrastructure as code (IaC) changes for the Amazon EKS blueprint to a Git repository.
Step 2
The blueprint provisioning workflow is invoked upon a code push to the Git repository.
Step 3
Terraform starts resource deployment processes against the target AWS environment.
Step 4
The required AWS Identity and Access Management (IAM) roles, policies, and AWS Key Management Service (AWS KMS) keys are created by Terraform.
Step 5
The Amazon EKS virtual private cloud (VPC) for the control plane component is deployed by Terraform.
Step 6
The Amazon EKS cluster control plane component is deployed into the Amazon EKS VPC by Terraform.
Step 7
Your VPC for the compute plane is deployed by Terraform.
Step 8
Subnets and other networking components are deployed into the cluster VPCs by Terraform.
Step 9
The Amazon EKS node group with compute plane nodes (Amazon Elastic Compute Cloud (Amazon EC2) instances in an Auto Scaling group) is deployed into the cluster VPC by Terraform, and the nodes join the Amazon EKS cluster.
Step 10
The Amazon EKS cluster is available for application deployment. The Kubernetes API is accessible to command line interface (CLI) clients and applications through a Network Load Balancer (NLB).
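The provisioning flow above can be sketched with the community Terraform modules that the Amazon EKS Blueprints patterns build on. This is a minimal, illustrative sketch, not the Guidance's actual code: the module versions, names, Region, and CIDR ranges are assumptions you would replace with your own values.

```hcl
# Illustrative sketch of Steps 4-9: a VPC for the compute plane and an
# EKS cluster with a managed node group, provisioned by Terraform.
# All names, AZs, and CIDRs below are placeholder assumptions.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name            = "eks-blueprint-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "observability-accelerator-demo"
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  # Compute plane nodes: EC2 instances in an Auto Scaling group (Step 9).
  eks_managed_node_groups = {
    default = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 5
      desired_size   = 3
    }
  }
}
```

A `terraform apply` against this configuration corresponds to the workflow invoked in Steps 2 and 3.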
Main Architecture
This architecture diagram demonstrates the deployment of the AWS Observability Accelerator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. For an optional way to set up an Amazon EKS cluster, open the other tab.
Step 1
The administrator or DevOps team users initiate the installation of the AWS Observability Accelerator through a Terraform blueprint.
Step 2
The required IAM roles and policies are created by Terraform.
Step 3
AWS Distro for OpenTelemetry collector resources are deployed into the Amazon EKS cluster by Terraform.
Step 4
Amazon Managed Service for Prometheus is deployed across multiple Availability Zones (multi-AZ) and configured with alerts and rules by Terraform.
Step 5
Amazon Managed Grafana is deployed in multi-AZ mode and integrated with Amazon Managed Service for Prometheus and other services by Terraform.
Step 6
Metrics are collected from microservices, pods, and jobs running on the Amazon EKS cluster by the AWS Distro for OpenTelemetry collector.
Step 7
Collected metrics are exported to Amazon Managed Service for Prometheus through the writer endpoint and stored in a time-series database. Metrics can also be exported to Amazon Simple Storage Service (Amazon S3). Alert rules are created in Amazon Managed Service for Prometheus based on metric thresholds.
Step 8
Imported metrics are available for queries from Amazon Managed Grafana through the query endpoint of the data source.
Step 9
Users authenticate to Amazon Managed Grafana through AWS IAM Identity Center (or another single sign-on provider).
Step 10
Metrics and metadata are available to IAM-authenticated and authorized users through dashboards in the Amazon Managed Grafana user interface.
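The managed backends provisioned in Steps 4 and 5 can be sketched with resources from the AWS Terraform provider. This is a hedged illustration, not the accelerator's actual module code: the workspace names are assumptions, and `aws_iam_role.grafana` is a hypothetical role defined elsewhere.

```hcl
# Illustrative sketch of the managed observability backends.
# Workspace names are placeholders; aws_iam_role.grafana is assumed
# to be a Grafana service role defined elsewhere in the configuration.
resource "aws_prometheus_workspace" "this" {
  alias = "observability-accelerator"
}

resource "aws_grafana_workspace" "this" {
  name                     = "observability-accelerator"
  account_access_type      = "CURRENT_ACCOUNT"
  authentication_providers = ["AWS_SSO"] # IAM Identity Center (Step 9)
  permission_type          = "SERVICE_MANAGED"
  role_arn                 = aws_iam_role.grafana.arn
  data_sources             = ["PROMETHEUS"] # query endpoint integration (Step 8)
}
```

The actual Guidance wraps resources like these in the AWS Observability Accelerator for Terraform modules; consult that repository for the supported inputs.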
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance uses services that offer full observability through monitoring and logging, providing you with reliable, stable, and dependable applications. Cloud Administrators and DevOps team users can review metrics and receive alerts defined in this Guidance to monitor the health of both their infrastructure and cloud workloads.
In the unlikely, yet possible, event of a complete AWS Region outage, the sample code included with this Guidance can be redeployed to another AWS Region with minor changes to the Terraform modules. Because the managed services and the Amazon EKS cluster-based components are configured for high availability, all resources are managed efficiently. Related log events that provide insight into AWS resource utilization are available to Cloud Administrators and DevOps teams.
Finally, consider reviewing the Implement Feedback Loops document, which describes how feedback loops are set up, how they work, and why they are helpful for driving actionable insights that aid decision making.
Security
The principle of least privilege is applied throughout this Guidance, limiting each resource’s access to only what is required. Amazon EKS clusters are normally deployed in a separate virtual private cloud (VPC) and can be accessed only through protected API endpoints front-ended by load balancers using HTTPS traffic with SSL certificates. Amazon Managed Service for Prometheus endpoints as well as Amazon Managed Grafana workspaces are secured through HTTPS traffic and certificates. Access to the Amazon Managed Grafana user interface is controlled through IAM.
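Least-privilege access to the Amazon Managed Grafana user interface can be expressed in Terraform with a role association. This is a hypothetical sketch: the user ID is a placeholder, and `aws_grafana_workspace.this` is assumed to be a workspace defined elsewhere in the configuration.

```hcl
# Illustrative sketch: grant read-only (VIEWER) Grafana access to a
# specific IAM Identity Center user, rather than broad workspace access.
# The user ID is a placeholder; the workspace is assumed to exist.
resource "aws_grafana_role_association" "viewers" {
  role         = "VIEWER"
  user_ids     = ["example-identity-center-user-id"]
  workspace_id = aws_grafana_workspace.this.id
}
```

Editors and administrators would be granted separately with the `EDITOR` or `ADMIN` role, keeping each group's access to only what it requires.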
Reliability
Both Amazon Managed Service for Prometheus and Amazon Managed Grafana are Region-level services deployed across multiple Availability Zones for high availability and reliability. The Amazon EKS components, such as the Amazon Managed Service for Prometheus node exporter and the AWS Distro for OpenTelemetry metrics collector pods, are managed by Kubernetes, which restarts or reschedules failed pods so these distributed, microservice-based components remain stable.
Amazon Managed Service for Prometheus service logs can be enabled with Amazon CloudWatch. Also, log groups for core components deployed into an Amazon EKS cluster get created in CloudWatch and can be used for troubleshooting. Alerts can also be configured based on those metrics and delivered directly to Cloud Administrators or DevOps teams through notification channels, such as Amazon Simple Notification Service (Amazon SNS).
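Delivering alerts to an Amazon SNS topic, as described above, can be sketched with the AWS provider's alert manager definition resource for Amazon Managed Service for Prometheus. This is an illustrative fragment: the topic ARN and Region are placeholders, and `aws_prometheus_workspace.this` is assumed to be defined elsewhere.

```hcl
# Illustrative sketch: route Prometheus alerts to an SNS topic so Cloud
# Administrators or DevOps teams are notified. Topic ARN, account ID,
# and Region are placeholder assumptions.
resource "aws_prometheus_alert_manager_definition" "this" {
  workspace_id = aws_prometheus_workspace.this.id

  definition = <<-EOT
    alertmanager_config: |
      route:
        receiver: 'sns'
      receivers:
        - name: 'sns'
          sns_configs:
            - topic_arn: arn:aws:sns:us-east-1:111122223333:observability-alerts
              sigv4:
                region: us-east-1
  EOT
}
```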
Performance Efficiency
Amazon Managed Service for Prometheus and Amazon Managed Grafana are AWS managed services. This Guidance focuses on performance-efficient ways to deploy and integrate them with selected resources so that you can monitor and improve the efficiency of your Amazon EKS applications with high availability and low operational costs. Further optimization can be performed based on your CPU and memory usage, network traffic, and input/output operations per second (IOPS).
Cost Optimization
Automation and scalability are cost-saving features this Guidance provides through Terraform and the AWS Management Console, respectively. Centralized administration is implemented through the console, the AWS Command Line Interface (AWS CLI), and the Amazon Managed Grafana console. These features allow for effective detection and correction of issues in both the infrastructure and the application development or deployment processes, reducing the total cost of development efforts.
Amazon Managed Service for Prometheus pricing includes details on what you are charged for metrics ingested and stored. More specifically, you are charged for every ingested metric sample and the storage it consumes. Reducing the resolution of time-series data to longer time slices reduces the number of stored samples and the associated costs.
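The effect of a longer collection interval can be illustrated with the scrape configuration passed to the metrics collector. This is an illustrative fragment, not the accelerator's shipped configuration: the job name and intervals are assumptions.

```hcl
# Illustrative sketch: raising the scrape interval from 15s to 60s
# ingests 4x fewer samples per time series, directly lowering ingestion
# and storage charges. Job names and intervals are placeholders.
locals {
  prometheus_scrape_config = <<-EOT
    global:
      scrape_interval: 60s   # a common default is 15s; 60s cuts sample volume 4x
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
  EOT
}
```

The trade-off is coarser resolution: spikes shorter than the scrape interval may be missed, so apply longer intervals to metrics where minute-level granularity is sufficient.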
Sustainability
This Guidance deploys and integrates AWS services and an Amazon EKS cluster running in the AWS Cloud, so there is no need to procure any physical hardware. Capacity providers keep virtual infrastructure provisioning to a minimum, scaling automatically when workloads demand it.
Every pod running on the Kubernetes platform, including the Amazon EKS cluster, will consume memory, CPU, I/O, and other resources. This Guidance allows for the fine-grained collection and visualization of these metrics. Cloud Administrators and DevOps engineers can monitor their resource utilization through their own internal metrics and log events, and perform configuration updates when indicated by those metrics to achieve sustainable resource utilization.
Implementation Resources
A detailed guide is provided for you to experiment with and use within your AWS account. Each stage of the Guidance, including deployment, usage, and cleanup, is covered to prepare you to run it in your environment.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
AWS Observability Accelerator for Terraform
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.