Posted On: Nov 2, 2021
Amazon DevOps Guru now supports additional metrics at the node and pod level for clusters managed by Amazon Elastic Kubernetes Service (EKS).
Amazon DevOps Guru is a Machine Learning (ML) powered service that makes it easy to improve an application’s operational performance and availability. When Amazon DevOps Guru detects anomalous behavior in these metrics, it creates an insight that contains recommendations and lists of metrics and events that are related to the issue to help you diagnose and address the anomalous behavior.
These node-level metrics help pinpoint specific nodes that may have high memory, CPU, or filesystem utilization, instead of relying on cluster-level aggregates. Pod-level metrics, which include pod_cpu_utilization_over_pod_limit and pod_memory_utilization_over_pod_limit, help identify which pods are exceeding soft limits and are therefore in danger of hitting hard resource constraints and at risk of producing errors due to resource exhaustion. Amazon DevOps Guru now also tracks container restarts and notifies you of issues with pulling images or with application startup. We will continue to expand Amazon DevOps Guru support for containers.
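To illustrate what the pod-level metrics above convey, here is a minimal Python sketch that flags pods whose utilization is approaching their configured limit. It is not how Amazon DevOps Guru detects anomalies (the service uses ML rather than a fixed cutoff). The 80 percent threshold, the PodSample type, the pods_near_limit function, and the sample values are all hypothetical, and the sketch assumes the metric values are percentages of the pod limit.

```python
from dataclasses import dataclass

# Hypothetical warning threshold, as a percentage of the pod limit.
WARN_THRESHOLD_PERCENT = 80.0

# Metric names taken from the announcement.
WATCHED_METRICS = {
    "pod_cpu_utilization_over_pod_limit",
    "pod_memory_utilization_over_pod_limit",
}


@dataclass
class PodSample:
    """One illustrative datapoint for a pod (hypothetical structure)."""
    pod: str
    metric: str
    value_percent: float


def pods_near_limit(samples, threshold=WARN_THRESHOLD_PERCENT):
    """Return {pod name: [metrics at or above the threshold]}."""
    flagged = {}
    for sample in samples:
        if sample.metric in WATCHED_METRICS and sample.value_percent >= threshold:
            flagged.setdefault(sample.pod, []).append(sample.metric)
    return flagged


# Made-up sample data for illustration only.
SAMPLES = [
    PodSample("checkout-7d9f", "pod_cpu_utilization_over_pod_limit", 93.5),
    PodSample("checkout-7d9f", "pod_memory_utilization_over_pod_limit", 41.0),
    PodSample("search-5c2a", "pod_memory_utilization_over_pod_limit", 88.2),
    PodSample("cart-1b3e", "pod_cpu_utilization_over_pod_limit", 12.0),
]

if __name__ == "__main__":
    for pod, metrics in pods_near_limit(SAMPLES).items():
        print(pod, metrics)
```

A static threshold like this is only a rough stand-in; it shows why per-pod metrics are more actionable than cluster-level aggregates, since each flagged entry names a specific pod.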
We are also introducing a new view in the Amazon DevOps Guru console that shows Amazon EKS insights grouped by metric at the cluster level. This view gives you more visibility into where a potential problem lies within the EKS cluster. For example, if a node is having network connectivity issues or is experiencing disk pressure, the node and namespace anomalies appear grouped together under that metric by cluster, which helps you identify the specific node or namespace with the issue.
To use these new features, you will need to enable Container Insights on Amazon EKS.
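Once Container Insights is enabled, one way to confirm that pod-level metrics are being published is to query them in Amazon CloudWatch. The sketch below only builds the request parameters for the CloudWatch GetMetricStatistics API. It assumes the metrics appear in the ContainerInsights namespace with ClusterName, Namespace, and PodName dimensions; the function name build_pod_metric_query and the cluster, namespace, and pod values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_pod_metric_query(cluster, namespace, pod, metric_name,
                           end=None, minutes=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    Assumes Container Insights publishes pod metrics under the
    "ContainerInsights" namespace with the dimensions used below.
    """
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "ContainerInsights",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "Namespace", "Value": namespace},
            {"Name": "PodName", "Value": pod},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Average", "Maximum"],
    }


# Hypothetical cluster, namespace, and pod names for illustration.
query = build_pod_metric_query(
    cluster="demo-cluster",
    namespace="default",
    pod="checkout",
    metric_name="pod_cpu_utilization_over_pod_limit",
    end=datetime(2021, 11, 2, 12, 0, tzinfo=timezone.utc),
)
```

With boto3 installed and AWS credentials configured, the resulting dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`; an empty Datapoints list in the response would suggest the metric is not yet being collected for that pod.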
You can get started with Amazon DevOps Guru by selecting coverage from your CloudFormation stacks or your AWS account. To learn more, visit the DevOps Guru product page and the documentation pages, or post a question to the Amazon DevOps Guru forum.