AWS Cloud Operations Blog
Tag: Management Tools
Gain operational insights for NVIDIA GPU workloads using Amazon CloudWatch Container Insights
As machine learning models grow more advanced, they require extensive computing power to train efficiently. Many organizations are turning to GPU-accelerated Kubernetes clusters for both model training and online inference. However, properly monitoring GPU usage is critical for machine learning engineers and cluster administrators to understand model performance and to optimize infrastructure utilization. Without visibility […]
Best practices to optimize costs after mergers and acquisitions with AWS Organizations
Mergers and acquisitions (M&As) offer organizations the opportunity to scale operations, diversify product lines, and capture new markets. However, they come with a set of challenges, such as integrating legacy IT systems, complying with stringent regulations, and maintaining business continuity. Eliminating redundant resources and optimizing processes to bring consistency […]
Top Picks for Governance, Risk, and Compliance Sessions at re:Inforce 2024
Join us in Philadelphia, Pennsylvania, on June 10-12, 2024, for AWS re:Inforce, a cloud governance, compliance, and security conference. Attendees can expand their cloud security knowledge through hundreds of technical and non-technical sessions, engage with AWS experts and certified partners in the expo hall, and hear from AWS security leaders during keynotes. Whether you are […]
Using the unified CloudWatch Agent to send traces to AWS X-Ray
Today, applications are more distributed than ever before, and they no longer run in isolation. This is especially true when using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). A distributed workload or system is one made up of multiple small, independent components that work together to complete a task or job. […]
Unlock Faster Releases with AWS AppConfig: The Secret Weapon for Your CI/CD Strategy
Striking a Balance Between Reliability and Agility in Cloud Operations
An enterprise's IT operations team serves as the first line of defense against potential business disruptions. The team operates 24/7, acts as a hub, and continuously monitors and manages the IT environment. It handles and prioritizes critical IT incidents to minimize downtime and […]
How SMBs can deploy a multi-account environment quickly using AWS Organizations and AWS CloudFormation StackSets
Small and Medium Businesses (SMBs) need to operate with high availability and mitigate security risks while keeping costs low. An AWS multi-account environment with workload isolation, robust access control, cost visualization, and integrated security mechanisms can help SMBs build a platform to support growth. SMBs want to deploy a multi-account environment on AWS quickly and […]
Automating Alerts for AWS Global Network Performance
Have your applications hosted on AWS ever experienced changes in inter-Region or inter-Availability Zone (AZ) latency that you wanted to be notified about proactively? This blog post describes an automated mechanism to set up those alarms. AWS provides visibility into the performance of the AWS Global Network through Infrastructure Performance, […]
Optimize AWS Resource Management with Tag Inventory Reports leveraging AWS Resource Explorer
Customers are increasingly seeking an efficient way to manage their growing AWS resources across AWS accounts and Regions, especially amid changes such as mergers, acquisitions, and cloud migrations. AWS tags offer an effective way to organize, identify, and filter resources by categorizing them on criteria such as purpose, owner, or environment. AWS customers would like to […]
Easily set up Amazon CloudWatch Internet Monitor
Amazon CloudWatch Internet Monitor provides near-continuous internet measurements for your internet traffic, including availability and performance metrics, tailored to your specific workload footprint on AWS. With Internet Monitor, you can get insights into average internet performance metrics over time, as well as get alerts for issues (health events). You’re notified about events that impact your […]
AWS Health Events Intelligence Dashboards & Insights
Organizations operating mission-critical workloads on AWS need the ability to analyze and respond to AWS service events in a timely manner to maintain operational excellence. AWS Health delivers events on behalf of other AWS services in three main categories: notifications on account administration and security, operational issues that affect AWS services, and scheduled […]