AWS Cloud Operations & Migrations Blog

Category: Management & Governance

Visualizing Amazon CloudWatch Costs – Part 2 – Where does the data come from?

In part 1 of this series we explored an Amazon CloudWatch dashboard which provides a real-time view of some of the typical main contributors to CloudWatch costs. In this second post, we’ll look at how the CloudWatch dashboard widgets were created so that you can learn how to create something similar, or modify the widgets […]

Visualizing Amazon CloudWatch Costs – Part 1

Amazon CloudWatch monitors your AWS resources and the applications you run on AWS in real time. You can use CloudWatch to collect metrics, logs, and traces, set up alarms, create synthetic checks, and more. The information you collect lets you observe, validate, and alert on areas of interest to you. In this two-part post, we’ll explore a […]

Avoid patching failures due to low disk space with AWS Systems Manager Automation and CloudWatch alarms

Every organization must keep its fleet patched while ensuring that the business and its workloads are not affected by patching. One of the challenges for operations teams is patching at scale without affecting production software. The most common reasons workload patching fails are insufficient disk space, a spike in […]

Enable cross-account queries on AWS CloudTrail Lake using delegated administration from AWS Organizations

We are excited to announce a new CloudTrail feature, which lets the management account of an organization configure up to 3 delegated administrators to manage the organization’s trails and Lake event data stores. A delegated administrator has permission to manage resources on behalf of the organization. Delegated administrator support enables flexibility for customers by allowing […]

The Importance of Key Performance Indicators (KPIs) for Large-Scale Cloud Migrations

Key performance indicators (KPIs) are quantifiable measurements that help you understand how well you’re performing in specific areas. For example, from an incident management perspective, you may measure the mean time to recovery to understand how long it takes to recover following an incident. Large-scale enterprise migration programs (such as vacating a data center or […]

Know Before You Go – AWS re:Invent 2022 Centralized Operations Management

Whatever stage you are at in your process of moving to or operating in the cloud, AWS offers centralized operations management services that you can use to manage and operate your applications on AWS, on-premises, in hybrid environments, or at the edge. Operate your applications from a central location with automation, integrations, built-in best practices, […]

Know Before You Go – AWS re:Invent 2022 Compliance & Auditing

As organizations scale by moving more of their workloads to the cloud, they are looking to manage their cloud operations securely and to be prepared for compliance and auditing. AWS Cloud Operations aims to improve the compliance and auditing process in the cloud through best-in-class services by the scale and security of AWS infrastructure, per […]

Know Before You Go – AWS re:Invent 2022 Monitoring & Observability

Whether you are building out applications in the cloud, modernizing your environment, or migrating workloads, observability is vital to your success. Monitoring and observability provide operational visibility and insight into your workloads and are crucial to operational excellence. AWS Observability will be at re:Invent 2022 to share how you can leverage observability for your organization. […]

Automate AIOps for your microservices in AWS using Amazon DevOps Guru and AWS Systems Manager Incident Manager

Artificial intelligence operations (AIOps) is the process of using machine learning techniques to solve operational problems. The goal of AIOps is to reduce human intervention in IT operations processes. By using advanced machine learning techniques, you can reduce operational incidents and increase service quality, and AIOps can help you predict incidents before they happen. Amazon […]

How to develop an Observability strategy – Part 2

Your observability strategy starts with your business. “Observability” describes how well you can understand what’s happening in a system. Developing an observability strategy isn’t a one-time effort. It’s a continuous improvement effort that occurs throughout the lifecycle of your workloads. It enables your teams to determine whether or not the workloads they design and run […]