AWS Cloud Operations & Migrations Blog

Category: AWS Systems Manager

Reinventing automated operations (Part II)

The first post in this series, Reinventing automated operations (Part I), covered the importance of operations in the cloud and how deferring the creation of an operations plan can slow down your migration. In this post, I’ll share the primary mechanism of iterative improvement (aka flywheel) that AWS Managed Services (AMS) uses to increase operational […]

Detecting and remediating process issues on EC2 instances using Amazon CloudWatch and AWS Systems Manager

Customers want visibility into the processes running inside their Amazon Elastic Compute Cloud (Amazon EC2) instances. Critical processes and services in these instances can crash unexpectedly, and when they do, it’s crucial for customers to be notified so they can maintain continued business operations. There are multiple ways to see if a service is […]

Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager

Many of our customers need an effective incident management and response solution to achieve operational excellence and performance efficiency. Transparency between those who are affected by the incident and those who respond to the incident is key to any incident management process. Finding the right team to mitigate the impact of application or workload incidents […]

AWS Systems Manager Incident Manager integration with Amazon CloudWatch

This is the second post in a two-part series about AWS Systems Manager Incident Manager. In the first post, we covered onboarding steps like creating contacts, an escalation plan, and a response plan in Incident Manager. In this post, we discuss the integration between Incident Manager and Amazon CloudWatch and how Incident Manager components manage an […]

Automated just-in-time storage for SQL Server backup using AWS Systems Manager Automation

There are times when you need fairly large storage volumes for use cases that are infrequent but needed recurrently. For example, one AWS customer needed to have multiple terabytes of Amazon Elastic Block Store (Amazon EBS) volumes available for taking MSSQL full backups. The backup job was scheduled as a weekly task but the customer […]

Use the power of script steps in your Systems Manager Automation runbooks

Customers have been using AWS Systems Manager Automation documents for years to define a sequence of actions to take on their AWS infrastructure, such as invoking an AWS Lambda function or copying an Amazon Machine Image (AMI). These documents, now referred to as runbooks, are simple to use, yet powerful. The aws:executeScript action […]

Diagnose and remediate AWS Security Hub findings with AWS Systems Manager OpsCenter and Explorer

In this post, we will show you how to configure AWS Systems Manager OpsCenter to aggregate security findings from AWS Security Hub into OpsCenter as operational issues. OpsCenter helps operations engineers and IT professionals reduce issue resolution time by providing a central place to view, investigate, and resolve security issues. AWS Systems Manager Explorer provides […]

Proactive monitoring of application configuration deployment using AWS AppConfig and Amazon CloudWatch

While deploying critical changes to large-scale applications, unexpected errors can render the application unavailable to end users until the changes are manually rolled back. As a best practice, many Amazon teams use AWS AppConfig to deploy application configuration changes. AWS AppConfig is a capability of AWS Systems Manager that you can use to create, manage, […]

Automate suspension of an AWS CodePipeline release during critical events using AWS Systems Manager Change Calendar and Amazon EventBridge

In this blog post, I show you how to set up public holidays calendars using AWS Systems Manager Change Calendar and suspend your AWS CodePipeline pipelines during the critical holidays in these calendar events. For example, let’s say an application release pipeline in your AWS account builds and deploys a new version of the application […]

Reinventing automated operations (Part I)

This is the first in a two-part series that covers lessons learned at AWS Managed Services (AMS) as we help customers and partners achieve operational excellence on AWS. To create a secure and consistent cloud operating model, you need both operational experience and AWS skills. In my conversations with customers, it is common for experienced […]