AWS Cloud Operations & Migrations Blog

Category: Learning Levels

Gaining more control over Multi-Regional AWS CloudFormation deployments

Routinely deploying resources to multiple regions is increasingly normal for situations like Disaster Recovery (DR), regulatory and compliance, and end-user latency requirements. Keeping multiple environments in sync is challenging and drives Infrastructure as Code (IaC) adoption through services like AWS CloudFormation. This post demonstrates a generic design pattern for orchestrating multi-Regional deployments when you need […]

Integrate Amazon CloudWatch metrics with ServiceNow using Amazon Managed Grafana

ServiceNow ITSM is a cloud-based platform designed to improve IT services, increase user satisfaction, and boost IT flexibility and agility. With ServiceNow IT Service Management, you can consolidate your legacy on-premise systems and IT tools into our single data model to transform the service experience, automate workflows, gain real-time visibility, and improve IT productivity. Amazon […]

Manage AWS resources in your Slack channels with AWS Chatbot

**This post was written while the feature to manage AWS resources in Slack channels was in public preview. This feature is now generally available. The information contained within this post is still relevant and helpful.** DevOps and engineering teams are increasingly moving their operations, system management, and CI/CD workflows to chat applications to streamline activities […]

Migrate On-Premises Multi-Tenant Systems to Amazon Elastic Kubernetes Service

Managing the deployment of containers in a multi-tenant environment presents a number of new challenges for many of my customers. Some organizations have explored building and managing their own Kubernetes container orchestration environment, but the management challenges lead them to evaluate Amazon Elastic Kubernetes Service (Amazon EKS). Particularly, Independent Software Vendors (ISVs) are using a […]

Monitoring Amazon EMR on EKS with Amazon Managed Prometheus and Amazon Managed Grafana

Apache Spark is an open-source lightning-fast cluster computing framework built for distributed data processing. With the combination of Cloud, Spark delivers high performance for both batch and real-time data processing at a petabyte scale. Spark on Kubernetes is supported from Spark 2.3 onwards, and it gained a lot of traction among enterprises for high performance and […]

Customize Well-Architected Reviews using Custom Lenses and the AWS Well-Architected Tool

The AWS Well-Architected Tool (AWS WA Tool) lets you learn best practices for architecting workloads on the cloud, measure workloads against these best practices, and improve the workload by implementing best practices. These best practices have been curated under the AWS Well-Architected Framework (AWS WA Framework) and Lenses based on our tens of thousands of […]

Codify your best practices using service control policies: Part 2

I introduced the fundamental concepts of service control policies (SCPs) in the previous post. We discussed what SCPs are, why you should create SCPs, the two approaches you can use to implement SCPs, and how to iterate and improve SCPs as your workload and business needs change. In this post, I will discuss how you […]

Codify your best practices using service control policies: Part 1

Each AWS account enables cellular design – it provides a natural isolation of AWS resources, security, partitions access, and establishes billing boundaries. Separation of concern through multi-account setup is a key design principle that customers use to experiment, innovate, and scale quickly on AWS. The basis of a multi-account AWS environment is AWS Organizations, which […]

Automate enrollment of accounts with existing AWS Config resources into AWS Control Tower

Customers who deployed AWS Control Tower in their existing organization will begin enrolling existing member accounts located under Organization Units (OU) to bring those accounts under the governance of Control Tower. In most cases, the customer has already enabled AWS Config to record, and evaluate AWS resource configurations in existing accounts. Previously, customers who would want […]

Why you should develop a correction of error (COE)

Application reliability is critical. Service interruptions result in a negative customer experience, thereby reducing customer trust and business value. One best practice that we have learned at Amazon, is to have a standard mechanism for post-incident analysis. This lets us analyze a system after an incident in order to avoid reoccurrences in the future. These […]