Production in Mind: Preparing for Cloud Operations

by Vieng Soukhavong, Global Sentinel Ops Leader, AWS Managed Cloud

Introduction by Mark Schwartz, Enterprise Strategist

This is the second post in our series from AWS Managed Services (AMS), presenting what they have learned about operating effectively in the cloud. An important principle in the cloud and DevOps world is to design for operations. Code should not only deliver the necessary capabilities (securely, scalably, and so on) but should also be designed for how it will be operated once deployed. Vieng Soukhavong talks about some of the things teams should keep in mind when preparing for operations.

– Mark

Considering the benefits of the cloud, many organizations want to accelerate migration. In a previous post, we discussed the advantages of minimum viable refactoring—getting the balance right between making changes to your application and speeding your path to the cloud.

It’s also important to consider how you’ll operate once you get there. Preparing ahead of time can reduce cost and risk while maximizing the benefits of cloud adoption. Some tasks, such as maintaining server hardware, go away altogether for migrated applications. Others, such as managing and securing ports, are still relevant but shift somewhat in nature. And the cloud opens up possibilities of its own, such as auto scaling and operations as code.

At AWS Managed Services (AMS), we operate in the cloud every day. Here are a few tips from our experience to help you migrate with production in mind.

Maximize the power of disposability

Although there are probably a few Ops professionals out there who love babysitting servers, they’re likely the exception. Most of us would rather be focused on optimizing platforms and applications and solving interesting problems. It’s common to feel a sense of relief after getting to production in the cloud: no more procurement headaches, no more air conditioning units going on the fritz, no more cables and fans and boxes of junk.

What’s more, the resilient nature of the cloud makes it simple to spin instances up and down. If you detect a vulnerability, you can just terminate and redeploy, typically in seconds, with little or no downtime. Need temporary capacity? Switch to a larger instance type until the spike is over.

Using these capabilities to their fullest requires a mindset shift. If you’re accustomed to keeping servers running at all costs, getting used to disposability can take a bit of time. It also takes maturity in your DevOps practices: to terminate and redeploy quickly, you’ll want a solid continuous deployment pipeline.

Standardize to optimize

Because the foundation of on-premises infrastructure consists of individual servers, many organizations run into the “special snowflake” problem. Servers might stay running for years. Over time, customizations, fixes, and simply different ways of doing things lead to increasingly divergent hardware profiles. As these differences accumulate, they become difficult to manage, which can make troubleshooting very challenging.

In the cloud, fresh instances can be deployed on a regular basis using a standardized configuration. When the configuration is updated, all the servers it applies to are brought up to a current state. Consistency should be the norm rather than the exception. This means understanding which instance types are optimal for given services and how they should be configured.
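
One way to keep consistency the norm is to compare each instance against the standard configuration and report anything that differs. The following is a minimal sketch of that idea, not an AMS tool. The baseline settings, instance names, and the `find_drift` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical standard for one service tier; keys and values are illustrative only.
BASELINE = {"instance_type": "m5.large", "logging": True, "monitoring": True}


def find_drift(instance_config, baseline=BASELINE):
    """Return {setting: (expected, actual)} for each setting that differs from the baseline."""
    return {
        key: (expected, instance_config.get(key))
        for key, expected in baseline.items()
        if instance_config.get(key) != expected
    }


# Made-up fleet: the second instance has drifted from the standard.
fleet = {
    "i-app-1": {"instance_type": "m5.large", "logging": True, "monitoring": True},
    "i-app-2": {"instance_type": "m5.xlarge", "logging": True, "monitoring": False},
}

drift_report = {name: find_drift(config) for name, config in fleet.items()}

for name, drift in drift_report.items():
    print(name, "matches the standard" if not drift else f"has drifted: {drift}")
```

In a disposable environment, an instance that shows drift would usually be replaced from the standard configuration rather than repaired by hand.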

Ensuring best practices

To maximize operational efficiency in the cloud, think beyond infrastructure and consider standardizing operations on a larger scale. For example, you can ensure that logging and monitoring are enabled by default for all instances. This gives you consistent feedback, so you can look for ways to improve your architecture and how you operate it. Your runbook goes from being static documentation of individual server quirks to a living document that enables ever-greater efficiency.

By enforcing standardized tagging, you can assign costs to departments, specific applications, or predefined cost centers. This enables operations to become a partner to the business. You are no longer just keeping the service up and running; you’re helping the business become more effective and working with it toward the outcomes it wants.
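
To make the tagging idea concrete, here is a small sketch of how costs might be rolled up by tag and how untagged resources could be surfaced for follow-up. It assumes a hypothetical `CostCenter` tag key and invented cost records; in practice the figures would come from your billing data rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical tag key; use whatever key your tagging standard defines.
COST_CENTER_TAG = "CostCenter"


def allocate_costs(resources, tag_key=COST_CENTER_TAG):
    """Sum monthly cost per tag value and list the resources missing the tag."""
    totals = defaultdict(float)
    untagged = []
    for resource in resources:
        owner = resource.get("tags", {}).get(tag_key)
        if owner:
            totals[owner] += resource["monthly_cost"]
        else:
            untagged.append(resource["id"])
    return dict(totals), untagged


# Invented records for illustration only.
resources = [
    {"id": "i-web-1", "monthly_cost": 120.0, "tags": {"CostCenter": "marketing"}},
    {"id": "i-web-2", "monthly_cost": 80.0, "tags": {"CostCenter": "marketing"}},
    {"id": "i-db-1", "monthly_cost": 300.0, "tags": {"CostCenter": "finance"}},
    {"id": "i-tmp-1", "monthly_cost": 15.0, "tags": {}},
]

totals, untagged = allocate_costs(resources)
print("Cost by cost center:", totals)
print("Resources missing the tag:", untagged)
```

The untagged list matters as much as the totals: any cost that cannot be assigned is a sign that the tagging standard is not yet being enforced everywhere.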

It can also be helpful to have standard categories for applications and how they are managed. For example, critical applications may alert you when they breach a 50 percent CPU utilization threshold. For a dev and test instance, 90 percent utilization may be fine. Communicating these standards to the rest of the business ensures a robust dialogue about the tradeoffs between availability, utilization, and cost.
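
Standards like these are easier to communicate when they live in one place. The sketch below encodes the two thresholds from the example above in a lookup table. The category names and the `should_alert` helper are assumptions made for illustration, and "breach" is interpreted here as utilization rising above the threshold.

```python
# CPU alert thresholds per application category, in percent.
# The two values come from the example in the text; the category names are assumed.
CPU_ALERT_THRESHOLDS = {"critical": 50.0, "dev-test": 90.0}


def should_alert(category, cpu_percent):
    """Return True when CPU utilization exceeds the threshold for the category."""
    if category not in CPU_ALERT_THRESHOLDS:
        # Refuse to guess: an application without a category has no agreed standard.
        raise ValueError(f"no standard defined for category {category!r}")
    return cpu_percent > CPU_ALERT_THRESHOLDS[category]


# The same 65 percent reading is an alert for one category and acceptable for another.
print(should_alert("critical", 65.0))
print(should_alert("dev-test", 65.0))
```

Raising an error for an unknown category is a deliberate choice in this sketch: it forces every application to be classified before it is monitored.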

Operations as code

The old way of documenting operations is to literally write everything down in a document or spreadsheet. A new person trying to operate from your runbook would often have to interpret what you meant. Or, in some cases, they would just avoid making changes altogether.

Today, there’s operations as code. Documentation and execution become one. Deploying instances, configuring policies and networking, and attaching cloud services can all be codified and automated. At AMS, AWS CloudFormation is one of our favorite tools for doing that. AWS CloudFormation provides a common language for you to describe, model, and automatically provision all the infrastructure resources in your cloud environment using a simple text file. This file serves as the single source of truth for your cloud environment.
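
As a rough illustration of what such a text file can look like, the sketch below builds a minimal CloudFormation template for a single EC2 instance and prints it as JSON. It is an assumed example, not an AMS template: the logical name `AppServer`, the default instance type, the `CostCenter` tag value, and the `build_template` helper are all hypothetical, and the AMI ID is left as a parameter to be supplied at deployment time.

```python
import json


def build_template(default_instance_type="t3.micro"):
    """Return a minimal CloudFormation template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch: one standardized application server.",
        "Parameters": {
            # The AMI is supplied at deploy time rather than hard-coded here.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "InstanceType": {"Type": "String", "Default": default_instance_type},
        },
        "Resources": {
            "AppServer": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                    # Detailed monitoring is switched on by default.
                    "Monitoring": True,
                    # Hypothetical cost-allocation tag.
                    "Tags": [{"Key": "CostCenter", "Value": "marketing"}],
                },
            }
        },
    }


template_text = json.dumps(build_template(), indent=2)
print(template_text)
```

Keeping a file like this in version control means that a change to the environment becomes a reviewed change to the file, which is what lets it act as documentation and execution at once.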

Discover the future of your cloud ops

AMS can help you quickly and efficiently make smart changes to your operational approach before you migrate to the cloud. Infrastructure as code is our standard operating procedure, so you benefit from dramatic improvements in visibility and automation. Our goal is always to empower your team to operate more effectively in the AWS environment, which means we share our methodologies and best practices freely. Or we can run it for you in a fully optimized way. We help you take the shortest path to optimal cloud operations.

Learn more at https://aws.amazon.com/managed-services/.

Vieng Soukhavong, Global Sentinel Ops Leader, AWS Managed Cloud

Vieng Soukhavong is the Global Head of Operations for AWS Managed Services. He has over two decades of experience leading and working in operational environments. Vieng is passionate about operational excellence and delivering positive experiences for his teams and customers. Vieng’s leadership was previously recognized by the CSIA (Customer Service Institute of Australia), when he led his team to win the national award for Service Excellence. Vieng holds a Master’s degree in Business Administration, a Bachelor of Engineering (Hons), and an Advanced Diploma in Telecommunications Engineering.