AWS Cloud Operations Blog
Operations transformation to navigate the VMware migration to AWS
IT operations are at the heart of every organization. Organizations that run VMware have built and adapted an operating model over time, and migrating it to the cloud can seem daunting. A migration to Amazon Web Services (AWS) changes your operations tooling, your existing responsibility model, and the operations processes tailored to your VMware environment. While AWS offers comprehensive migration tools and guidance, the success of the migration journey also hinges on transforming operations to fully leverage AWS capabilities.
This blog post explores the critical operational changes needed for a smooth transition to AWS, introducing you to essential resources, mechanisms, and AWS services that will help you:
- Adopt cloud-native operations tools and processes
- Manage your AWS environment at scale
- Operate securely in the cloud
- Develop necessary AWS skills within your team
By focusing on these aspects, you’ll be better equipped to maximize the benefits of your AWS migration, providing a more seamless and successful transition for your organization.
The operational shift: Embracing AWS capabilities
As you migrate your applications to AWS, your existing IT responsibility model changes. The AWS shared responsibility model can help relieve your operational burden, as AWS operates, manages, and controls the virtualization layer down to the physical security of the facilities where the service operates. Your responsibility is determined by the AWS services you select. These responsibilities include configuring the services you provision, monitoring them, and patching them where applicable (for example, Amazon Elastic Compute Cloud (Amazon EC2) instances).
As you transition to AWS, you’ll find cloud-native alternatives to your existing VMware-specific tools (such as vCenter and VMware Aria Operations). On AWS, you will need purpose-built tools to manage and operate your AWS workloads. Similarly, your existing operations processes need to adapt to AWS capabilities such as self-service provisioning, pay-as-you-go pricing, auto scaling, and more. To navigate this shift, an operations transformation is essential.
The operational transformation can be broadly grouped into four pillars:
- Operations tools: Services and mechanisms that you leverage for day-to-day operations of your IT environment on AWS.
- Operations processes: The operations processes and activities involved in the day-to-day operations.
- People functions: Roles and responsibilities of the teams involved in cloud operations.
- Skill building: Building the necessary AWS skills to operate successfully on the cloud.
Operations tools to manage your AWS Environment
Typical IT operations encompass a wide range of daily activities crucial for maintaining the health, security, and performance of your IT environment. These activities include regular system monitoring, patching, security updates, resource allocation, performance optimization, and compliance management—requiring significant time and expertise from IT teams. AWS provides over 25 management and governance services to help you ease, streamline and enhance these regular operations activities.
The following table describes various activities typically involved in cloud operations, with activity descriptions, and the relevant AWS services you could leverage to perform the activities.
![The picture is a table with three columns, the first one contains typical operational activities involved with your AWS resources, the second column describes the activities, and the third column lists down the AWS services available to perform the operations activities.](https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2024/12/12/Operations-activities-and-AWS-services-e1733975246259.png)
Figure 1: Typical cloud operational activities and relevant AWS services
The end of this post lists each operational activity from the table in Figure 1, with descriptions of and links to the suggested AWS services.
Operations process transformation to adapt to AWS
Both on-premises and AWS environments require robust IT operations, but they differ in their approaches. On-premises operations often involve manual processes and approvals, while successful cloud adoption embraces speed, agility, and automation in operations, without compromising checks and balances. On AWS, operations processes should have the following characteristics:
- Flexible and scalable: Operations processes need to be designed to leverage the scalability, dynamism, and automation that AWS offers.
- Automation and Infrastructure as Code: The operations processes need to heavily emphasize standardization, automation, and Infrastructure as Code practices, which enable consistent and repeatable provisioning and management of resources.
- Cloud-native managed services: AWS offers a wide range of managed services, such as Amazon Relational Database Service (Amazon RDS), AWS Lambda, and more, to offload operational tasks to AWS.
In this section, we will go through the key IT operations processes, and describe the process transformation involved to leverage AWS capabilities effectively.
Resource provisioning
On AWS, your application teams have the flexibility to utilize native AWS capabilities through self-service, leveraging AWS Service Catalog, AWS CloudFormation, AWS Command Line Interface (AWS CLI) and APIs to provision AWS resources.
To take advantage of the flexibility AWS provides, you should include the following best practices in your provisioning process:
- Establish standards for AWS service provisioning and Infrastructure as Code (IaC) templates
- Build a centralized repository of IaC templates or AWS Service Catalog products
- Define preventive and detective guardrails to enforce security guidelines, policies, and alignment to AWS Well-Architected principles
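As an illustration of what a standardized IaC template might look like, the following minimal sketch builds an AWS CloudFormation template in Python. The function name, the logical ID `AppBucket`, and the tag keys are assumptions made for this example, not an AWS-prescribed standard; your own templates would encode your organization's conventions.

```python
import json


def build_bucket_template(app_name: str, cost_center: str) -> dict:
    """Return a minimal CloudFormation template for an encrypted, tagged S3 bucket.

    The tag keys and the logical ID "AppBucket" are hypothetical conventions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Standard bucket for {app_name} (illustrative)",
        "Resources": {
            "AppBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Default server-side encryption with S3-managed keys.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                    "Tags": [
                        {"Key": "Application", "Value": app_name},
                        {"Key": "CostCenter", "Value": cost_center},
                    ],
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the template as JSON, ready to store in a template repository.
    print(json.dumps(build_bucket_template("orders", "cc-1234"), indent=2))
```

Keeping templates like this in a central repository gives application teams a reviewed starting point instead of hand-built resources.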
Further, integrating AWS Config in the configuration management process allows tracking changes to the resources in an automated fashion.
- Leverage AWS Config to view resource configuration, resource relationship tracking, configuration history of resources and integrate the information into your existing configuration management database (CMDB).
- Implement consistent and accurate tagging on all AWS resources, which is critical to identifying and tracking cloud resources.
- Implement resource configuration compliance to best practice standards through AWS Config conformance packs and Config rules.
Tagging
Through tagging, you can assign metadata to different resources in your AWS environment for a variety of purposes. These can include attribute-based access control (ABAC), cloud financial management, tracking resources that belong to an application, and automation (such as backup and patching).
AWS recommends developing a tagging strategy that aligns with your needs and incorporating tagging into your operational processes.
Tag Editor – With Tag Editor, you can add, edit or delete tags of multiple resources at once.
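A tagging strategy is easier to enforce when it is checked automatically. The sketch below shows one possible check, assuming a hypothetical standard of three required tag keys; the key names and the function are illustrative and not part of any AWS service.

```python
# Hypothetical tagging standard: every resource must carry these keys.
REQUIRED_TAG_KEYS = {"Application", "Environment", "CostCenter"}


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return the required tag keys that are absent or have blank values."""
    return {
        key
        for key in REQUIRED_TAG_KEYS
        if not resource_tags.get(key, "").strip()
    }
```

For example, `missing_tags({"Application": "orders", "Environment": "prod"})` reports that `CostCenter` is missing, which could feed a report or block a deployment pipeline.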
Change management
Implementing production changes is often a tedious process involving multiple teams and approvals. By adopting AWS, you can leverage automation to streamline the change process, and build automated checks to manage the risk.
- Adopting IaC and continuous integration and continuous delivery (CI/CD) practices can automate changes, enabling frequent small updates with built-in testing to reduce risks.
- Adopting pre-defined, tested, repeatable patterns for configuration management improves agility and reduces the risk.
- To further reduce the risk of changes and new application releases, customers should leverage blue/green and canary deployment strategies.
The Well-Architected Change Enablement whitepaper provides additional guidance on change management and governance best practices for your AWS environment.
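To make the canary idea concrete, here is a minimal sketch of the decision a canary rollout makes at each step: shift more traffic to the new version while it stays healthy, and roll back otherwise. The threshold, step size, and function name are assumptions for illustration; managed deployment services implement their own, richer logic.

```python
def next_canary_weight(
    current_weight: int,
    error_rate: float,
    max_error_rate: float = 0.01,  # assumed acceptable error rate (1%)
    step: int = 25,                # assumed traffic increment per step
) -> int:
    """Return the next percentage of traffic to send to the new version.

    Rolls back to 0 when the observed error rate exceeds the threshold;
    otherwise advances by `step`, capped at 100.
    """
    if error_rate > max_error_rate:
        return 0
    return min(100, current_weight + step)
```

Because each step exposes only part of the traffic, a bad release affects a limited share of users before the automated check rolls it back.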
Cloud financial management
Cloud financial management monitors and optimizes cloud costs while balancing service delivery quality and risks. It begins with a governance framework that defines cost management controls, processes, roles, responsibilities, and cost baselines.
- Establish visibility into your AWS costs by mapping the resource costs to workloads and business units. Build chargeback or showback mechanisms to provide cost transparency.
- Define cost KPIs and architect for cost by leveraging AWS capabilities such as auto scaling, Amazon EC2 Spot Instances, and more.
- Build mechanisms to monitor the spend and optimize the running costs through scheduling, right-sizing the resources, and configuring storage lifecycle policies.
- Evaluate applications and their resource consumption patterns, and procure Savings Plans or Reserved Instances as applicable.
- Leverage forecasting to plan and set budget thresholds and alerts.
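The forecasting and budget-threshold practice above can be sketched with simple arithmetic. The example below uses a naive linear run-rate forecast, which is an assumption made for illustration; AWS cost tools use their own forecasting models, and the function names and threshold values here are hypothetical.

```python
def forecast_month_end(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Project month-end spend by extrapolating the average daily spend so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return spend_to_date / day_of_month * days_in_month


def breached_thresholds(
    forecast: float,
    budget: float,
    thresholds: tuple[float, ...] = (0.8, 1.0),  # assumed alert levels: 80% and 100%
) -> list[float]:
    """Return the budget fractions that the forecast meets or exceeds."""
    return [t for t in thresholds if forecast >= budget * t]
```

For instance, 500 spent by day 10 of a 30-day month projects to 1500; against a budget of 1600 that crosses the 80 percent alert level but not the 100 percent level.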
Incident and Problem management
By leveraging AWS automation, you can transform your approach to incident management—shifting from manual response to proactive, automated resolution, and service restoration.
- Build automated runbooks, and deploy event-triggered response and resolution for critical alerts.
- By integrating AWS Systems Manager OpsCenter in your incident management process, you can view, investigate, and resolve operational work items (OpsItems) related to AWS resources.
- To avoid failures during high demand, you could leverage Amazon EC2 Auto Scaling groups with scaling policies and health-check-based instance replacement.
- It is also highly recommended to include regular incident response game days as a part of your ongoing operations resilience support.
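As a rough sketch of event-triggered response, the code below matches a CloudWatch alarm state-change event against an Amazon EventBridge style event pattern and selects a runbook. The matcher is deliberately simplified (exact values and nested objects only, not the full EventBridge pattern syntax), and the alarm name and runbook name are hypothetical.

```python
from typing import Optional

# Event pattern for CloudWatch alarms entering the ALARM state.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}

# Hypothetical mapping of alarm name to automated runbook name.
RUNBOOKS = {"orders-api-high-5xx": "Restart-OrdersApi"}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: lists mean 'any of these values', dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def select_runbook(event: dict) -> Optional[str]:
    """Return the runbook for a matching alarm event, or None if nothing applies."""
    if not matches(ALARM_PATTERN, event):
        return None
    return RUNBOOKS.get(event["detail"].get("alarmName"))
```

In practice the pattern would live in an EventBridge rule and the runbook would be an automation document; the sketch only shows the routing decision.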
Building cloud focused people functions
AWS provides prescriptive guidance on building a cloud operating model that accelerates adoption and delivers higher transformational value. On AWS, application teams can design and build their own applications and infrastructure through self-service, leveraging existing patterns, without depending on other teams.
To empower application teams with self-service, organizations build an automated governance framework. This enforces policies, security best practices, well-architected principles, and other requirements, verifying provisioned resources adhere to governance without compromising agility.
To build and manage the automated governance framework, customers build a Cloud Center of Excellence with cross-functional, multi-skilled teams: the Cloud Leadership Team, the Cloud Business Office, and Cloud Engineering and Operations.
Cloud Leadership Team
The Cloud Leadership Team drives the vision of your cloud service consumption, lays the ground rules, governs the outcomes and re-aligns the strategies as required to maximize the business benefits.
Cloud Business Office
The Cloud Business Office focuses on continuous evolution, and optimal utilization of your cloud environment to meet the application teams’ requirements and the strategic vision.
Cloud Engineering and Operations
This team builds and manages the guardrails and re-usable artifacts for the application teams. They also streamline the processes involved with cloud consumption and act as a steward for the culture of continuous optimization through automation.
Building AWS skills within your organization
Migrating to AWS is a significant organizational transformation that requires upskilling your IT workforce with the necessary AWS skills. To help your teams bridge the skill gap in building and managing applications on the AWS Cloud, AWS offers a comprehensive set of training and certification programs.
By investing in AWS training and certification programs, you can unlock the full potential of hybrid cloud environments, streamline migration processes, and future-proof your IT infrastructure.
Leveraging AWS Managed Services as a migration accelerator
AWS Managed Services (AMS) plays a crucial role in assisting organizations with their transition from VMware workloads to AWS.
Through people, process, and technology capabilities, AMS provides operational expertise, automation, and management capabilities for your workloads as you migrate them to AWS. This enables your team to focus on application migrations and avoid stalls, while AMS provides ongoing operations support for the migrated resources:
- Skills and resources: By augmenting your teams, AMS engineers bring the required skills to support you on various aspects of cloud operations, including monitoring, incident management, patching, and backup management.
- 24×7 Support: AMS provides 24×7 support in a ‘follow the sun’ support model leveraging a global team of security and operations engineers.
- Security Management: AMS configures National Institute of Standards and Technology (NIST) and Center for Internet Security (CIS) aligned guardrails, and works with you on security posture improvement. Through Amazon GuardDuty and Amazon Macie monitoring, AMS provides 24×7 security monitoring and incident response.
- Ad-hoc operations activities: Through AMS, you can draw on AWS skills on demand from an additional catalog of ad-hoc operations activities when needed.
Operational activities and the AWS Services that can help
Monitoring and Observability
To help customers monitor AWS services, resources, and applications, AWS provides the following observability tools:
Amazon CloudWatch – Using CloudWatch, you can monitor, collect, and track metrics related to the availability and performance of your resources and applications. CloudWatch also lets you create alarms that send notifications or automatically take actions when they trigger.
AWS X-Ray – With AWS X-Ray you can analyze and debug your distributed applications. AWS X-Ray provides a complete view of requests as they travel through your application and filters visual data across payloads, functions, traces, services, APIs, and more with no-code and low-code motions.
Amazon Managed Service for Prometheus – For customers that are running containers, AWS provides Amazon Managed Service for Prometheus, which is a serverless, Prometheus-compatible monitoring service for container metrics.
Amazon Managed Grafana is a fully managed and secure data visualization service that you can use to instantly query, correlate, and visualize operational metrics, logs, and traces from multiple sources.
Further, you can use AWS Health to gain ongoing visibility into your resource performance and the availability of your AWS services. AWS Health events provide relevant and timely information about service and resource changes that may impact your applications.
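To illustrate the CloudWatch alarms described above, the following sketch assembles the parameters for a CPU utilization alarm on an Amazon EC2 instance. The helper function, the 80 percent threshold, and the evaluation settings are assumptions chosen for the example; the instance ID and Amazon SNS topic ARN are placeholders you would supply.

```python
def cpu_alarm_params(
    instance_id: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,  # assumed alerting threshold
) -> dict:
    """Build keyword arguments for a CloudWatch high-CPU alarm on one instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # evaluate 5-minute averages
        "EvaluationPeriods": 2,    # require two consecutive breaches
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify this SNS topic on ALARM
    }
```

With the AWS SDK for Python installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`; the sketch itself makes no AWS calls.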
Incident management
While customers can continue to use the existing service management tools for case management, AWS has services that can be leveraged to proactively initiate automated incident response and remediation.
AWS Systems Manager Incident Manager – With Incident Manager, a capability of AWS Systems Manager, you can prepare response plans that include escalation plans and automated runbooks, which are engaged when specific alarms or events are triggered.
AWS Incident Detection and Response – Eligible AWS Enterprise Support customers can utilize AWS Incident Detection and Response to provide 24/7 proactive incident engagement to reduce the potential for failure. It accelerates recovery of critical workloads from disruption with a five-minute response service level agreement (SLA).
Service catalog
Through service catalogs, organizations streamline the provisioning and configuration of IT resources and enable self-service capabilities to improve operational efficiency.
AWS Service Catalog enables you to create and manage catalogs of IT services that are approved for provisioning within your AWS environment. These IT services can include Amazon Machine Images, Amazon EC2 instances, applications, databases, or complete multi-tier application architectures.
Resource management
To simplify resource management, AWS provides you with the following services:
AWS Systems Manager is the operations hub for your AWS applications and resources. It is a secure end-to-end management solution for hybrid and multi-cloud environments that enables secure operations at scale. Using AWS Systems Manager, you can manage and automate operations tasks on Amazon EC2 instances, like patch management, configuration management, and remote command execution.
AWS Resource Explorer is a resource search and discovery service. Resource Explorer uses a variety of data sources and provides fast responses to your search queries by using indexes that are created and maintained by the Resource Explorer service.
Logging
Logging is a crucial aspect of IT operations and security monitoring, providing visibility into system events, performance metrics, and application logs. AWS offers a comprehensive set of logging services that span various AWS services and resources, and that provide the capabilities to integrate with your existing log management tools.
Amazon CloudWatch Logs enables you to monitor, store, access, and query your log files from various AWS resources like Amazon EC2 instances, AWS CloudTrail, and more. You can centralize the logs from all the systems and AWS services, in a single, highly scalable service.
AWS CloudTrail – Visibility into AWS account activity is a key aspect of security and operations best practices. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. Events include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs.
Configuration compliance
Configuration compliance focuses on validating that all the resources are configured as per the best practices, security policies, and compliance mandates.
AWS Config provides a detailed view of the resource configuration in your AWS account. This includes how they are configured, how they are related to one another, and how the configurations and their relationships have changed over time. Through AWS Config rules, you can evaluate the configuration compliance on your resources, and automate the remediation.
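As one example of an AWS Config rule, the sketch below builds a definition that uses the AWS managed rule `REQUIRED_TAGS` to flag resources missing required tags. The helper function, the rule name `required-tags`, and the tag keys are assumptions for illustration; the limit of six tag keys reflects the managed rule's documented parameters.

```python
import json


def required_tags_rule(tag_keys: list[str]) -> dict:
    """Build an AWS Config rule definition based on the REQUIRED_TAGS managed rule."""
    if not 1 <= len(tag_keys) <= 6:
        raise ValueError("REQUIRED_TAGS checks between 1 and 6 tag keys")
    # The managed rule expects parameters named tag1Key, tag2Key, and so on.
    params = {f"tag{i}Key": key for i, key in enumerate(tag_keys, start=1)}
    return {
        "ConfigRuleName": "required-tags",  # hypothetical rule name
        "Source": {"Owner": "AWS", "SourceIdentifier": "REQUIRED_TAGS"},
        "InputParameters": json.dumps(params),  # passed as a JSON string
    }
```

With the AWS SDK for Python, a definition like this could be submitted through `boto3.client("config").put_config_rule(ConfigRule=rule)`; the sketch only constructs the definition.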
Patching
Patching on AWS is a shared control. AWS is responsible for patching and fixing flaws within the infrastructure, but you are responsible for patching the guest OS and applications. To enable patching and updating of the operating system on Amazon EC2 instances, AWS provides a centralized patching service.
AWS Systems Manager Patch Manager is a capability of AWS Systems Manager that automates the process of patching managed nodes with both security-related updates and other types of updates.
For upgrading your managed AWS services like Amazon RDS, you can choose to upgrade from the AWS Management Console, AWS CLI or APIs.
Backup
AWS provides a range of backup services and features designed for various AWS resources, including Amazon EC2 instances, Amazon EBS volumes, Amazon RDS databases, and other supported AWS services.
AWS Backup is a fully managed service that centralizes and automates data protection across AWS services, in the cloud, and on premises. Further, AWS Backup provides a cross-account management feature to manage and monitor your backup, restore, and copy jobs across the AWS accounts in your organization in AWS Organizations. Through AWS Backup Audit Manager, you can audit the compliance of your AWS Backup policies against controls that you define.
Cost optimization
AWS offers a pay-as-you-go approach for pricing for most cloud services. Further, a range of native AWS cost optimization services and tools provide visibility into your AWS costs, allow setting budgets, and identify cost-saving opportunities.
AWS Trusted Advisor inspects your AWS environment, and makes recommendations when opportunities exist to save money, improve system availability and performance, or help close security gaps.
AWS Compute Optimizer helps you right size workloads according to your workload preferences through artificial intelligence and machine learning-based analytics to reduce costs by up to 25 percent.
You can also configure Instance Scheduler on AWS to shut down your non-production Amazon EC2 instances, Auto Scaling groups, and Amazon RDS instances during non-business hours to optimize your AWS spend.
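The scheduling idea can be sketched as a simple decision: should a non-production resource be running right now? The function below assumes a hypothetical Monday-to-Friday, 08:00 to 18:00 schedule and is not how Instance Scheduler on AWS is implemented or configured; it only illustrates the logic behind running resources during business hours.

```python
from datetime import datetime, time


def should_be_running(
    now: datetime,
    start: time = time(8, 0),   # assumed start of business hours
    stop: time = time(18, 0),   # assumed end of business hours
) -> bool:
    """Return True on weekdays between start (inclusive) and stop (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < stop
```

Running non-production resources for 50 of the 168 hours in a week, as in this assumed schedule, is where the savings come from.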
Governance
An AWS account acts as a resource container and isolation boundary that offers workload categorization and blast radius reduction if things go wrong. When you migrate your workloads to AWS, you will leverage multiple AWS accounts to isolate and manage your workloads.
To streamline multiple AWS account management, and to govern your AWS environment at scale, AWS provides AWS Organizations and AWS Control Tower.
AWS Organizations helps you centrally manage and govern your environment as you grow and scale your AWS resources. Using Organizations, you can create accounts and allocate resources, group accounts to organize your workflows, apply policies for governance, and simplify billing by using a single payment method for all of your accounts.
AWS Control Tower offers a straightforward way to set up and govern an AWS multi-account environment, following prescriptive best practices. AWS Control Tower orchestration extends the capabilities of AWS Organizations. AWS Control Tower applies preventive and detective controls (guardrails) to help keep your organizations and accounts from diverging from best practices (drift).
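One common form of preventive policy in AWS Organizations is a service control policy (SCP). The sketch below generates a small SCP that denies member accounts the ability to leave the organization; the function name and statement ID are assumptions for this example, and a real environment would typically combine several such statements.

```python
import json


def deny_leave_org_policy() -> str:
    """Return a service control policy, as JSON, that blocks leaving the organization."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeaveOrganization",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Attached to an organizational unit, a policy like this applies to every account beneath it, which is what makes governance at scale practical.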
Conclusion
In this blog post, we covered various aspects of operations transformation to give VMware administrators a better understanding of what changes on AWS. Efficiently operating workloads migrated from on-premises VMware environments to AWS requires a shift in tools, processes, and skills.
Lastly, AWS Managed Services can be a valuable partner, providing operational expertise, automation, and support to navigate cloud operations, skills, and resource requirements. By leveraging AMS, organizations can focus on core business objectives while offloading infrastructure management to AWS.
Contact an AWS Representative to learn how we can help accelerate your business.
Additional resources
- Designing a Cloud Center of Excellence
- Operational excellence on AWS
- Maximize Cloud Adoption benefits with a Well-Architected Organizational Culture