AWS Cloud Operations Blog

The Importance of Key Performance Indicators (KPIs) for Large-Scale Cloud Migrations

Key performance indicators (KPIs) are quantifiable measurements that help you understand how well you’re performing in specific areas. For example, from an incident management perspective, you may measure the mean time to recovery to understand how long it takes to recover following an incident.

Large-scale enterprise migration programs (such as vacating a data center or migrating over 500 servers) typically last several months, if not years. This duration makes it important for KPIs to be defined during the early stages of the migration to monitor the migration efforts and highlight risks and problems.

This post is aimed at those who are leading a large-scale enterprise cloud migration or are in the early stages of planning a migration. I’ll discuss the importance of KPIs and how to determine and measure KPIs for your migration program.

Why are KPIs important for monitoring large-scale migrations?

There are four primary reasons why KPIs are valuable in large-scale AWS migrations:

  • To remain focused on the target business outcomes
  • To centralize visibility of federated decisions
  • To make data-driven decisions
  • To understand program health

Remain focused on the target business outcomes

In a large-scale cloud migration, a wide range of teams will be involved at various stages, for example, Security, Infrastructure, Operations, Business, Finance, and Application teams. Each team may have different motives for cloud migration, but it’s important to focus on the target business outcomes. Outlining and socializing KPIs across the entire organization helps ensure that all teams are aligned on the overarching goals and that their contributions are measured. Taking this further, you could convert the KPIs into team objectives to encourage the desired behaviours.

For example, if your overarching goal is to reduce your total cost of ownership (TCO), create KPIs which correlate predicted application spend alongside actual spend. For the predicted spend, you could leverage data from Migration Evaluator, or your business case. For the actual spend, you could leverage data from the AWS Cost and Usage Reports. Socializing dashboards showing this KPI will emphasise the importance of cost reduction with teams involved in the migration.
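As a minimal sketch of this KPI, the Python below compares predicted and actual monthly spend per application and flags applications that exceed the prediction by more than a tolerance. The `AppSpend` type, the application names, the figures, and the 10% tolerance are all hypothetical; in practice the predicted values might come from your business case and the actual values from an export of your cost data.

```python
from dataclasses import dataclass


@dataclass
class AppSpend:
    app: str
    predicted_monthly_usd: float  # assumption: taken from the business case
    actual_monthly_usd: float     # assumption: aggregated from cost reports


def spend_variance_pct(s: AppSpend) -> float:
    """Percentage by which actual spend deviates from predicted spend."""
    return (s.actual_monthly_usd - s.predicted_monthly_usd) / s.predicted_monthly_usd * 100


def over_budget(apps: list[AppSpend], tolerance_pct: float = 10.0) -> list[str]:
    """Names of applications whose actual spend exceeds the prediction by more than the tolerance."""
    return [a.app for a in apps if spend_variance_pct(a) > tolerance_pct]


# Hypothetical sample data for illustration only.
apps = [
    AppSpend("billing", 1000.0, 1250.0),
    AppSpend("catalog", 2000.0, 1900.0),
]
flagged = over_budget(apps)
```

A list like `flagged` could feed a dashboard or a deep-dive review with the owning application teams.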

Centralize visibility of federated decisions

Large-scale migrations require a wide range of decisions to be made. Many of these decisions will be made by the application teams, not the centralized program, as they understand their application requirements in greater detail. However, these decisions can significantly impact the overall migration program and its benefits. Building on the TCO example discussed above, application teams may select target Amazon Elastic Compute Cloud (Amazon EC2) instance types (e.g., m5.large) or AWS Lambda function memory allocations (e.g., 6144 MB) for their applications. If the instance types or the memory allocated to Lambda functions are significantly larger than required, costs on AWS will be higher than necessary. Defining and measuring KPIs helps make sure that the program remains on track to achieve the target business outcomes. Furthermore, it helps identify when the program is trending negatively against a KPI so that remediating actions can be applied following a deep-dive review.

Data-driven decisions

Capturing and evaluating critical data points regarding a migration enables you to make data-driven decisions. Even if there’s no data to support you in making a decision, you’ll be able to pilot a new approach and monitor its impact on your KPIs. If the new approach has a negative impact on your KPIs, you can quickly adapt your approach based on the latest data obtained. If you weren’t actively monitoring the KPIs, then it would take longer to notice the negative impact that the decision had.

For example, your organization may need to exit a data center by a specific date to avoid large costs associated with renewing contractual agreements with a data center operator. By tracking migration velocity as a KPI (i.e., the number of servers or applications migrated per month), you’ll be able to forecast if you’re on schedule to migrate the remainder of the estate by the deadline. Centralized migration tracking will be required, including capturing the dates assets were migrated. This will help you answer key questions, such as ‘how many servers are we migrating per month?’, and ‘are we on track to complete the migration within the required timeframe?’. If you realize you’re behind schedule, you can take corrective action and monitor its impact on migration velocity.
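The velocity calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function names and sample dates are hypothetical, the migration dates are assumed to come from your centralized migration tracking, and the forecast assumes velocity stays constant.

```python
import math
from collections import Counter
from datetime import date


def monthly_velocity(migrated_on: list[date]) -> dict[str, int]:
    """Count servers migrated per calendar month, keyed as 'YYYY-MM'.

    Months with no migrations do not appear in the result.
    """
    counts = Counter(d.strftime("%Y-%m") for d in migrated_on)
    return dict(sorted(counts.items()))


def months_to_finish(remaining: int, velocity_per_month: float) -> int:
    """Whole months needed to migrate the remaining servers at a steady velocity."""
    if velocity_per_month <= 0:
        raise ValueError("velocity must be positive to forecast completion")
    return math.ceil(remaining / velocity_per_month)


# Hypothetical migration dates for five servers.
dates = [
    date(2024, 1, 10), date(2024, 1, 25),
    date(2024, 2, 3), date(2024, 2, 14), date(2024, 2, 28),
]
velocity = monthly_velocity(dates)
average = sum(velocity.values()) / len(velocity)
forecast = months_to_finish(remaining=20, velocity_per_month=average)
```

Comparing `forecast` with the number of months left before the deadline indicates whether corrective action is needed.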

Understand program health

Once KPIs are defined for the migration, tracking and reporting program health can also become data-driven. For example, suppose you must migrate 10,000 servers within 20 months. In that case, the program must migrate a minimum of 500 servers per month, and you can use the data about your current migration velocity (number of servers migrated per month) to estimate whether you’re likely to achieve that goal. Other examples of program health metrics include:

  1. Resource effort required to migrate an application via a given approach, measured through project tracking tools (such as employee time sheets). This enables you to balance the effort required across various teams.
  2. Average time spent for each phase of the migration, measured through workflow tooling. This helps you forecast which phases take the most time so that you can deep-dive into accelerating the process. For example, it might take a significant amount of time for the security review process due to a resourcing bottleneck.
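As a rough sketch of the second metric, the Python below averages the time spent in each migration phase across applications and identifies the slowest phase. The record layout, phase names, and durations are hypothetical; a real program would export these records from its workflow tooling.

```python
from collections import defaultdict
from statistics import mean


def average_days_per_phase(records: list[tuple[str, str, float]]) -> dict[str, float]:
    """Average duration in days for each phase across all applications.

    Each record is (application, phase, days spent in that phase).
    """
    by_phase: dict[str, list[float]] = defaultdict(list)
    for _app, phase, days in records:
        by_phase[phase].append(days)
    return {phase: mean(durations) for phase, durations in by_phase.items()}


# Hypothetical records for two applications.
records = [
    ("billing", "security review", 12.0),
    ("billing", "cutover", 2.0),
    ("catalog", "security review", 8.0),
    ("catalog", "cutover", 4.0),
]
averages = average_days_per_phase(records)
slowest = max(averages, key=averages.get)  # phase with the highest average duration
```

Here `slowest` points at the phase to deep-dive into first.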

Determining and measuring KPIs

There are typically two types of KPIs related to a migration program:

  1. KPIs aligned toward the business value expected from delivering the migration: these are long-lived and extend beyond the migration program (e.g., reduction in total cost of ownership of cloud resources).
  2. KPIs aligned toward the project: once the project is complete, the requirement to measure them no longer exists (e.g., number of servers migrated per month).

Both types of KPIs are valuable: they help you validate that the program delivers the expected value and make sure that the migration completes within the program constraints. Work backward from your target outcomes when determining the KPIs to make sure that you’re measuring the correct variables. For example, if you’re migrating to AWS to reduce your total cost of ownership, then make sure that you’re measuring costs as part of your KPIs.

Business level KPIs

Working backward from your target business outcomes is essential to understand your business-level KPIs. There might be more than one target business outcome for your migration, so you’ll likely need multiple KPIs. To align your organization’s leaders on the goals, it’s recommended to socialize the KPIs.

For example, if you’re migrating to AWS to increase your operational stability, you’ll want to start by determining specific KPIs that measure operational capability. These could include service availability compared with the agreed service level agreements (SLAs), the number of unplanned outages per quarter, and the mean time to resolve issues, to name a few. Once these are understood, you can define how they will be measured in AWS so that comparisons can be made. For more information on this, I recommend reading our Understanding Operational Health documentation.

Program level KPIs

To measure program-level KPIs, you must understand which constraints your program operates under. A common example is that consolidating or exiting data centers must occur before a specific date. Therefore, you must migrate all of the assets prior to this date. Measuring the achieved migration velocity against the required velocity provides an indicator of the health of your program.
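A minimal sketch of that indicator, using the earlier example of 10,000 servers in 20 months, might look like the following. The function names and the achieved velocity of 400 servers per month are assumptions for illustration.

```python
def required_velocity(remaining_servers: int, months_left: int) -> float:
    """Servers that must be migrated per month to finish by the deadline."""
    if months_left <= 0:
        raise ValueError("months_left must be positive")
    return remaining_servers / months_left


def velocity_health(achieved: float, required: float) -> float:
    """Ratio of achieved to required velocity; below 1.0 means behind schedule."""
    return achieved / required


required = required_velocity(remaining_servers=10_000, months_left=20)
health = velocity_health(achieved=400.0, required=required)  # 400 is a hypothetical figure
behind_schedule = health < 1.0
```

Recomputing `required` each month from the servers still in scope keeps the indicator honest as the deadline approaches.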

Measuring KPIs

Figure 1. An example migration health dashboard displaying graphs of the key metrics for a large-scale migration in Amazon QuickSight.

Once your KPIs have been determined, it’s important to understand how they will be measured. The goal is a fully automated mechanism that captures the required data with no human interaction. Using automated data sources to feed dashboards also increases the integrity of the data, as it’s not dependent on humans remembering to update it. For example, you may ingest the AWS Cost and Usage Report into Amazon QuickSight and visualize it to build automated dashboards reporting cost-based KPIs. Alternatively, if you’re using the Cloud Migration Factory on AWS Solution, you could use the migration data it captures to present your current migration velocity.

Finally, consider if these KPIs should become team objectives to help drive the desired actions throughout the business. For example, you may set a goal for each application team to migrate five applications per month.

Recommended KPIs for large-scale cloud migration

  1. Total number of servers/applications migrated compared with the number of servers/applications still within the scope of the migration
  2. Number of servers/applications migrated within a given timeframe (i.e., migration velocity)
  3. Number of resource hours required to migrate a workload via each migration pattern
  4. Number of workloads where the cutover is delayed
  5. Number of workloads where the migration was rolled back
  6. The total cost of ownership before and after a workload has been migrated
  7. The migration strategy disposition (e.g., 7 R’s)
  8. The number of application releases made per quarter before and after being migrated

Conclusion

Throughout this post, I discussed why KPIs are important, how to determine the KPIs for your migration, how to measure the KPIs, and outlined KPIs which are recommended for large-scale cloud migrations.

I recommend dedicating time towards determining the KPIs for your migration as it helps ensure that you measure what matters. Furthermore, once set and socialized, these KPIs will influence the behaviour of team members supporting the migration.

Explore ways to automatically collect and present the data with minimal or no human involvement. This will reduce the overall level of effort and remove the potential for human error.

About the author:

Damien Renner

Damien is a Senior Consultant in the Migration, Modernization and Management Global Specialty Practice at AWS. He works with enterprises to understand their business vision and transform it into technical solutions.